The Problem: Static Testing in a Dynamic World

Agentic software development, where AI agents write, review, and ship code far faster than human teams can, strains the traditional testing model. Hand-written tests, brittle static suites, and a steady stream of false positives become bottlenecks, and engineers can end up spending more time maintaining tests than fixing real bugs.

Enter Just-in-Time Tests (JiTTests): a paradigm shift where tests are no longer permanent residents of your codebase. Instead, they are generated automatically by large language models (LLMs) the moment a pull request is submitted. This approach, pioneered in Meta's research, reimagines testing for the agentic era.

Key Insight: JiTTests don't just automate test execution—they automate test creation itself, adapting to each unique code change.

How Catching JiTTests Work Under the Hood

Catching JiTTests focus specifically on regression detection. Here's the step-by-step flow:

  1. Code change lands in a pull request.
  2. The system infers the intention of the change using an LLM.
  3. It creates mutants—versions of the code with deliberately injected faults—to simulate what could go wrong.
  4. It generates and runs tests designed to catch those injected faults.
  5. An ensemble of rule-based and LLM-based assessors filters out false positives.
  6. Engineers receive a clear, actionable report only when a real bug is found.

# Simplified pseudocode illustrating the JiTTest pipeline.
# llm_infer_intent, inject_fault, llm_generate_test, run_tests, and
# ensemble_filter are placeholders for illustration, not a real API.

FAULT_TYPES = ("off-by-one", "null-pointer", "logic-flip")

def generate_jit_tests(pull_request_code, original_code):
    """Generate just-in-time tests for a given code change."""
    # Step 2: Infer the intent of the change
    intent = llm_infer_intent(pull_request_code, original_code)

    # Step 3: Create mutants (fault-injected versions)
    mutants = [inject_fault(original_code, fault_type) for fault_type in FAULT_TYPES]

    # Step 4: Generate tests that catch the mutants, then run them
    tests = [llm_generate_test(pull_request_code, mutant, intent) for mutant in mutants]
    results = run_tests(tests, pull_request_code)

    # Step 5: Filter out false positives; step 6 reports whatever remains
    return ensemble_filter(results)

Because the tests are ephemeral, generated per change and discarded after use, there is no long-lived test suite to maintain.
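To make steps 3 and 4 concrete, here is a minimal, runnable sketch of creating one mutant and checking whether a test catches it. It uses only Python's standard `ast` module. The function `can_withdraw`, the `FlipComparison` transformer, and the hand-written `boundary_test` are hypothetical stand-ins (in the real pipeline an LLM would generate the test), and nothing here is taken from Meta's implementation.

```python
import ast

# Hypothetical code under test, kept as source so it can be mutated.
SOURCE = """
def can_withdraw(balance, amount):
    return amount <= balance
"""


class FlipComparison(ast.NodeTransformer):
    """Inject a boundary fault by turning every `<=` into `<`."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.Lt() if isinstance(op, ast.LtE) else op for op in node.ops]
        return node


def load_function(tree, name):
    """Compile an AST module and return the named function from it."""
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, filename="<jit>", mode="exec"), namespace)
    return namespace[name]


def boundary_test(fn):
    """Ephemeral test: withdrawing the full balance must be allowed."""
    return fn(100, 100) is True


original = load_function(ast.parse(SOURCE), "can_withdraw")
mutant = load_function(FlipComparison().visit(ast.parse(SOURCE)), "can_withdraw")

passes_on_original = boundary_test(original)  # the test accepts the real code
kills_mutant = not boundary_test(mutant)      # the test fails on the faulty code
print(passes_on_original, kills_mutant)
```

A test that passes on the real code and fails on the mutant is the kind of test worth keeping for the current change: it demonstrates that it can detect at least one plausible fault.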

Why This Matters: The Shift from Coverage to Signal

Traditional test suites are often judged by code coverage, a metric that shows which lines were executed but not whether the tests would actually catch a bug. JiTTests flip the model: they optimize for test signal value.

| Aspect | Traditional Testing | Catching JiTTests |
|---|---|---|
| Test creation | Manual, time-consuming | Automatic, LLM-generated |
| Maintenance | Ongoing, brittle | Zero (tests are ephemeral) |
| False positives | High, especially with flaky tests | Minimized via ensemble filtering |
| Adaptability | Requires human updates | Auto-adapts to code changes |
| Focus | Generic code quality | Specific regression detection |

The result: Engineers spend their time on real bugs, not on test upkeep. As agentic coding accelerates, this becomes not just an advantage, but a necessity.
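The difference between coverage and signal can be illustrated with a small, self-contained sketch. Both tests below execute every line of a hypothetical `clamp` function, so their line coverage is identical, yet only one of them fails when a fault is injected. The function, the two hand-written mutants, and the `mutation_score` helper are assumptions made for this example.

```python
def clamp(value, low, high):
    """Code under test: restrict value to the range [low, high]."""
    if value < low:
        return low
    if value > high:
        return high
    return value


# Hand-written mutants, each with one injected fault (hypothetical examples).
def mutant_low_returns_high(value, low, high):
    if value < low:
        return high  # fault: wrong bound returned
    if value > high:
        return high
    return value


def mutant_off_by_one(value, low, high):
    if value < low:
        return low
    if value > high:
        return high - 1  # fault: off by one
    return value


MUTANTS = [mutant_low_returns_high, mutant_off_by_one]


def weak_test(fn):
    """Runs every branch but only checks that some result comes back."""
    results = [fn(-5, 0, 10), fn(15, 0, 10), fn(5, 0, 10)]
    return all(r is not None for r in results)


def strong_test(fn):
    """Same inputs, but checks the exact expected values."""
    return (fn(-5, 0, 10), fn(15, 0, 10), fn(5, 0, 10)) == (0, 10, 5)


def mutation_score(test, mutants):
    """Fraction of mutants on which the test fails (the 'killed' mutants)."""
    killed = sum(1 for m in mutants if not test(m))
    return killed / len(mutants)


weak_score = mutation_score(weak_test, MUTANTS)
strong_score = mutation_score(strong_test, MUTANTS)
print(weak_score, strong_score)
```

Both tests pass on the unmodified `clamp`, but the weak test kills no mutants while the strong test kills both. Coverage alone cannot tell these two tests apart.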

Limitations and Next Steps

Limitations & Cautions

  • LLM hallucination risk: If the LLM misinterprets the intent of a change, a generated test may pass when it should fail, or flag an intended behavior change as a bug. The ensemble assessor mitigates this but doesn't eliminate it.
  • Computational cost: Generating and running per-change tests at scale requires significant infrastructure.
  • Not a silver bullet: JiTTests excel at regression detection but don't replace integration or end-to-end testing for complex workflows.
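As a rough illustration of how an ensemble of assessors might reduce false alarms, the sketch below lets several independent checks vote on each test failure and reports only failures with enough votes. The source does not describe the actual rules, so everything here is an assumption: the `Failure` record, the two rule-based checks, the voting threshold, and especially `llm_assessor_stub`, which is a simple string check standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Failure:
    test_name: str
    message: str
    failed_runs: int  # how many reruns of the test failed
    total_runs: int   # how many times the test was rerun


# An assessor returns True when the failure looks like a real bug.
Assessor = Callable[[Failure], bool]


def is_deterministic(f: Failure) -> bool:
    """Rule: a failure that does not reproduce on every rerun is likely flaky."""
    return f.failed_runs == f.total_runs


def is_not_infrastructure_error(f: Failure) -> bool:
    """Rule: timeouts and import problems usually point at the environment."""
    noise = ("timeout", "connection refused", "importerror")
    return not any(token in f.message.lower() for token in noise)


def llm_assessor_stub(f: Failure) -> bool:
    """Placeholder for an LLM judgment; here it accepts assertion failures only."""
    return "assert" in f.message.lower()


def ensemble_filter(
    failures: List[Failure], assessors: List[Assessor], min_votes: int
) -> List[Failure]:
    """Keep only the failures that at least min_votes assessors accept."""
    return [f for f in failures if sum(a(f) for a in assessors) >= min_votes]


ASSESSORS = [is_deterministic, is_not_infrastructure_error, llm_assessor_stub]

failures = [
    Failure("test_total_rounding", "AssertionError: expected 10.50, got 10.49", 3, 3),
    Failure("test_fetch_profile", "TimeoutError: connection refused by host", 1, 3),
]

reported = ensemble_filter(failures, ASSESSORS, min_votes=3)
reported_names = [f.test_name for f in reported]
print(reported_names)
```

Requiring agreement from every assessor favors precision over recall: fewer false alarms reach engineers, at the cost of occasionally suppressing a real bug. A lower `min_votes` shifts that trade-off the other way.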

Where to Go Next

  • Explore how mutant testing can be integrated into your existing CI/CD pipeline.
  • Experiment with LLM-based test generation using open-source models (e.g., CodeLlama, DeepSeek-Coder) for smaller-scale trials.
  • Read more about Meta's broader agentic infrastructure in KernelEvolve: How Meta's AI Agent Automates Kernel Optimization.

Bottom line: Just-in-Time Testing is not a replacement for all testing—it's a powerful new tool designed for the speed and autonomy of agentic development. Adopt it where it shines: catching regressions fast, without the baggage of static suites.


Based on Meta Engineering research. For the original deep dive, see the source article.

This content was drafted using AI tools based on reliable sources, and has been reviewed by our editorial team before publication. It is not intended to replace professional advice.