The Problem: Static Testing in a Dynamic World
Agentic software development—where AI agents write, review, and ship code at unprecedented speed—has broken the traditional testing model. Manual test authoring, brittle static suites, and the endless cycle of false positives are becoming bottlenecks. Engineers spend more time maintaining tests than fixing real bugs.
Enter Just-in-Time Tests (JiTTests): a paradigm shift where tests are no longer permanent residents of your codebase. Instead, they are generated automatically by large language models (LLMs) the moment a pull request is submitted. This approach, pioneered in Meta's research, reimagines testing for the agentic era.
Key Insight: JiTTests don't just automate test execution—they automate test creation itself, adapting to each unique code change.

How Catching JiTTests Work Under the Hood
Catching JiTTests focus specifically on regression detection. Here's the step-by-step flow:
- Code change lands in a pull request.
- The system infers the intention of the change using an LLM.
- It creates mutants—versions of the code with deliberately injected faults—to simulate what could go wrong.
- It generates and runs tests designed to catch those injected faults.
- An ensemble of rule-based and LLM-based assessors filters out false positives.
- Engineers receive a clear, actionable report only when a real bug is found.
# Simplified pseudocode illustrating the JiTTest pipeline
def generate_jit_tests(pull_request_code, original_code):
"""Generate just-in-time tests for a given code change."""
# Step 1: Infer the intent of the change
intent = llm_infer_intent(pull_request_code, original_code)
# Step 2: Create mutants (fault-injected versions)
mutants = []
for fault_type in ['off-by-one', 'null-pointer', 'logic-flip']:
mutant = inject_fault(original_code, fault_type)
mutants.append(mutant)
# Step 3: Generate tests that catch the mutants
tests = []
for mutant in mutants:
test = llm_generate_test(pull_request_code, mutant, intent)
tests.append(test)
# Step 4: Run tests and filter false positives
results = run_tests(tests, pull_request_code)
filtered_results = ensemble_filter(results)
return filtered_results
This approach eliminates the need for test maintenance entirely—tests are ephemeral, generated per-change, and discarded after use.

Why This Matters: The Shift from Coverage to Signal
Traditional testing measures code coverage—a metric that correlates poorly with actual bug detection. JiTTests flip the model: they optimize for test signal value.
| Aspect | Traditional Testing | Catching JiTTests |
|---|---|---|
| Test creation | Manual, time-consuming | Automatic, LLM-generated |
| Maintenance | Ongoing, brittle | Zero (tests are ephemeral) |
| False positives | High, especially with flaky tests | Minimized via ensemble filtering |
| Adaptability | Requires human updates | Auto-adapts to code changes |
| Focus | Generic code quality | Specific regression detection |
The result: Engineers spend their time on real bugs, not on test upkeep. As agentic coding accelerates, this becomes not just an advantage, but a necessity.

Limitations and Next Steps
Limitations & Cautions
- LLM hallucination risk: Generated tests may pass incorrectly if the LLM misinterprets intent. The ensemble assessor mitigates this but doesn't eliminate it.
- Computational cost: Generating and running per-change tests at scale requires significant infrastructure.
- Not a silver bullet: JiTTests excel at regression detection but don't replace integration or end-to-end testing for complex workflows.
Where to Go Next
- Explore how mutant testing can be integrated into your existing CI/CD pipeline.
- Experiment with LLM-based test generation using open-source models (e.g., CodeLlama, DeepSeek-Coder) for smaller-scale trials.
- Read more about Meta's broader agentic infrastructure in KernelEvolve: How Meta's AI Agent Automates Kernel Optimization.
Bottom line: Just-in-Time Testing is not a replacement for all testing—it's a powerful new tool designed for the speed and autonomy of agentic development. Adopt it where it shines: catching regressions fast, without the baggage of static suites.
Based on Meta Engineering research. For the original deep dive, see the source article.