# AGENTS.md: Research & Best Practices
AGENTS.md makes a measurable difference when developing with agentic AI—but only when done right. Poor implementation can actively harm performance.
## What the Research Shows
| Finding | Impact | Source |
|---|---|---|
| 28.64% faster runtime | Median wall-clock execution time reduced | Lulla et al. (ICSE JAWs 2026) |
| 16.58% lower token consumption | Output tokens reduced | Lulla et al. |
| 100% accuracy | For Next.js 16 APIs vs 79% with skills | Vercel research |
| +5.19% accuracy | With automated instruction optimization | Arize AI |
| -2–3% success rate | With auto-generated context files | ETH Zurich study |
| +20% inference cost | With unnecessary context | ETH Zurich |
## When It Works
Human-written AGENTS.md files with non-discoverable information deliver real gains:
- Tooling gotchas: "Use `uv` instead of pip" — agents used it 1.6× per task when mentioned vs <0.01 when not
- Framework updates: Next.js 16 APIs not in training data
- Project-specific landmines: "Don't refactor the auth module — it uses custom middleware"
- Command patterns: File-scoped commands that save minutes per task
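A hypothetical AGENTS.md fragment illustrating these categories (the file paths, versions, and commands are invented for illustration, not taken from a real project):

```markdown
## Tooling
- Use `uv` instead of `pip` for Python dependency work.

## Versions
- This project targets Next.js 16; do not assume training-data APIs are current.

## Landmines
- Don't refactor `src/auth/`: it relies on custom middleware ordering.

## Commands
- Run a single test file with `uv run pytest tests/test_billing.py` instead of the full suite.
```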
Vercel found that a compressed 8KB docs index in AGENTS.md achieved 100% accuracy on Next.js 16 tasks—outperforming their "skills" approach, which maxed out at 79%.
## When It Fails
Auto-generated or bloated files actively hurt performance:
- Redundant information: LLM-generated context files reduced success by 2–3% and increased costs 20%+
- Cognitive load: Unnecessary requirements make tasks harder—agents follow instructions but waste reasoning tokens
- Anchoring effect: Mentioning legacy patterns biases agents toward outdated approaches
- "Lost in the Middle": Long context degrades performance regardless of relevance
The ETH Zurich study found that when they stripped all documentation from repos, auto-generated files suddenly helped (+2.7%)—evidence that the problem is redundancy, not the format itself.
## What Belongs in AGENTS.md
Keep it minimal. Every line should pass this test: "Can the agent discover this by reading the code?" If yes, delete it.
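The discoverability test can be sketched as a simple lint. This is a hypothetical illustration (not AgentMD's actual implementation); the patterns are just examples of content an agent could recover on its own:

```python
import re

# Hypothetical lint sketch: flag AGENTS.md lines describing things an
# agent could discover by reading the repo itself.
DISCOVERABLE_PATTERNS = [
    (re.compile(r"tech stack|built with", re.I),
     "stack is inferable from package files"),
    (re.compile(r"directory structure|codebase overview", re.I),
     "agents can list directories"),
]

def lint_agents_md(text: str) -> list[str]:
    """Return one warning per line that fails the discoverability test."""
    warnings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, reason in DISCOVERABLE_PATTERNS:
            if pattern.search(line):
                warnings.append(f"line {lineno}: {reason}")
    return warnings
```

A line like "Tech stack: React + Node" would be flagged, while "Use `uv` instead of pip" would pass.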
### Do Include
- Tooling specifics not inferable from code (`uv`, `pnpm`, custom test runners)
- Version requirements that differ from latest
- "Landmines"—things that look right but break
- File-scoped command patterns
- MCP server configurations
- Permission boundaries
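As one concrete case, an MCP server entry might look like the following (the `mcpServers` shape is the common JSON convention; the server name and package here are placeholders):

```json
{
  "mcpServers": {
    "docs-search": {
      "command": "npx",
      "args": ["-y", "@example/docs-mcp-server"]
    }
  }
}
```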
### Don't Include
- Codebase overviews (agents can list directories)
- Tech stack descriptions (inferable from package files)
- Style guides (unless non-obvious)
- Anything already in README
## Emerging Best Practices
- Hierarchical files: Place AGENTS.md at module level, not just root
- Compressed indexes: Use minimal pointers to retrievable docs (Vercel's 8KB approach)
- Task-specific loading: Route agents to focused context based on task type
- Automated optimization: Use meta-prompting to refine instructions (+5.19% accuracy)
- Version control: Treat like code—PRs, reviews, changelogs
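The hierarchical-files practice can be sketched as a repo layout (directory names are illustrative):

```
repo/
├── AGENTS.md             # root: repo-wide tooling, landmines, permissions
├── packages/api/
│   └── AGENTS.md         # module: API test commands and local quirks
└── packages/web/
    └── AGENTS.md         # module: frontend build gotchas
```

Module-level files keep each agent's context focused on the code it is actually touching.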
## The Bottom Line
AGENTS.md is not magic, but it's not useless. It's a precision tool:
- With human-written, minimal, non-discoverable info: Significant gains (28% faster, 16% cheaper, 100% accuracy on specific tasks)
- With auto-generated or bloated content: Active harm (worse success rates, 20%+ higher costs)
The file works when it compensates for knowledge gaps—things the agent genuinely can't figure out on its own. Everything else is noise that competes with the actual task.
## How AgentMD Helps
AgentMD validates that your AGENTS.md is actually useful:
- Parse & validate — Catch format errors, missing sections, unsafe commands
- Score — Agent-readiness score surfaces quality issues
- Execute — Run the spec and verify it works
- Governance — Audit trails, approval workflows, permission boundaries
Quality beats quantity. Validation prevents degradation. See Agentic AI Best Practices and Parse & Validate for more.