Empirical Research · 80 Runs · 4 Conditions · 10 Tasks

The repository, redesigned for AI agents.

Traditional repos are libraries built for human intuition. What if the codebase were an API, with schemas, contracts, and explicit side-effect declarations?

85% Agent-native pass rate
55% Traditional pass rate
+30 Percentage point gain
80 Total experiment runs

Repos built for humans create friction for agents.

Traditional repo — what the agent sees
📁 src/
    📁 controllers/
        📄 userController.ts ← CRUD? Auth? Both?
        📄 authController.ts
    📁 models/
        📄 User.ts ← has password field?
        📄 Session.ts
    📁 routes/
    📁 middleware/
    📁 utils/
        📄 helpers.ts ← ?
📄 README.md ← prose for humans
Agent reads 4–5 files, guesses intent, commits to wrong implementation
Side effects are invisible — "delete user" silently breaks sessions
No signal for "you haven't read the critical constraint yet"
Agent-native repo — what the agent sees
📁 .agent/
    📋 MANIFEST.yaml ← capability index
    INVARIANTS.md ← bugs pre-annotated
    🗺 IMPACT_MAP.yaml ← "change X → fix Y"
📁 src/
    📁 user/ ← domain-organized
        📄 user.delete.handler.ts
        📄 user.contract.ts
    📁 auth/
        📄 auth.login.handler.ts
📄 AGENT.md ← machine-optimized entry
MANIFEST tells agent exactly which file handles each capability
INVARIANTS flags known bugs with location + fix direction
IMPACT_MAP exposes cross-module side effects before they bite
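A capability index of this kind could be modeled roughly as follows. This is a minimal sketch, not the actual MANIFEST schema: the field names `handler`, `side_effects`, and `known_issues` follow the page's own description of the index, while the capability keys, the side-effect text, and the `resolveHandler` helper are assumptions made for illustration.

```typescript
// Hypothetical shape of one MANIFEST entry. Only the three field names are
// taken from the page; the rest of this sketch is assumed.
interface Capability {
  handler: string;        // file that implements the capability
  side_effects: string[]; // cross-module effects the agent must account for
  known_issues: string[]; // pointers into INVARIANTS.md
}

type Manifest = Record<string, Capability>;

// Illustrative data only: capability keys and descriptions are invented.
const manifest: Manifest = {
  "user.delete": {
    handler: "src/user/user.delete.handler.ts",
    side_effects: ["sessions of the deleted user must be invalidated"],
    known_issues: [],
  },
  "auth.login": {
    handler: "src/auth/auth.login.handler.ts",
    side_effects: [],
    known_issues: [],
  },
};

// Resolve a capability to its handler file, failing loudly instead of guessing.
function resolveHandler(m: Manifest, capability: string): string {
  const entry = m[capability];
  if (entry === undefined) {
    throw new Error(`No capability named "${capability}" in MANIFEST`);
  }
  return entry.handler;
}
```

With an index like this, one lookup replaces the "read 4–5 files and guess" step: `resolveHandler(manifest, "user.delete")` returns the handler path, and the same entry lists the side effects to check before editing.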

Task H: Add auth token refresh endpoint.

claude-haiku-4-5-20251001 — Task H: POST /auth/refresh — disallowedTools: Bash
▸ Traditional Repo
✗ Tests FAILED — 4 failures
▸ Agent-Native Repo
✓ All tests PASSED

The traditional agent committed to an implementation after reading 4 files. It didn't know about session token invalidation. The agent-native agent read 19 files — including IMPACT_MAP and INVARIANTS — then made exactly the right changes across 5 files.

— Experiment data, Task H, 2 runs averaged

80 runs. 4 conditions. Real numbers.

Traditional pass rate
55%
11 / 20 runs passing
Agent-native pass rate
85%
17 / 20 runs passing
Token cost per correct answer
≈ 0% change
344K → 345K tokens, with 55% more correct answers
Test pass rate by task: Traditional vs Agent-Native (%)
Task  Description       Traditional  Agent-Native (final)
A     PATCH email       50%          100%
B     Fix session bug   50%          100%
C     Input validation  0%           0%
D     GET /users list   100%         100%
E     PATCH password    100%         100%
F     Session expiry    100%         100%
G     Email search      100%         100%
H     Auth refresh ★    0%           100%
I     Request logging   50%          100%
J     Soft delete ★     0%           100%
★ = complete reversal (0% → 100%)
Ablation Study — how the design was refined
Condition     Files  Pass Rate  Avg Tokens  Key change
Traditional   -      55%        189K        Baseline control
AN-Baseline   4      80%        301K        MANIFEST + INVARIANTS + IMPACT_MAP + AGENT.md
AN-Extended   11     80%        343K        +7 files (file index, routes, patterns, concepts…); regressions introduced
AN-Refined ✦  5      85%        293K        Baseline + TEST_CONTRACTS + step-by-step fix instructions
✦ Final design. More metadata ≠ better performance — the optimal set contains only what an agent cannot infer from reading the code alone.
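The 344K → 345K figure for token cost per correct answer can be reproduced from the ablation numbers. The sketch below assumes cost per correct answer is defined as average tokens per run divided by pass rate; the page does not state that formula, but it matches the reported values for the Traditional and AN-Refined conditions.

```typescript
interface Condition {
  label: string;
  passRate: number;   // fraction of runs whose tests passed
  avgTokensK: number; // average tokens per run, in thousands
}

// Values copied from the ablation table.
const conditions: Condition[] = [
  { label: "Traditional", passRate: 0.55, avgTokensK: 189 },
  { label: "AN-Baseline", passRate: 0.8, avgTokensK: 301 },
  { label: "AN-Extended", passRate: 0.8, avgTokensK: 343 },
  { label: "AN-Refined", passRate: 0.85, avgTokensK: 293 },
];

// Assumed definition: expected tokens spent for each passing run.
function tokensPerCorrectK(c: Condition): number {
  return Math.round(c.avgTokensK / c.passRate);
}

const conditionCosts = conditions.map((c) => ({
  label: c.label,
  costK: tokensPerCorrectK(c),
}));
// Traditional → 344, AN-Baseline → 376, AN-Extended → 429, AN-Refined → 345
```

Under this definition the extended condition is the most expensive per correct answer, which fits the note that more metadata did not improve performance.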

Why more reads lead to fewer failures.

Premature Commit
Traditional agents read 4–7 files then commit to an implementation — before understanding cross-module side effects. Agent-native repos force front-loaded research before any edit happens.
Task H traditional: read=4 → edit=2 → FAIL
Task H agent-native: read=19 → edit=5 → PASS
Read-per-edit ratio: 2.0 vs 3.8
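The read-per-edit ratio is simply reads divided by edits for a run. A small sketch using the Task H counts above; the `RunTrace` type and the handling of zero edits are assumptions for illustration.

```typescript
interface RunTrace {
  reads: number; // file reads performed during the run
  edits: number; // file edits performed during the run
}

function readPerEdit(trace: RunTrace): number {
  // Assumption: a run with no edits is reported as an unbounded ratio.
  if (trace.edits === 0) {
    return Infinity;
  }
  return trace.reads / trace.edits;
}

const taskHTraditional: RunTrace = { reads: 4, edits: 2 };
const taskHAgentNative: RunTrace = { reads: 19, edits: 5 };

const traditionalRatio = readPerEdit(taskHTraditional); // 2
const agentNativeRatio = readPerEdit(taskHAgentNative); // 3.8
```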
🗺
Unknown Unknowns → Known
Traditional agents don't know what they don't know. IMPACT_MAP.yaml converts invisible cross-module dependencies into explicit declarations the agent reads before deciding what to change.
Task J needs 3 coordinated files.
Traditional: 6 edits, all to the same file → FAIL
Agent-native: 5 edits across different files → PASS
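One way to picture the "change X → fix Y" declarations is as a lookup that is followed transitively, so indirect effects surface before the first edit. The sketch below is an assumption about how such a map might be consumed, not the actual IMPACT_MAP.yaml format; `session.store.ts` is a hypothetical file name.

```typescript
// Hypothetical in-memory form of IMPACT_MAP.yaml: changed file -> files to revisit.
type ImpactMap = Record<string, string[]>;

// Illustrative entries only; session.store.ts does not appear on this page.
const impactMap: ImpactMap = {
  "src/user/user.delete.handler.ts": ["src/auth/session.store.ts"],
  "src/auth/session.store.ts": ["src/auth/auth.login.handler.ts"],
};

// Breadth-first walk over the declarations. The `found` list doubles as a
// visited set, so cyclic declarations cannot loop forever.
function filesToReview(map: ImpactMap, changed: string): string[] {
  const found: string[] = [];
  const queue: string[] = [changed];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    const impacted = map[current] || [];
    for (const next of impacted) {
      if (next !== changed && found.indexOf(next) === -1) {
        found.push(next);
        queue.push(next);
      }
    }
  }
  return found;
}
```

Called with the delete handler, this returns both the session store and the login handler: the kind of coordinated, multi-file change that Task J requires.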
📐
Structure as Convention
Seeing user.create.handler.ts, user.get.handler.ts tells the agent: create a new isolated file, don't bloat the controller. Structure silently instructs the correct implementation pattern.
Traditional Write ratio: 4.1%
Agent-native Write ratio: 17.4%
(4× more likely to create new files)
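The naming convention can be made explicit as a pair of helpers. This is a sketch of the pattern inferred from file names such as `user.delete.handler.ts`; the exact rule (`src/<domain>/<domain>.<action>.handler.ts`, lowercase names only) is an assumption.

```typescript
// Assumed convention: src/<domain>/<domain>.<action>.handler.ts
function handlerPath(domain: string, action: string): string {
  return `src/${domain}/${domain}.${action}.handler.ts`;
}

interface HandlerName {
  domain: string;
  action: string;
}

// Parse a handler file name back into its parts.
// Returns null when the name does not follow the assumed convention.
function parseHandlerName(fileName: string): HandlerName | null {
  const match = /^([a-z]+)\.([a-z]+)\.handler\.ts$/.exec(fileName);
  if (match === null) {
    return null;
  }
  return { domain: match[1], action: match[2] };
}
```

For example, `handlerPath("auth", "refresh")` yields `src/auth/auth.refresh.handler.ts`, a new isolated file, whereas a name like `userController.ts` parses to `null` and tells the agent nothing about where a new capability belongs.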
🎯
Test Contracts Prevent Over-Engineering
Knowing exactly what a test asserts — before writing a line of code — eliminates the failure mode where agents read constraints and over-implement. TEST_CONTRACTS.yaml was identified in the ablation as the single highest-value addition.
Task E AN-Baseline: read=10, edit=4 → FAIL
Task E AN-Refined: read=13, edit=3 → PASS
Precise scope → correct first attempt
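To illustrate what "knowing exactly what a test asserts" might look like in practice, here is a hedged sketch of a contract entry and a checker. The real TEST_CONTRACTS.yaml schema is not shown on this page, so every field name and value below is hypothetical; only the `POST /auth/refresh` route comes from the Task H description.

```typescript
// Hypothetical contract entry; the real schema is not published here.
interface TestContract {
  task: string;
  request: string;            // method and path under test
  expectStatus: number;       // status code the test asserts
  expectBodyFields: string[]; // fields the test reads from the response body
}

interface ResponseLike {
  status: number;
  body: Record<string, unknown>;
}

// Invented values for illustration.
const refreshContract: TestContract = {
  task: "H",
  request: "POST /auth/refresh",
  expectStatus: 200,
  expectBodyFields: ["token"],
};

// List the assertions a response would fail; an empty list means it satisfies the contract.
function violations(contract: TestContract, res: ResponseLike): string[] {
  const problems: string[] = [];
  if (res.status !== contract.expectStatus) {
    problems.push(`status ${res.status}, expected ${contract.expectStatus}`);
  }
  for (const field of contract.expectBodyFields) {
    if (!(field in res.body)) {
      problems.push(`missing body field "${field}"`);
    }
  }
  return problems;
}
```

The point of the sketch is scope: anything not listed in the contract is something the agent does not need to build.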
📋
Fix Ordering, Not Just Fix Location
Saying "call deleteByUserId()" causes agents to call a method that doesn't exist. Saying "Step 1: add this method — it does not exist yet. Step 2: call it." recovers 50pp. Implementation order is a first-class metadata concern.
Task B AN-Extended: run2 called non-existent method → FAIL
Task B AN-Refined: both runs followed steps → PASS
Task B pass rate: 100% (AN-Baseline) → 50% (AN-Extended) → 100% (AN-Refined)
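The ordering problem can be checked mechanically: a fix plan is safe only if every method it calls already exists or is added by an earlier step. The sketch below assumes a simple two-kind step format, which is hypothetical; only the `deleteByUserId` method name comes from the example above.

```typescript
// Hypothetical step format for fix instructions.
type FixStep =
  | { kind: "add_method"; method: string }
  | { kind: "call_method"; method: string };

// Return the methods that a step calls before the existing code or an
// earlier step has defined them.
function undefinedCalls(steps: FixStep[], existingMethods: string[]): string[] {
  const known = existingMethods.slice();
  const missing: string[] = [];
  for (const step of steps) {
    if (step.kind === "add_method") {
      known.push(step.method);
    } else if (known.indexOf(step.method) === -1) {
      missing.push(step.method);
    }
  }
  return missing;
}

// Location-only instruction: names the call but not the missing definition.
const locationOnly: FixStep[] = [
  { kind: "call_method", method: "deleteByUserId" },
];

// Ordered instruction: add the method first, then call it.
const orderedSteps: FixStep[] = [
  { kind: "add_method", method: "deleteByUserId" },
  { kind: "call_method", method: "deleteByUserId" },
];
```

Run against a codebase with no such method, the location-only plan reports `deleteByUserId` as undefined, while the ordered plan reports nothing.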

The original hypothesis was wrong: agent-native repos don't reduce tool calls — they increase them. But the increase is purposeful. Each extra read returns structured metadata with 5–10× more decision-relevant information than reading a source file. The result: correct first attempts, not fewer attempts.

— Analysis, mechanisms 1–3
Token investment timing
❌ Traditional model
Read 5 files (cheap)
→ Commit with incomplete info
→ Wrong implementation
→ Tests FAIL
0 value for all tokens spent
✓ Agent-native model
Read .agent/ files (expensive)
→ Commit with full context
→ Correct implementation
→ Tests PASS
Full value returned
Cost per correct implementation: Traditional 339,480 tokens → Agent-native 371,165 tokens (only 9.3% more expensive per correct answer)

Start any new project agent-native.

One Claude Code skill generates the full .agent/ layer — MANIFEST, INVARIANTS, IMPACT_MAP — scaffolded to your project type.

/init-agent-repo

Coming soon as a Claude Code skill · Based on experimental findings from 80 real agent runs

Priority 1
.agent/INVARIANTS.md
Pre-annotate known bugs with location + fix direction. Task B: 50%→100%
Priority 2
.agent/MANIFEST.yaml
Capability index: handler + side_effects + known_issues. Task H: 0%→100%
Priority 3
.agent/IMPACT_MAP.yaml
Per-file impact declarations. Task J: 0%→100% on multi-file tasks
Priority 4
Domain structure + naming
user.delete.handler.ts tells agent more than userController.ts