Repo4Agent — The Repository, Redesigned for AI Agents

The Problem

Repos built for humans create friction for agents.

Traditional repo — what the agent sees

📁 src/
    📁 controllers/
        📄 userController.ts ← CRUD? Auth? Both?
        📄 authController.ts
    📁 models/
        📄 User.ts ← has password field?
        📄 Session.ts
    📁 routes/
    📁 middleware/
    📁 utils/
        📄 helpers.ts ← ?
📄 README.md ← prose for humans

✗ Agent reads 4–5 files, guesses intent, commits to wrong implementation

✗ Side effects are invisible — "delete user" silently breaks sessions

✗ No signal for "you haven't read the critical constraint yet"

Agent-native repo — what the agent sees

📁 .agent/
    📋 MANIFEST.yaml ← capability index
    ⚡ INVARIANTS.md ← bugs pre-annotated
    🗺 IMPACT_MAP.yaml ← "change X → fix Y"
📁 src/
    📁 user/ ← domain-organized
        📄 user.delete.handler.ts
        📄 user.contract.ts
    📁 auth/
        📄 auth.login.handler.ts
📄 AGENT.md ← machine-optimized entry

✓ MANIFEST tells agent exactly which file handles each capability

✓ INVARIANTS flags known bugs with location + fix direction

✓ IMPACT_MAP exposes cross-module side effects before they bite
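
As a rough illustration of the capability index, the sketch below models a MANIFEST entry as a TypeScript type and resolves a capability name to its handler file. The field names handler, side_effects, and known_issues are the ones this page lists for MANIFEST.yaml; the capability keys, the entry contents, and the findCapability helper are hypothetical, and the real schema may differ.

```typescript
// Hypothetical in-memory form of .agent/MANIFEST.yaml (the real schema may differ).
interface Capability {
  handler: string;        // file that implements the capability
  side_effects: string[]; // other parts of the system touched when it runs
  known_issues: string[]; // short notes on known bugs
}

type Manifest = Record<string, Capability>;

// Example entries; keys and descriptions are illustrative only.
const manifest: Manifest = {
  "user.delete": {
    handler: "src/user/user.delete.handler.ts",
    side_effects: ["sessions belonging to the deleted user"],
    known_issues: ["deleting a user silently breaks sessions"],
  },
  "auth.login": {
    handler: "src/auth/auth.login.handler.ts",
    side_effects: [],
    known_issues: [],
  },
};

// Resolve a capability name to its entry, failing loudly on unknown names
// so the caller never has to guess which file to open.
function findCapability(m: Manifest, name: string): Capability {
  const entry = m[name];
  if (entry === undefined) {
    throw new Error(`No capability named "${name}" in manifest`);
  }
  return entry;
}

console.log(findCapability(manifest, "user.delete").handler);
```

Throwing on an unknown name is a design choice in this sketch: a missing entry is treated as a signal to stop and read more, not as permission to guess.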

Experiment Results

80 runs. 4 conditions. Real numbers.

Traditional pass rate

11 / 20 runs passing

Agent-native pass rate

17 / 20 runs passing

Token cost per correct answer

≈ 0%

344K → 345K (55% more correct answers)

Test pass rate by task (%): Traditional vs Agent-Native (final)

Task A, PATCH email: 50% traditional, 100% agent-native
Task B, Fix session bug: 50% traditional, 100% agent-native
Task C, Input validation
Task D, GET /users list: 100%
Task E, PATCH password: 100%
Task F, Session expiry: 100%
Task G, Email search: 100%
Task H, Auth refresh ★: 0% traditional, 100% agent-native
Task I, Req. logging: 50% traditional, 100% agent-native
Task J, Soft delete ★: 0% traditional, 100% agent-native

★ = complete reversal (0% → 100%)

Ablation Study — how the design was refined

Condition	Files	Pass Rate	Avg Tokens	Key change
Traditional	—	55%	189K	baseline control
AN-Baseline	4	80%	301K	MANIFEST + INVARIANTS + IMPACT_MAP + AGENT.md
AN-Extended	11	80%	343K	+7 files: file index, routes, patterns, concepts… regressions introduced
AN-Refined ✦	5	85%	293K	baseline + TEST_CONTRACTS + step-by-step fix instructions

✦ Final design. More metadata ≠ better performance — the optimal set contains only what an agent cannot infer from reading the code alone.
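
The near-zero change in token cost per correct answer can be reproduced from the table above under one assumption: cost per correct answer equals average tokens per run divided by pass rate. The sketch below is illustrative arithmetic on the published figures, not the authors' analysis script.

```typescript
// Pass rates and average tokens per run, copied from the ablation table.
interface Condition {
  name: string;
  passRate: number;   // fraction of runs passing
  avgTokensK: number; // average tokens per run, in thousands
}

const conditions: Condition[] = [
  { name: "Traditional", passRate: 0.55, avgTokensK: 189 },
  { name: "AN-Baseline", passRate: 0.80, avgTokensK: 301 },
  { name: "AN-Extended", passRate: 0.80, avgTokensK: 343 },
  { name: "AN-Refined", passRate: 0.85, avgTokensK: 293 },
];

// Assumption: cost per correct answer = average tokens per run / pass rate.
function tokensPerCorrectK(c: Condition): number {
  return c.avgTokensK / c.passRate;
}

for (const c of conditions) {
  console.log(`${c.name}: ${Math.round(tokensPerCorrectK(c))}K tokens per correct answer`);
}
```

Under that assumption Traditional comes to about 344K and AN-Refined to about 345K, matching the headline figure, while AN-Extended is the most expensive at about 429K.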

The Mechanism

Why more reads lead to fewer failures.

⏱

Premature Commit

Agents working in a traditional repo read 4–7 files, then commit to an implementation before they understand cross-module side effects. Agent-native repos force front-loaded research before any edit happens.

Task H traditional: read=4 → edit=2 → FAIL
Task H agent-native: read=19 → edit=5 → PASS
Read-per-edit ratio: 2.0 vs 3.8

🗺

Unknown Unknowns → Known

Agents working in a traditional repo don't know what they don't know. IMPACT_MAP.yaml converts invisible cross-module dependencies into explicit declarations that the agent reads before deciding what to change.

Task J needs 3 coordinated files.
Traditional: 6 edits, all to the same file → FAIL
Agent-native: 5 edits across different files → PASS
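
One way to picture how per-file impact declarations surface coordinated edits is the sketch below. The ImpactMap shape, the specific dependencies between files, and the filesToReview helper are all hypothetical; following declarations transitively is also an assumption of this sketch, not something the page specifies.

```typescript
// Hypothetical in-memory form of .agent/IMPACT_MAP.yaml: for each file,
// the other files that should be reviewed when it changes.
type ImpactMap = Record<string, string[]>;

// Illustrative entries only; the relationships between files are invented.
const impactMap: ImpactMap = {
  "src/user/user.delete.handler.ts": [
    "src/user/user.contract.ts",
    "src/auth/auth.login.handler.ts",
  ],
  "src/user/user.contract.ts": ["src/user/user.get.handler.ts"],
};

// Collect every file reachable from the planned edits, following
// declarations transitively, so the full set is known before editing.
function filesToReview(map: ImpactMap, planned: string[]): string[] {
  const seen = new Set<string>(planned);
  const queue = planned.slice();
  while (queue.length > 0) {
    const file = queue.shift() as string;
    for (const dep of map[file] ?? []) {
      if (!seen.has(dep)) {
        seen.add(dep);
        queue.push(dep);
      }
    }
  }
  return Array.from(seen).sort();
}

console.log(filesToReview(impactMap, ["src/user/user.delete.handler.ts"]));
```

With these example entries, a planned edit to one handler expands to four files, which is the kind of spread a multi-file task like Task J requires.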

📐

Structure as Convention

Seeing user.create.handler.ts and user.get.handler.ts tells the agent to create a new isolated file rather than bloat the controller. Structure silently instructs the correct implementation pattern.

Traditional Write ratio: 4.1%
Agent-native Write ratio: 17.4%
(4× more likely to create new files)
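
To make the naming signal concrete, the sketch below parses file names of the kind shown on this page. The pattern <domain>.<action>.<role>.ts (with the action optional) is inferred from those examples, and parseAgentNativeName is a hypothetical helper, not part of any published tooling.

```typescript
// Parsed form of a file name such as "user.delete.handler.ts".
interface ParsedName {
  domain: string;
  action?: string; // absent for files like "user.contract.ts"
  role: string;
}

// Assumed convention: <domain>.<action>.<role>.ts or <domain>.<role>.ts,
// all lowercase. Returns null for names that do not follow it.
function parseAgentNativeName(fileName: string): ParsedName | null {
  const match = /^([a-z]+)\.(?:([a-z]+)\.)?([a-z]+)\.ts$/.exec(fileName);
  if (match === null) return null;
  const domain = match[1] ?? "";
  const action = match[2]; // undefined at runtime when the optional group is skipped
  const role = match[3] ?? "";
  return action === undefined ? { domain, role } : { domain, action, role };
}

console.log(parseAgentNativeName("user.delete.handler.ts"));
console.log(parseAgentNativeName("user.contract.ts"));
console.log(parseAgentNativeName("userController.ts"));
```

A name like userController.ts yields null here because it carries no machine-readable split between domain, action, and role.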

🎯

Test Contracts Prevent Over-Engineering

Knowing exactly what a test asserts — before writing a line of code — eliminates the failure mode where agents read constraints and over-implement. TEST_CONTRACTS.yaml was identified in the ablation as the single highest-value addition.

Task E AN-Baseline: read=10, edit=4 → FAIL
Task E AN-Refined: read=13, edit=3 → PASS
Precise scope → correct first attempt

📋

Fix Ordering, Not Just Fix Location

Saying "call deleteByUserId()" causes agents to call a method that doesn't exist. Saying "Step 1: add this method — it does not exist yet. Step 2: call it." recovers 50pp. Implementation order is a first-class metadata concern.

Task B AN-Extended: run2 called non-existent method → FAIL
Task B AN-Refined: both runs followed steps → PASS
Pass rate: AN-Baseline 100% → AN-Extended 50% → AN-Refined 100%
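
The ordering idea can be sketched as structured fix steps plus a check that no step uses a symbol before an earlier step has created it. The Invariant and FixStep shapes, the location value, and the usedBeforeDefined helper are hypothetical; only the deleteByUserId() example and the two-step wording come from this page.

```typescript
// Hypothetical structured form of one INVARIANTS.md entry.
interface FixStep {
  instruction: string;
  createsSymbol?: string; // symbol this step introduces, if any
  usesSymbol?: string;    // symbol this step depends on, if any
}

interface Invariant {
  issue: string;
  location: string; // illustrative path, not taken from the experiment
  steps: FixStep[]; // order matters
}

const sessionCleanup: Invariant = {
  issue: "Deleting a user silently breaks sessions",
  location: "src/user/user.delete.handler.ts",
  steps: [
    { instruction: "Add deleteByUserId(); it does not exist yet", createsSymbol: "deleteByUserId" },
    { instruction: "Call deleteByUserId() from the delete handler", usesSymbol: "deleteByUserId" },
  ],
};

// Return the symbols that some step uses before any earlier step (or the
// existing code base) has defined them. An empty result means the order is safe.
function usedBeforeDefined(inv: Invariant, existing: string[]): string[] {
  const defined = new Set<string>(existing);
  const problems: string[] = [];
  for (const step of inv.steps) {
    if (step.usesSymbol !== undefined && !defined.has(step.usesSymbol)) {
      problems.push(step.usesSymbol);
    }
    if (step.createsSymbol !== undefined) {
      defined.add(step.createsSymbol);
    }
  }
  return problems;
}

console.log(usedBeforeDefined(sessionCleanup, []));
```

Reversing the two steps makes the check report deleteByUserId, which mirrors the failure described above: the agent is told to call a method before anything says it must be written first.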

The original hypothesis was wrong: agent-native repos don't reduce tool calls — they increase them. But the increase is purposeful. Each extra read returns structured metadata with 5–10× more decision-relevant information than reading a source file. The result: correct first attempts, not fewer attempts.

— Analysis, mechanisms 1–3

Token investment timing

❌ Traditional model

Read 5 files (cheap)
→ Commit with incomplete info
→ Wrong implementation
→ Tests FAIL
→ 0 value for all tokens spent

✓ Agent-native model

Read .agent/ files (expensive)
→ Commit with full context
→ Correct implementation
→ Tests PASS
→ Full value returned

Cost per correct implementation: Traditional 339,480 tokens → Agent-native 371,165 tokens (only +9.3% more expensive per correct answer)

Start any new project agent-native.

One Claude Code skill generates the full .agent/ layer — MANIFEST, INVARIANTS, IMPACT_MAP — scaffolded to your project type.

/init-agent-repo

Coming soon as a Claude Code skill · Based on experimental findings from 80 real agent runs

Priority 1

.agent/INVARIANTS.md

Pre-annotate known bugs with location + fix direction. Task B: 50%→100%

Priority 2

.agent/MANIFEST.yaml

Capability index: handler + side_effects + known_issues. Task H: 0%→100%

Priority 3

.agent/IMPACT_MAP.yaml

Per-file impact declarations. Task J: 0%→100% on multi-file tasks

Priority 4

Domain structure + naming

user.delete.handler.ts tells agent more than userController.ts

