Claude Code Source Architecture Analysis
Leaked scope: 1,884 TypeScript files · 512,664 lines of code · Runtime: Bun · UI: React/Ink · API: @anthropic-ai/sdk
One-Line Definition
Claude Code is a Tool-call driven agent loop system. User input → LLM decision → tool invocation → result returned to LLM → loop, until the LLM considers the task complete.
Six Subsystem Architecture Diagram
┌─────────────────────────────────────────────────────────────┐
│ 01 · Entry Layer main.tsx │
│ User input in terminal → CLI parsing → React/Ink renderer │
│ Parallel prefetch on startup: MDM | Keychain | API | Flags │
└───────────────────────────┬─────────────────────────────────┘
│ user message
▼
┌─────────────────────────────────────────────────────────────┐
│ 02 · Query Engine QueryEngine.ts │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ while(true) Tool-call main loop │ │
│ │ compress → callModel → execute tools → collect │ │
│ └─────────────────────────────────────────────────────┘ │
└────────┬──────────────┬──────────────┬──────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ 03 · Tools │ │04 · Commands│ │05 · Perms │
│ tools/ │ │ commands/ │ │ hooks/ │
│ ~40 tools │ │ ~50 cmds │ │ 4 modes │
└──────┬──────┘ └─────────────┘ └─────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 06 · Multi-Agent coordinator/ │
│ Sub-agent spawning (AgentTool) · Messaging (SendMessageTool)│
│ Orchestration (AgentRegistry / MessageRouter / Lifecycle) │
└─────────────────────────────────────────────────────────────┘
Tech Stack
| Layer | Technology | Key Points |
|---|---|---|
| Language | TypeScript 5.x strict | Runtime Zod validation + static types |
| Runtime | Bun (not Node.js) | Native TS execution, cold start <50ms, built-in SQLite |
| Terminal UI | React 18 + Ink 4 | Virtual DOM diff → ANSI escape sequences, streaming render |
| CLI parsing | Commander.js | Subcommands / options / help generation |
| Schema validation | Zod v4 | Runtime validation of tool input parameters |
| API client | @anthropic-ai/sdk | Official SDK, streaming SSE |
| Code search | ripgrep | Rust implementation, 10x faster than grep |
| Telemetry | OpenTelemetry + gRPC | Lazy-loaded, no cold-start impact |
| Feature flags | GrowthBook + Bun bundle | Compile-time elimination — dead code physically removed from production bundle |
Core Data Flow
User input
↓
processUserInput() ← slash command intercept, @file injection, memory attachments
↓
recordTranscript() ← writes ~/.claude/sessions/<id>.jsonl (supports /resume)
↓
query() main loop
↓
[compress check] → [callModel SSE] → [parallel tool execution] → [permission check] → [collect results]
↓
needsFollowUp?
yes → append tool_result, continue loop
no → produce final result, Ink renders output
01 · Entry Layer — main.tsx
File:
src/main.tsx (~803 KB, includes all React components) Responsibilities: CLI parsing · parallel prefetch · React/Ink renderer init · session startup
Role
The entry layer is the user’s first contact point with the system. It does two things:
- Parse user intent: Convert command-line arguments into configuration via Commander.js
- Set the stage: Initialize QueryEngine, inject tools/commands/permission config, launch the React/Ink renderer
The entry layer contains no business logic — it is a pure assembly layer.
Startup Flow
User runs claude command
│
▼
Commander.js parses arguments
├─ claude → interactive mode (REPL)
├─ claude "prompt" → single-shot execution mode
├─ claude --resume → resume previous session
├─ claude --model xxx → specify model
└─ claude /cmd args → execute slash command directly
│
▼
Parallel prefetch phase (critical performance optimization)
│
▼
Assemble QueryEngineConfig
│
▼
React/Ink renderer starts
│
▼
Enter interactive loop (user can type)
Parallel Prefetch Optimization
The most important engineering design in the entry layer turns serial startup into parallel startup, compressing cold start from ~400ms to ~100ms.
At startup, four independent I/Os fire simultaneously:
① MDM config read
Policy config for enterprise-managed devices (macOS MDM)
Determines which features are disabled, which servers are accessible
② macOS Keychain prefetch
Reads API keys and OAuth tokens stored in Keychain
Avoids Keychain dialog delay on first request
③ Anthropic API pre-connect
Establishes TCP connection to api.anthropic.com in advance
Reduces network handshake latency when user sends first message
④ GrowthBook feature flag init
Fetches feature flag config for current account
Determines which experimental features (VOICE_MODE, DAEMON, etc.) are enabled
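The four prefetches above can be sketched as concurrent promises awaited together, so total startup wait approximates the slowest I/O rather than the sum. All helper names below are illustrative stand-ins, not the real module names from the source:

```typescript
// Hypothetical prefetch helpers: the real module names are not in the source.
async function readMdmConfig(): Promise<{ disabledFeatures: string[] }> {
  return { disabledFeatures: [] };           // stand-in for a macOS MDM policy read
}
async function prefetchKeychain(): Promise<{ apiKey: string }> {
  return { apiKey: "<redacted>" };           // stand-in for a Keychain lookup
}
async function preconnectApi(): Promise<void> {
  // stand-in for warming a TCP/TLS connection to api.anthropic.com
}
async function initFeatureFlags(): Promise<Set<string>> {
  return new Set(["VOICE_MODE", "DAEMON"]);  // stand-in for a GrowthBook fetch
}

// All four fire at once; total wait ~= the slowest I/O, not the sum of all four.
export async function parallelPrefetch() {
  const [mdm, keychain, , flags] = await Promise.all([
    readMdmConfig(),
    prefetchKeychain(),
    preconnectApi(),
    initFeatureFlags(),
  ]);
  return { mdm, keychain, flags };
}
```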
React/Ink: Why Use React in a Terminal
Ink is React’s terminal adapter. It translates Virtual DOM diff results into ANSI escape sequences (cursor position, color, clear screen) and outputs them to the terminal.
React component tree
↓ Virtual DOM diff
Ink renderer
↓ translate
ANSI escape sequences
↓ output
Terminal display
Core value: When the LLM outputs a token, just setState(prev => prev + newToken) — React diff calculates the minimal change, Ink only updates the parts of the terminal that actually changed, without redrawing the entire screen.
Rendering hierarchy:
<App> ← top level, holds QueryEngine reference
<ConversationHistory /> ← history message list (scrollable)
<StreamingOutput /> ← currently streaming content (real-time updates)
<ToolExecutionStatus /> ← tool execution status (parallel progress bars)
<PermissionPrompt /> ← permission confirmation dialog (blocking)
<StatusBar /> ← bottom: model name / token usage / cost
<InputBox /> ← user input field
</App>
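The token-append pattern at the heart of streaming render can be modeled as a pure state update, the same function shape that `setState(prev => ...)` receives on each SSE token; React then diffs the resulting tree and Ink repaints only the changed terminal cells. This reducer is an illustrative sketch, not code from the source:

```typescript
// The streaming state a <StreamingOutput />-style component would hold.
export interface StreamState {
  text: string;
  tokenCount: number;
}

// Pure updater: the function passed to setState on each incoming token.
export function appendToken(prev: StreamState, token: string): StreamState {
  return { text: prev.text + token, tokenCount: prev.tokenCount + 1 };
}

// Folding the full token stream reproduces the complete message.
export function foldStream(tokens: string[]): StreamState {
  return tokens.reduce(appendToken, { text: "", tokenCount: 0 });
}
```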
QueryEngineConfig Assembly
The entry layer’s core responsibility is assembling the complete QueryEngine configuration (dependency injection):
QueryEngineConfig = {
tools: ~40 tool instances loaded from tools/
commands: ~50 commands loaded from commands/
mcpClients: MCP server connections from config file
skills: skills dynamically loaded from ~/.claude/skills/
canUseTool: permission check callback (injected from permission system)
model: current model (can be overridden by --model)
maxTurns: max tool call rounds (unlimited by default)
maxBudgetUsd: USD cost ceiling (unlimited by default)
}
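A typed sketch of that assembly, with defaults overridable by CLI flags. Only the field names come from the listing above; the concrete types, the `assembleConfig` helper, and the placeholder model string are assumptions:

```typescript
// Hypothetical shapes: only the field names come from the source listing.
interface Tool { name: string }
interface Command { name: string }

export interface QueryEngineConfig {
  tools: Tool[];                      // ~40 instances from tools/
  commands: Command[];                // ~50 from commands/
  mcpClients: unknown[];              // MCP server connections from config file
  skills: string[];                   // loaded from ~/.claude/skills/
  canUseTool: (tool: Tool, input: unknown) => Promise<boolean>; // injected permission check
  model: string;                      // overridable via --model
  maxTurns?: number;                  // undefined = unlimited
  maxBudgetUsd?: number;              // undefined = unlimited
}

// The entry layer only wires dependencies together; no business logic here.
export function assembleConfig(overrides: Partial<QueryEngineConfig> = {}): QueryEngineConfig {
  return {
    tools: [],
    commands: [],
    mcpClients: [],
    skills: [],
    canUseTool: async () => true,     // replaced by the real permission callback
    model: "<default-model>",         // placeholder, not the real default
    ...overrides,
  };
}
```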
| Principle | How It Manifests |
|---|---|
| Entry layer only assembles | No business logic, only init and wiring |
| Parallel first | All independent I/Os fired concurrently |
| Lazy-load heavy modules | OTel, gRPC loaded via dynamic import() on demand |
| Dependency injection | Tool sets, permission logic, state management all injected |
02 · Query Engine — QueryEngine.ts
File:
src/services/QueryEngine.ts (46,630 lines — the most critical file in the system) Responsibilities: LLM API calls · Tool-call main loop · context compression · error recovery · token billing
Role
QueryEngine is the central nervous system of Claude Code. All other subsystems serve it — the tool system provides callable capabilities, the permission system provides safety constraints, the command system provides shortcuts. QueryEngine itself does one thing:
Drive the conversation loop between LLM and tools until the task is complete.
Tool-Call Main Loop: Six Phases of a Turn
Turn begins
│
├─ Phase 1: Context compression check
│ Check if message history is approaching token limit
│ Trigger compression strategy based on pressure level
│
├─ Phase 2: Call LLM (streaming)
│ Send message history to Claude API
│ Receive response as SSE stream
│ Tool calls start executing in parallel during streaming (no waiting)
│
├─ Phase 3: Error handling & recovery
│ Handle prompt_too_long (trigger compress + retry)
│ Handle max_output_tokens (3-level escalation recovery)
│ Handle model failure (fall back to fallbackModel)
│
├─ Phase 4: Tool result collection
│ Wait for all parallel tool executions to complete
│ Check each tool call through permission system
│ Collect legitimate tool results
│
├─ Phase 5: Attachment injection
│ Read relevant memory chunks from persistent memory
│ Dynamically discover and inject matching skill templates
│
└─ Phase 6: Turn and budget check
turnCount++
Exceeded maxTurns? → stop
Exceeded maxBudgetUsd? → stop
needsFollowUp? → continue or exit
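The six phases map onto a loop shaped roughly like this. `runTurn` stands in for phases 1 through 5 of a real turn; the loop itself is phase 6. This is a minimal sketch under those assumptions, not the actual implementation:

```typescript
interface Turn {
  costUsd: number;
  needsFollowUp: boolean;   // true if tool_results were appended and the LLM should continue
}

interface LoopLimits {
  maxTurns?: number;        // undefined = unlimited
  maxBudgetUsd?: number;    // undefined = unlimited
}

// Hypothetical driver: runTurn = compress -> callModel -> execute tools -> collect.
export async function queryLoop(
  runTurn: () => Promise<Turn>,
  limits: LoopLimits,
): Promise<{ turns: number; costUsd: number }> {
  let turns = 0;
  let costUsd = 0;
  while (true) {
    const turn = await runTurn();
    turns++;
    costUsd += turn.costUsd;
    // Phase 6: turn and budget checks, then the follow-up decision.
    if (limits.maxTurns !== undefined && turns >= limits.maxTurns) break;
    if (limits.maxBudgetUsd !== undefined && costUsd >= limits.maxBudgetUsd) break;
    if (!turn.needsFollowUp) break;   // the LLM considers the task complete
  }
  return { turns, costUsd };
}
```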
Streaming Parallel Tool Execution
Claude Code starts executing tools while the LLM is still streaming output:
Traditional serial (wasted wait time):
[──── LLM output 50ms ────][Tool A 30ms][Tool B 20ms] = 100ms
Claude Code parallel (overlapping execution):
[──── LLM output 50ms ────]
[──Tool A 30ms──] (started halfway through LLM output)
[──Tool B 20ms─]
Total = max(50, 50 + 20) = 70ms → 30% faster
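Mechanically, the overlap comes down to launching each tool's promise during stream iteration and deferring only the final collect. A minimal sketch, where all names are assumptions rather than real source identifiers:

```typescript
// Sketch: fire each tool as soon as its tool_use block finishes parsing,
// instead of waiting for the whole LLM response to end.
export interface ToolUse { id: string; name: string; input: unknown }

export async function streamWithParallelTools(
  stream: AsyncIterable<ToolUse>,                  // tool_use blocks arriving on the SSE stream
  execute: (call: ToolUse) => Promise<string>,
): Promise<Map<string, string>> {
  const inFlight: Array<Promise<[string, string]>> = [];
  for await (const call of stream) {
    // Start immediately: execution overlaps the remainder of the stream.
    inFlight.push(execute(call).then((result) => [call.id, result] as [string, string]));
  }
  // Collection phase: wait for every in-flight tool, then index results by id.
  return new Map(await Promise.all(inFlight));
}
```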
5-Level Context Compression Pipeline
Context usage
100% ─── API rejection boundary ───────────────────────────
95% ─── Level 5: Autocompact ──────────────────────────────
Full session summary, replaces all history messages
85% ─── Level 4: Context Collapse ─────────────────────────
Collapses early conversation turns into a summary block
70% ─── Level 3: Microcompact ─────────────────────────────
Deduplicates repeated file edits, removes intermediate state
55% ─── Level 2: Snip Compact ─────────────────────────────
Removes oldest N turns of messages
0% ─── Level 1: Content replacement (always running) ─────
Truncates single tool results exceeding size threshold
| Level | Info Loss | Speed | Use Case |
|---|---|---|---|
| 1 · Content replacement | Minimal (truncates oversized single output) | Instant | Any time |
| 2 · Snip | Loses old history | Fast | When history doesn’t matter |
| 3 · Microcompact | Loses file edit intermediate states | Fast | After heavy file operations |
| 4 · Collapse | History detail folded into summary | Medium (needs LLM) | Early turns no longer relevant |
| 5 · Autocompact | Full history replaced with summary | Slow (needs LLM) | Last resort when necessary |
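Level selection follows directly from the pressure thresholds in the diagram above. The function name and exact threshold semantics are assumptions; the percentages are the ones stated:

```typescript
export type CompressionLevel = 1 | 2 | 3 | 4 | 5;

// Hypothetical selector: contextUsage is the fraction of the token limit in use.
export function selectCompression(contextUsage: number): CompressionLevel {
  if (contextUsage >= 0.95) return 5; // Autocompact: full-session summary
  if (contextUsage >= 0.85) return 4; // Context Collapse: fold early turns
  if (contextUsage >= 0.70) return 3; // Microcompact: dedupe repeated file edits
  if (contextUsage >= 0.55) return 2; // Snip: drop oldest N turns
  return 1;                           // Content replacement: always running
}
```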
3-Level Output Recovery
When LLM output is truncated (stop_reason: max_tokens):
First truncation
→ Slot upgrade: raise output token limit from 8k to 64k
→ If still truncated, enter multi-turn continuation (max 3 attempts)
→ All 3 failed → return completed portion, mark error
Model Fallback & Orphan Cleanup
When the primary model crashes mid-stream, “orphan” tool_use entries appear in message history. The fix (Tombstone mode):
Original (invalid):
assistant: { tool_use: { id: 'X', name: 'BashTool' } }
(missing corresponding tool_result)
Fixed (valid):
assistant: { tool_use: { id: 'X', name: 'BashTool' } }
user: { tool_result: { id: 'X', is_error: true, content: '[interrupted]' } }
→ switch to fallbackModel and retry
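The Tombstone repair can be sketched as a single pass over the history: for every assistant `tool_use` with no matching `tool_result`, synthesize an error result right after it. The message types here are heavily simplified assumptions, not the real SDK types:

```typescript
// Simplified message shape, illustrative only.
interface Msg {
  role: "assistant" | "user";
  toolUseId?: string;      // set on assistant tool_use blocks
  toolResultId?: string;   // set on user tool_result blocks
  isError?: boolean;
  content?: string;
}

export function repairOrphans(history: Msg[]): Msg[] {
  const resultIds = new Set(history.filter((m) => m.toolResultId).map((m) => m.toolResultId));
  const repaired: Msg[] = [];
  for (const msg of history) {
    repaired.push(msg);
    if (msg.toolUseId && !resultIds.has(msg.toolUseId)) {
      // Tombstone: keeps history API-valid so the fallback model can resume.
      repaired.push({
        role: "user",
        toolResultId: msg.toolUseId,
        isError: true,
        content: "[interrupted]",
      });
    }
  }
  return repaired;
}
```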
03 · Tool System — tools/
Directory:
src/tools/ (~40 tools) Responsibilities: Give LLM the ability to interact with the real world — each tool is a self-contained independent module
Unified Tool Interface
Each tool must provide:
name Tool name (LLM uses this name to call it)
description Natural language description (LLM decides when to use it based on this)
inputSchema Input parameter definition (Zod Schema, runtime validated)
execute() Actual execution logic
requiresPermission() Whether user confirmation is needed
permissionDescription() Permission prompt text
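That contract translates naturally into a generic interface. The real implementation validates `inputSchema` with Zod; here a plain `parseInput` function stands in to keep the sketch dependency-free, and `EchoTool` is a made-up example instance:

```typescript
// Sketch of the unified tool contract; method names follow the list above.
export interface ToolDefinition<In, Out> {
  name: string;                           // LLM calls the tool by this name
  description: string;                    // LLM decides *when* to use it from this text
  parseInput(raw: unknown): In;           // throws on invalid input (Zod in the real code)
  execute(input: In): Promise<Out>;
  requiresPermission(input: In): boolean;
  permissionDescription(input: In): string;
}

// Illustrative instance, not a real Claude Code tool:
export const echoTool: ToolDefinition<{ text: string }, string> = {
  name: "EchoTool",
  description: "Returns its input unchanged.",
  parseInput(raw) {
    if (typeof raw !== "object" || raw === null || typeof (raw as any).text !== "string") {
      throw new Error("invalid input");
    }
    return raw as { text: string };
  },
  execute: async ({ text }) => text,
  requiresPermission: () => false,        // read-only: would be auto-approved
  permissionDescription: ({ text }) => `echo ${text}`,
};
```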
Tool Categories
Category 1: File Operations (4 core tools)
FileReadTool → read first (understand current state)
↓
FileEditTool → precise edit (most common, old_string → new_string)
FileWriteTool → full write (create new or completely rewrite)
BashTool → bulk operations (mv, cp, mkdir, etc.)
Key constraint of FileEditTool: must read the file with FileReadTool before editing.
Category 2: Search (4 tools)
| Tool | Function | Notes |
|---|---|---|
| GlobTool | File path pattern matching | Sorted by modification time, no permission needed |
| GrepTool | Content search | Based on ripgrep, 10x faster than grep |
| WebSearchTool | Web search | Returns summaries + URL list |
| WebFetchTool | Web page content fetch | Auto-converts to Markdown, supports maxLength |
Category 3: Agent (2 tools)
- AgentTool — Creates an independent Bun subprocess running a full QueryEngine to execute subtasks
- SendMessageTool — Send messages to created sub-agents (sync and async modes)
Category 4: Task Management (6 tools)
Task state flow: pending → in_progress → completed / cancelled
TaskCreate / TaskUpdate / TaskList / TaskGet / TaskOutput / TaskStop
Category 5: Protocol Integration (2 tools)
- MCPTool — Dynamically proxies calls to any connected MCP server tools
- LSPTool — Connects to local language server, provides go-to-definition, find-references, and other IDE capabilities
Category 6: Mode Control (4 tools)
- EnterPlanModeTool / ExitPlanModeTool — Plan mode (read ops auto-approved, write ops show plan only)
- EnterWorktreeTool / ExitWorktreeTool — Git Worktree sandbox mode
Category 7: Notebooks (2 tools)
- NotebookReadTool — Parse .ipynb format, return code + execution output
- NotebookEditTool — Edit specific cell without rewriting the entire notebook
Tool Permission Declarations
Low-permission tools (auto-approved):
GlobTool, GrepTool, FileReadTool, WebSearchTool
→ Read-only, change no state
Medium-permission tools (require confirmation in default mode):
FileWriteTool, FileEditTool, AgentTool
→ Have persistent side effects, but predictable
High-permission tools (must confirm every time):
BashTool
→ Can execute arbitrary commands, risk unpredictable
04 · Command System — commands/
Directory:
src/commands/ (~50 slash commands) Responsibilities: User-triggered via /command, executed directly, bypasses LLM
Command Intercept Timing
User types "/compact"
↓
processUserInput() detects leading "/"
↓
Finds matching command handler in commands/
↓
Execute directly, never enters QueryEngine main loop
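The intercept is a simple dispatch on the leading `/`: matched commands run directly and everything else falls through to the query engine. Function and handler names here are assumptions:

```typescript
type CommandHandler = (args: string) => string;

// Hypothetical dispatcher mirroring the flow above.
export function processUserInput(
  input: string,
  commands: Map<string, CommandHandler>,
  runQuery: (prompt: string) => string,
): string {
  if (input.startsWith("/")) {
    const [name, ...rest] = input.slice(1).split(" ");
    const handler = commands.get(name);
    if (handler) return handler(rest.join(" ")); // direct execution, no LLM involved
  }
  return runQuery(input);                        // everything else enters the main loop
}
```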
Tools vs Commands: The Essential Difference
Tools (tools/):
Caller: LLM (via tool_use API)
Trigger: LLM decides when to call based on understanding
Examples: BashTool, FileReadTool, GrepTool
Commands (commands/):
Caller: User (via /command syntax)
Trigger: User explicitly triggers
Examples: /commit, /compact, /cost
Tools are Claude’s hands — Claude decides when to reach out. Commands are buttons on a remote control — the user decides when to press.
Command Categories
Category 1: Git Workflow Commands
| Command | Function |
|---|---|
| /commit | Read git diff --staged → LLM generates commit message → user confirms → execute |
| /commit-push-pr | One-click commit + push + create PR (PR description auto-generated by LLM) |
| /pr | Create PR only, analyze git diff main...HEAD for standard description |
Category 2: Code Quality Commands
- /review — AI review, outputs structured report (issues, severity, suggestions)
- /ultrareview — Multi-dimensional deep review (security, performance, maintainability, logic)
- /autofix-pr — Auto-detect issues → generate fix → create PR
Category 3: Context Management Commands
/context → Check current state (message count, token usage, compression history)
/compact → Too much history but want to preserve semantics (triggers Level 5 Autocompact)
/clear → Completely change topic, discard all history
Category 4: Configuration & Integration Commands
- /config — Runtime configuration (switch model, modify maxTurns, toggle thinking mode)
- /mcp — Manage MCP server connections (list/add/remove/restart)
- /memory — View and manage persistent memory
- /permissions — View current permission config and whitelist
Category 5: Session Management Commands
- /resume — Restore previous session (reads .jsonl to rebuild message history)
- /cost — Full cost analysis (including sub-agent costs)
- /status — System state snapshot (turn count, active sub-agents, task queue)
Category 6: Skill Commands (Dynamic Extension)
~/.claude/skills/
├── review-security.md → triggers /review-security
├── generate-tests.md → triggers /generate-tests
└── refactor-to-ts.md → triggers /refactor-to-ts
When a skill command is triggered, the template content is injected as a system attachment to guide LLM behavior (rather than executing logic directly).
05 · Permission System — hooks/toolPermission/
Directory:
src/hooks/toolPermission/ Responsibilities: Safety check before each tool execution — decide whether to allow
Four Permission Modes
| Mode | Use Case | Behavior | Automation |
|---|---|---|---|
| default | Daily development | Confirmation prompt for dangerous operations | Lowest |
| plan | Read-only analysis | Read ops auto-approved, write ops show plan only | Medium |
| auto | Batch processing | Classifier assesses risk, low-risk auto-approved | High |
| bypassPermissions | CI/CD pipelines | Skip all permission checks | Highest |
wrappedCanUseTool: Full Decision Flow
Tool call request arrives
│
[Check 1] bypassPermissions mode? yes → approve
│
[Check 2] In permanent approval whitelist? yes → approve
│
[Check 3] plan mode?
yes, read-only tool → approve
yes, write tool → show plan, do not execute
│
[Check 4] auto mode?
yes → classifier evaluates: low risk → approve, high risk → fall back to default
│
[Check 5] default mode: show interactive confirmation
Approve (once) → execute
Approve (always) → execute + add to permanent whitelist
Deny → record in permissionDenials[]
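The five checks chain into a single decision function. This sketch returns a decision label rather than performing I/O, and all type and field names are assumptions:

```typescript
type Mode = "default" | "plan" | "auto" | "bypassPermissions";
type Decision = "approve" | "plan-only" | "ask";

interface PermissionContext {
  mode: Mode;
  whitelist: Map<string, Set<string>>;              // toolName -> serialized inputs
  isReadOnly: (tool: string) => boolean;
  classifyRisk: (tool: string, input: string) => "low" | "high";
}

export function decide(ctx: PermissionContext, tool: string, input: string): Decision {
  if (ctx.mode === "bypassPermissions") return "approve";        // Check 1
  if (ctx.whitelist.get(tool)?.has(input)) return "approve";     // Check 2: exact match only
  if (ctx.mode === "plan") {                                     // Check 3
    return ctx.isReadOnly(tool) ? "approve" : "plan-only";
  }
  if (ctx.mode === "auto" && ctx.classifyRisk(tool, input) === "low") {
    return "approve";                                            // Check 4: low-risk auto-approve
  }
  return "ask";                                                  // Check 5: interactive prompt
}
```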
Permanent Approval Whitelist
Storage: ~/.claude/approvals.json
Structure: Map<toolName, Set<serializedInput>>
Example:
{
  "BashTool": ["npm test", "npm run build", "git status"],
  "FileWriteTool": ["/project/src/utils.ts"]
}
Whitelist is exact match — npm test does not cover npm test --watch.
auto Mode Risk Classifier
Evaluation dimensions:
Tool type
Read-only tools (Glob, Grep, FileRead) → low risk
Write tools (FileWrite, FileEdit) → medium risk
Execution tools (BashTool) → high risk (default)
Parameter content analysis (BashTool only)
rm -rf ... → extreme risk
sudo ... → high risk
curl | bash → extreme risk
git status → low risk
npm test → low risk
Directory scope: target outside allowedDirectories → risk escalates one level
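A sketch of the BashTool parameter analysis and the one-level escalation rule. The regex patterns mirror only the examples listed above; the real heuristics are certainly richer, and all names are assumptions:

```typescript
export type Risk = "low" | "medium" | "high" | "extreme";

// Hypothetical classifier covering just the documented example patterns.
export function classifyBashCommand(cmd: string): Risk {
  if (/\brm\s+-rf\b/.test(cmd)) return "extreme";
  if (/curl[^|]*\|\s*(ba)?sh/.test(cmd)) return "extreme";   // curl | bash pattern
  if (/^\s*sudo\b/.test(cmd)) return "high";
  if (/^\s*(git\s+status|npm\s+test)\b/.test(cmd)) return "low";
  return "high";                                             // BashTool default: high risk
}

// Directory scope rule: a target outside allowedDirectories escalates one level.
const ladder: Risk[] = ["low", "medium", "high", "extreme"];
export function escalate(risk: Risk): Risk {
  return ladder[Math.min(ladder.indexOf(risk) + 1, ladder.length - 1)];
}
```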
Permission System Decoupled from QueryEngine
Permission logic is injected via callback — QueryEngine doesn’t know the implementation:
// In QueryEngineConfig:
canUseTool: (tool, input) => Promise<boolean>
// This means:
// In tests, inject a mock that always returns true
// Different environments (CLI, IDE, CI) inject different permission logic
06 · Multi-Agent Coordination — coordinator/
Directory:
src/coordinator/ · src/tools/AgentTool/ · src/tools/SendMessageTool/ Responsibilities: Sub-agent spawning, inter-agent communication, lifecycle orchestration
Architecture Overview
Root Agent
QueryEngine main loop
mutableMessages (main context)
totalUsage (aggregates full-tree cost)
│
│ calls AgentTool
▼
┌──────────────── coordinator ─────────────────┐
│ AgentRegistry registry: id → process │
│ LifecycleManager create / monitor / reap │
│ MessageRouter SendMessageTool routing │
│ SharedStateProxy read-only state sharing │
└──────┬───────────────────────┬───────────────┘
│ │
▼ ▼
Sub-agent A (Bun process) Sub-agent B (Bun process)
Independent QueryEngine Independent QueryEngine
Independent message history Independent message history
Restricted tool set Restricted tool set
Why Processes Instead of Threads
Process approach (what Claude Code uses):
Each sub-agent = independent Bun subprocess
→ Memory naturally isolated, no locking needed
→ Subprocess crash doesn't affect parent
→ OS can set CPU/memory limits per process
→ True parallelism across CPU cores
Tool Set Inheritance & Restriction
Sub-agent tool sets are explicitly injected by the parent, following least-privilege:
Parent creates sub-agent with:
tools: ['GlobTool', 'GrepTool', 'FileReadTool']
Sub-agent:
GlobTool ✓ GrepTool ✓ FileReadTool ✓
BashTool ✗ FileWriteTool ✗ AgentTool ✗
Nesting depth limit: Max 3 levels of nesting. At level 3, AgentTool is forcibly removed to prevent infinite recursive process creation.
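Both rules, explicit injection and the depth cap, fit in one small filter. Names are illustrative; only the behavior (least-privilege inheritance, `AgentTool` stripped at depth 3) comes from the source:

```typescript
const MAX_NESTING_DEPTH = 3;

// Parent passes an explicit tool list; nothing is inherited implicitly.
export function restrictToolSet(requested: string[], depth: number): string[] {
  if (depth >= MAX_NESTING_DEPTH) {
    // At the deepest level, strip AgentTool to prevent recursive process creation.
    return requested.filter((t) => t !== "AgentTool");
  }
  return [...requested];
}
```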
Inter-Agent Communication: SendMessageTool
All communication goes through coordinator — agents don’t reference each other directly:
Agent A calls SendMessageTool(to='agent_B_id', message='...')
→ coordinator.MessageRouter receives
→ looks up AgentRegistry, finds agent_B's process handle
→ writes message to agent_B's stdin
→ agent_B's LLM processes, response streams back to coordinator
→ coordinator returns response as tool_result to agent A
Special routing targets:
- to='__parent__' — send to parent agent
- to='*' — broadcast to all sibling agents
- to='agent_xxx_id' — precise point-to-point
Three Execution Patterns
Serial decomposition:
Root → sub-A (scan structure) → sub-B (analyze deps) → sub-C (generate report)
Parallel fan-out:
Root → sub-A (review module 1) ┐
→ sub-B (review module 2) ├→ aggregate results
→ sub-C (review module 3) ┘
Total time ≈ slowest one, not sum of all three
Hierarchical delegation:
Root: "refactor entire codebase"
→ sub-A: "refactor frontend"
→ sub-sub-A1: "refactor components"
→ sub-sub-A2: "refactor styles"
→ sub-B: "refactor backend"
Full-Tree Token & Cost Tracking
/cost shows full-tree cost, not just root agent:
Total cost: $0.127
Root agent itself: $0.031
Sub-agent A (with tree): $0.063
Sub-agent B: $0.033
Fault Isolation & Resilience
When a sub-agent crashes:
LifecycleManager detects abnormal exit
→ AgentTool receives failure result
→ Returns error tool_result to root QueryEngine
→ Root LLM decides: retry / degrade / report to user
→ Root agent continues running (completely unaffected)
Heartbeat detection: the coordinator pings every 10 seconds; an agent that doesn't respond within 5 seconds is declared dead and force-cleaned, preventing zombie processes.
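The liveness rule above (ping every 10s, dead after 5s of silence) reduces to a pure check over timestamps, which is how it can be sketched without timers. All names and the exact bookkeeping are assumptions:

```typescript
export const HEARTBEAT_INTERVAL_MS = 10_000; // how often the coordinator pings
export const RESPONSE_TIMEOUT_MS = 5_000;    // silence threshold before declaring death

export interface AgentHealth {
  id: string;
  lastPingAt: number;   // when the coordinator last pinged this agent
  lastReplyAt: number;  // when the agent last answered a ping
}

// Dead = a ping is outstanding and the timeout has elapsed since it was sent.
export function isDead(agent: AgentHealth, now: number): boolean {
  const awaitingReply = agent.lastReplyAt < agent.lastPingAt;
  return awaitingReply && now - agent.lastPingAt > RESPONSE_TIMEOUT_MS;
}

// Returns the ids the lifecycle manager should force-clean.
export function reapDead(agents: AgentHealth[], now: number): string[] {
  return agents.filter((a) => isDead(a, now)).map((a) => a.id);
}
```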