Claude Code Source Architecture Analysis

Leaked scope: 1,884 TypeScript files · 512,664 lines of code · Runtime: Bun · UI: React/Ink · API: @anthropic-ai/sdk


One-Line Definition

Claude Code is a tool-call-driven agent loop. User input → LLM decision → tool invocation → result returned to the LLM → loop, until the LLM considers the task complete.
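That loop can be sketched in a few lines. Everything here (the `callModel` stub, `runTool`, the message shapes) is an illustrative assumption, not the leaked source's API; the stub "model" asks for one tool call and then finishes:

```typescript
// Minimal sketch of a tool-call agent loop. All names are illustrative.
type ToolCall = { name: string; input: string };
type ModelTurn = { text: string; toolCalls: ToolCall[] };

// Stub model: requests one tool call, then declares the task complete.
function callModel(history: string[]): ModelTurn {
  const done = history.some((m) => m.startsWith("tool_result:"));
  return done
    ? { text: "done", toolCalls: [] }
    : { text: "", toolCalls: [{ name: "echo", input: "hi" }] };
}

function runTool(call: ToolCall): string {
  return call.name === "echo" ? call.input : "unknown tool";
}

function agentLoop(userInput: string): string {
  const history: string[] = [`user:${userInput}`];
  while (true) {
    const turn = callModel(history);
    // No tool calls requested → the model considers the task complete.
    if (turn.toolCalls.length === 0) return turn.text;
    for (const call of turn.toolCalls) {
      history.push(`tool_result:${runTool(call)}`); // feed results back to the model
    }
  }
}
```
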


Six Subsystem Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                  01 · Entry Layer  main.tsx                  │
│  User input in terminal → CLI parsing → React/Ink renderer  │
│  Parallel prefetch on startup: MDM | Keychain | API | Flags  │
└───────────────────────────┬─────────────────────────────────┘
                            │ user message

┌─────────────────────────────────────────────────────────────┐
│               02 · Query Engine  QueryEngine.ts              │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  while(true) Tool-call main loop                     │   │
│  │  compress → callModel → execute tools → collect      │   │
│  └─────────────────────────────────────────────────────┘   │
└────────┬──────────────┬──────────────┬──────────────────────┘
         │              │              │
         ▼              ▼              ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ 03 · Tools  │ │04 · Commands│ │05 · Perms   │
│  tools/     │ │ commands/   │ │ hooks/      │
│  ~40 tools  │ │  ~50 cmds   │ │ 4 modes     │
└──────┬──────┘ └─────────────┘ └─────────────┘


┌─────────────────────────────────────────────────────────────┐
│               06 · Multi-Agent  coordinator/                 │
│  Sub-agent spawning (AgentTool) · Messaging (SendMessageTool)│
│  Orchestration (AgentRegistry / MessageRouter / Lifecycle)   │
└─────────────────────────────────────────────────────────────┘

Tech Stack

| Layer | Technology | Key Points |
|---|---|---|
| Language | TypeScript 5.x strict | Runtime Zod validation + static types |
| Runtime | Bun (not Node.js) | Native TS execution, cold start <50ms, built-in SQLite |
| Terminal UI | React 18 + Ink 4 | Virtual DOM diff → ANSI escape sequences, streaming render |
| CLI parsing | Commander.js | Subcommands / options / help generation |
| Schema validation | Zod v4 | Runtime validation of tool input parameters |
| API client | @anthropic-ai/sdk | Official SDK, streaming SSE |
| Code search | ripgrep | Rust implementation, 10x faster than grep |
| Telemetry | OpenTelemetry + gRPC | Lazy-loaded, no cold-start impact |
| Feature flags | GrowthBook + Bun bundle | Compile-time elimination: dead code physically removed from production bundle |

Core Data Flow

User input

processUserInput()     ← slash command intercept, @file injection, memory attachments

recordTranscript()     ← writes ~/.claude/sessions/<id>.jsonl (supports /resume)

query() main loop

[compress check] → [callModel SSE] → [parallel tool execution] → [permission check] → [collect results]

needsFollowUp?
  yes → append tool_result, continue loop
  no  → produce final result, Ink renders output

01 · Entry Layer — main.tsx

File: src/main.tsx (~803 KB, includes all React components) Responsibilities: CLI parsing · parallel prefetch · React/Ink renderer init · session startup

Role

The entry layer is the user’s first contact point with the system. It does two things:

  1. Parse user intent: Convert command-line arguments into configuration via Commander.js
  2. Set the stage: Initialize QueryEngine, inject tools/commands/permission config, launch the React/Ink renderer

The entry layer contains no business logic — it is a pure assembly layer.

Startup Flow

User runs claude command


Commander.js parses arguments
  ├─ claude              → interactive mode (REPL)
  ├─ claude "prompt"     → single-shot execution mode
  ├─ claude --resume     → resume previous session
  ├─ claude --model xxx  → specify model
  └─ claude /cmd args    → execute slash command directly


Parallel prefetch phase (critical performance optimization)


Assemble QueryEngineConfig


React/Ink renderer starts


Enter interactive loop (user can type)

Parallel Prefetch Optimization

The most important engineering design in the entry layer — turns serial startup into parallel, compressing cold start from ~400ms to ~100ms.

At startup, four independent I/Os fire simultaneously:

  ① MDM config read
     Policy config for enterprise-managed devices (macOS MDM)
     Determines which features are disabled, which servers are accessible

  ② macOS Keychain prefetch
     Reads API keys and OAuth tokens stored in Keychain
     Avoids Keychain dialog delay on first request

  ③ Anthropic API pre-connect
     Establishes TCP connection to api.anthropic.com in advance
     Reduces network handshake latency when user sends first message

  ④ GrowthBook feature flag init
     Fetches feature flag config for current account
     Determines which experimental features (VOICE_MODE, DAEMON, etc.) are enabled

React/Ink: Why Use React in a Terminal

Ink is React’s terminal adapter. It translates Virtual DOM diff results into ANSI escape sequences (cursor position, color, clear screen) and outputs them to the terminal.

React component tree
      ↓ Virtual DOM diff
Ink renderer
      ↓ translate
ANSI escape sequences
      ↓ output
Terminal display

Core value: When the LLM outputs a token, just setState(prev => prev + newToken) — React diff calculates the minimal change, Ink only updates the parts of the terminal that actually changed, without redrawing the entire screen.

Rendering hierarchy:

<App>                          ← top level, holds QueryEngine reference
  <ConversationHistory />      ← history message list (scrollable)
  <StreamingOutput />          ← currently streaming content (real-time updates)
  <ToolExecutionStatus />      ← tool execution status (parallel progress bars)
  <PermissionPrompt />         ← permission confirmation dialog (blocking)
  <StatusBar />                ← bottom: model name / token usage / cost
  <InputBox />                 ← user input field
</App>

QueryEngineConfig Assembly

The entry layer’s core responsibility is assembling the complete QueryEngine configuration (dependency injection):

QueryEngineConfig = {
  tools:       ~40 tool instances loaded from tools/
  commands:    ~50 commands loaded from commands/
  mcpClients:  MCP server connections from config file
  skills:      skills dynamically loaded from ~/.claude/skills/
  canUseTool:  permission check callback (injected from permission system)
  model:       current model (can be overridden by --model)
  maxTurns:    max tool call rounds (unlimited by default)
  maxBudgetUsd: USD cost ceiling (unlimited by default)
}
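The same config can be written as a typed object. Field names follow the outline above; the concrete types and defaults are assumptions for illustration:

```typescript
// Sketch of the assembled config with an injected permission callback.
interface ToolLike { name: string }

interface QueryEngineConfig {
  tools: ToolLike[];
  commands: string[];
  model: string;
  maxTurns: number;        // Infinity = unlimited by default
  maxBudgetUsd: number;    // Infinity = unlimited by default
  canUseTool: (tool: string, input: unknown) => Promise<boolean>; // injected
}

function assembleConfig(overrides: Partial<QueryEngineConfig> = {}): QueryEngineConfig {
  return {
    tools: [{ name: "FileReadTool" }, { name: "BashTool" }],
    commands: ["/compact", "/cost"],
    model: "claude-sonnet",
    maxTurns: Infinity,
    maxBudgetUsd: Infinity,
    canUseTool: async () => true, // e.g. tests can inject an always-allow mock
    ...overrides,                 // --model and friends override the defaults
  };
}
```
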

| Principle | How It Manifests |
|---|---|
| Entry layer only assembles | No business logic, only init and wiring |
| Parallel first | All independent I/Os fired concurrently |
| Lazy-load heavy modules | OTel, gRPC loaded via dynamic import() on demand |
| Dependency injection | Tool sets, permission logic, state management all injected |

02 · Query Engine — QueryEngine.ts

File: src/services/QueryEngine.ts (46,630 lines — the most critical file in the system) Responsibilities: LLM API calls · Tool-call main loop · context compression · error recovery · token billing

Role

QueryEngine is the central nervous system of Claude Code. All other subsystems serve it — the tool system provides callable capabilities, the permission system provides safety constraints, the command system provides shortcuts. QueryEngine itself does one thing:

Drive the conversation loop between LLM and tools until the task is complete.

Tool-Call Main Loop: Six Phases of a Turn

Turn begins

  ├─ Phase 1: Context compression check
  │    Check if message history is approaching token limit
  │    Trigger compression strategy based on pressure level

  ├─ Phase 2: Call LLM (streaming)
  │    Send message history to Claude API
  │    Receive response as SSE stream
  │    Tool calls start executing in parallel during streaming (no waiting)

  ├─ Phase 3: Error handling & recovery
  │    Handle prompt_too_long (trigger compress + retry)
  │    Handle max_output_tokens (3-level escalation recovery)
  │    Handle model failure (fall back to fallbackModel)

  ├─ Phase 4: Tool result collection
  │    Wait for all parallel tool executions to complete
  │    Check each tool call through permission system
  │    Collect legitimate tool results

  ├─ Phase 5: Attachment injection
  │    Read relevant memory chunks from persistent memory
  │    Dynamically discover and inject matching skill templates

  └─ Phase 6: Turn and budget check
       turnCount++
       Exceeded maxTurns? → stop
       Exceeded maxBudgetUsd? → stop
       needsFollowUp? → continue or exit

Streaming Parallel Tool Execution

Claude Code starts executing tools while the LLM is still streaming output:

Traditional serial (wasted wait time):
  [──── LLM output 50ms ────][Tool A 30ms][Tool B 20ms]  = 100ms

Claude Code parallel (overlapping execution):
  [──── LLM output 50ms ────]
            [──Tool A 30ms──]                ← its tool_use block completed mid-stream, starts early
                             [─Tool B 20ms─] ← its tool_use block only completes at end of stream
  Total = max(50, 50 + 20) = 70ms   → 30% faster

5-Level Context Compression Pipeline

Context usage

100% ─── API rejection boundary ───────────────────────────
 95% ─── Level 5: Autocompact ──────────────────────────────
         Full session summary, replaces all history messages

 85% ─── Level 4: Context Collapse ─────────────────────────
         Collapses early conversation turns into a summary block

 70% ─── Level 3: Microcompact ─────────────────────────────
         Deduplicates repeated file edits, removes intermediate state

 55% ─── Level 2: Snip Compact ─────────────────────────────
         Removes oldest N turns of messages

  0% ─── Level 1: Content replacement (always running) ─────
         Truncates single tool results exceeding size threshold

| Level | Info Loss | Speed | Use Case |
|---|---|---|---|
| 1 · Content replacement | Minimal (truncates oversized single output) | Instant | Any time |
| 2 · Snip | Loses old history | Fast | When history doesn't matter |
| 3 · Microcompact | Loses file edit intermediate states | Fast | After heavy file operations |
| 4 · Collapse | History detail folded into summary | Medium (needs LLM) | Early turns no longer relevant |
| 5 · Autocompact | Full history replaced with summary | Slow (needs LLM) | Last resort when necessary |
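The thresholds in the diagram reduce to a simple dispatcher. The cutoff values follow the diagram; the function name and signature are assumptions:

```typescript
// Map context pressure (0..1) to the compression level described above.
type CompressionLevel = 1 | 2 | 3 | 4 | 5;

function pickCompressionLevel(usage: number): CompressionLevel {
  if (usage >= 0.95) return 5; // Autocompact: full-session summary
  if (usage >= 0.85) return 4; // Context Collapse: fold early turns
  if (usage >= 0.70) return 3; // Microcompact: dedupe repeated file edits
  if (usage >= 0.55) return 2; // Snip: drop oldest N turns
  return 1;                    // Content replacement: always running
}
```
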

3-Level Output Recovery

When LLM output is truncated (stop_reason: max_tokens):

First truncation
  → Slot upgrade: raise output token limit from 8k to 64k
  → If still truncated, enter multi-turn continuation (max 3 attempts)
  → All 3 failed → return completed portion, mark error
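The escalation above can be sketched as a slot upgrade followed by bounded continuation retries. `attemptOutput` is a stub standing in for a real API call; the 8k/64k limits follow the text:

```typescript
// Sketch of the 3-level output recovery.
function recoverOutput(
  attemptOutput: (maxTokens: number) => { text: string; truncated: boolean },
): { text: string; error: boolean } {
  let { text, truncated } = attemptOutput(8_000);               // normal slot
  if (truncated) ({ text, truncated } = attemptOutput(64_000)); // slot upgrade
  for (let i = 0; truncated && i < 3; i++) {                    // multi-turn continuation
    const next = attemptOutput(64_000);
    text += next.text;
    truncated = next.truncated;
  }
  return { text, error: truncated }; // still truncated after 3 → partial result, marked error
}
```
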

Model Fallback & Orphan Cleanup

When the primary model crashes mid-stream, “orphan” tool_use entries appear in message history. The fix (Tombstone mode):

Original (invalid):
  assistant: { tool_use: { id: 'X', name: 'BashTool' } }
  (missing corresponding tool_result)

Fixed (valid):
  assistant: { tool_use: { id: 'X', name: 'BashTool' } }
  user:      { tool_result: { id: 'X', is_error: true, content: '[interrupted]' } }
  → switch to fallbackModel and retry
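Tombstone repair is mechanical: scan the history, and for every assistant tool_use with no matching tool_result, synthesize an is_error result. The message shapes below are simplified stand-ins for the SDK types:

```typescript
// Pair every orphan tool_use with an error tool_result so history is API-valid.
type Msg =
  | { role: "assistant"; tool_use: { id: string; name: string } }
  | { role: "user"; tool_result: { id: string; is_error?: boolean; content: string } };

function repairOrphans(history: Msg[]): Msg[] {
  const resolved = new Set(
    history.flatMap((m) => ("tool_result" in m ? [m.tool_result.id] : [])),
  );
  const out: Msg[] = [];
  for (const m of history) {
    out.push(m);
    if ("tool_use" in m && !resolved.has(m.tool_use.id)) {
      // Tombstone: the orphan gets a synthetic "[interrupted]" error result.
      out.push({
        role: "user",
        tool_result: { id: m.tool_use.id, is_error: true, content: "[interrupted]" },
      });
    }
  }
  return out;
}
```
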

03 · Tool System — tools/

Directory: src/tools/ (~40 tools) Responsibilities: Give LLM the ability to interact with the real world — each tool is a self-contained independent module

Unified Tool Interface

Each tool must provide:

  name              Tool name (LLM uses this name to call it)
  description       Natural language description (LLM decides when to use it based on this)
  inputSchema       Input parameter definition (Zod Schema, runtime validated)
  execute()         Actual execution logic
  requiresPermission() Whether user confirmation is needed
  permissionDescription() Permission prompt text
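The contract above maps naturally onto a TypeScript interface. The real code validates inputSchema with Zod; here a hand-rolled Zod-like `parse` stands in, and `echoTool` is a hypothetical minimal tool, not one from the source:

```typescript
// The unified tool contract, sketched as an interface.
interface Tool<I> {
  name: string;
  description: string;
  inputSchema: { parse: (raw: unknown) => I }; // Zod-like: throws on bad input
  execute: (input: I) => Promise<string>;
  requiresPermission: (input: I) => boolean;
  permissionDescription: (input: I) => string;
}

const echoTool: Tool<{ text: string }> = {
  name: "EchoTool",
  description: "Echoes its input back.",
  inputSchema: {
    parse(raw: unknown) {
      if (typeof raw !== "object" || raw === null || typeof (raw as any).text !== "string")
        throw new Error("invalid input");
      return raw as { text: string };
    },
  },
  execute: async (input) => input.text,
  requiresPermission: () => false, // read-only, no side effects
  permissionDescription: (input) => `Echo "${input.text}"`,
};
```
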

Tool Categories

Category 1: File Operations (4 core tools)

FileReadTool     → read first (understand current state)
FileEditTool     → precise edit (most common, old_string → new_string)
FileWriteTool    → full write (create new or completely rewrite)
BashTool         → bulk operations (mv, cp, mkdir, etc.)

Key constraint of FileEditTool: must read the file with FileReadTool before editing.

Category 2: Search (4 tools)

| Tool | Function | Notes |
|---|---|---|
| GlobTool | File path pattern matching | Sorted by modification time, no permission needed |
| GrepTool | Content search | Based on ripgrep, 10x faster than grep |
| WebSearchTool | Web search | Returns summaries + URL list |
| WebFetchTool | Web page content fetch | Auto-converts to Markdown, supports maxLength |

Category 3: Agent (2 tools)

  • AgentTool — Creates an independent Bun subprocess running a full QueryEngine to execute subtasks
  • SendMessageTool — Send messages to created sub-agents (sync and async modes)

Category 4: Task Management (6 tools)

Task state flow: pending → in_progress → completed / cancelled

TaskCreate / TaskUpdate / TaskList / TaskGet / TaskOutput / TaskStop
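The state flow is a small state machine. Writing it as an explicit transition table (an assumption about how TaskUpdate might validate state changes) makes the legal moves obvious:

```typescript
// Task state flow: pending → in_progress → completed / cancelled.
type TaskState = "pending" | "in_progress" | "completed" | "cancelled";

const transitions: Record<TaskState, TaskState[]> = {
  pending: ["in_progress", "cancelled"],
  in_progress: ["completed", "cancelled"],
  completed: [], // terminal
  cancelled: [], // terminal
};

function canTransition(from: TaskState, to: TaskState): boolean {
  return transitions[from].includes(to);
}
```
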

Category 5: Protocol Integration (2 tools)

  • MCPTool — Dynamically proxies calls to any connected MCP server tools
  • LSPTool — Connects to local language server, provides go-to-definition, find-references, and other IDE capabilities

Category 6: Mode Control (4 tools)

  • EnterPlanModeTool / ExitPlanModeTool — Plan mode (read ops auto-approved, write ops show plan only)
  • EnterWorktreeTool / ExitWorktreeTool — Git Worktree sandbox mode

Category 7: Notebooks (2 tools)

  • NotebookReadTool — Parse .ipynb format, return code + execution output
  • NotebookEditTool — Edit specific cell without rewriting the entire notebook

Tool Permission Declarations

Low-permission tools (auto-approved):
  GlobTool, GrepTool, FileReadTool, WebSearchTool
  → Read-only, change no state

Medium-permission tools (require confirmation in default mode):
  FileWriteTool, FileEditTool, AgentTool
  → Have persistent side effects, but predictable

High-permission tools (must confirm every time):
  BashTool
  → Can execute arbitrary commands, risk unpredictable

04 · Command System — commands/

Directory: src/commands/ (~50 slash commands) Responsibilities: User-triggered via /command, executed directly, bypasses LLM

Command Intercept Timing

User types "/compact"

processUserInput() detects leading "/"

Finds matching command handler in commands/

Execute directly, never enters QueryEngine main loop
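The intercept boils down to a prefix check that routes around the query loop entirely. Handler names and return shape below are illustrative:

```typescript
// Leading "/" routes to a command handler and never reaches the LLM loop.
const commandHandlers: Record<string, (args: string) => string> = {
  "/clear": () => "history cleared",
  "/cost": () => "total: $0.00",
};

function processUserInput(
  input: string,
): { routedTo: "command" | "queryEngine"; output?: string } {
  if (input.startsWith("/")) {
    const [cmd, ...rest] = input.split(" ");
    const handler = commandHandlers[cmd];
    // Sketch: unknown slash commands fall through; real handling may differ.
    if (handler) return { routedTo: "command", output: handler(rest.join(" ")) };
  }
  return { routedTo: "queryEngine" }; // normal messages enter the main loop
}
```
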

Tools vs Commands: The Essential Difference

Tools (tools/):
  Caller: LLM (via tool_use API)
  Trigger: LLM decides when to call based on understanding
  Examples: BashTool, FileReadTool, GrepTool

Commands (commands/):
  Caller: User (via /command syntax)
  Trigger: User explicitly triggers
  Examples: /commit, /compact, /cost

Tools are Claude’s hands — Claude decides when to reach out. Commands are buttons on a remote control — the user decides when to press.

Command Categories

Category 1: Git Workflow Commands

| Command | Function |
|---|---|
| /commit | Read git diff --staged → LLM generates commit message → user confirms → execute |
| /commit-push-pr | One-click commit + push + create PR (PR description auto-generated by LLM) |
| /pr | Create PR only, analyze git diff main...HEAD for standard description |

Category 2: Code Quality Commands

  • /review — AI review, outputs structured report (issues, severity, suggestions)
  • /ultrareview — Multi-dimensional deep review (security, performance, maintainability, logic)
  • /autofix-pr — Auto-detect issues → generate fix → create PR

Category 3: Context Management Commands

/context    → Check current state (message count, token usage, compression history)
/compact    → Compress long history while preserving semantics (triggers Level 5 Autocompact)
/clear      → Completely change topic, discard all history

Category 4: Configuration & Integration Commands

  • /config — Runtime configuration (switch model, modify maxTurns, toggle thinking mode)
  • /mcp — Manage MCP server connections (list/add/remove/restart)
  • /memory — View and manage persistent memory
  • /permissions — View current permission config and whitelist

Category 5: Session Management Commands

  • /resume — Restore previous session (reads .jsonl to rebuild message history)
  • /cost — Full cost analysis (including sub-agent costs)
  • /status — System state snapshot (turn count, active sub-agents, task queue)

Category 6: Skill Commands (Dynamic Extension)

~/.claude/skills/
  ├── review-security.md    → triggers /review-security
  ├── generate-tests.md     → triggers /generate-tests
  └── refactor-to-ts.md     → triggers /refactor-to-ts

When a skill command is triggered, the template content is injected as a system attachment to guide LLM behavior (rather than executing logic directly).


05 · Permission System — hooks/toolPermission/

Directory: src/hooks/toolPermission/ Responsibilities: Safety check before each tool execution — decide whether to allow

Four Permission Modes

| Mode | Use Case | Behavior | Automation |
|---|---|---|---|
| default | Daily development | Confirmation prompt for dangerous operations | Lowest |
| plan | Read-only analysis | Read ops auto-approved, write ops show plan only | Medium |
| auto | Batch processing | Classifier assesses risk, low-risk auto-approved | High |
| bypassPermissions | CI/CD pipelines | Skip all permission checks | Highest |

wrappedCanUseTool: Full Decision Flow

Tool call request arrives

[Check 1] bypassPermissions mode? yes → approve

[Check 2] In permanent approval whitelist? yes → approve

[Check 3] plan mode?
        yes, read-only tool → approve
        yes, write tool → show plan, do not execute

[Check 4] auto mode?
        yes → classifier evaluates: low risk → approve, high risk → fall back to default

[Check 5] default mode: show interactive confirmation
        Approve (once)   → execute
        Approve (always) → execute + add to permanent whitelist
        Deny             → record in permissionDenials[]

Permanent Approval Whitelist

Storage: ~/.claude/approvals.json
Structure: Map<toolName, Set<serializedInput>>

Example:
{
  "BashTool": ["npm test", "npm run build", "git status"],
  "FileWriteTool": ["/project/src/utils.ts"]
}

Whitelist matching is exact: npm test does not cover npm test --watch.

auto Mode Risk Classifier

Evaluation dimensions:

Tool type
  Read-only tools (Glob, Grep, FileRead)    → low risk
  Write tools (FileWrite, FileEdit)         → medium risk
  Execution tools (BashTool)                → high risk (default)

Parameter content analysis (BashTool only)
  rm -rf ...    → extreme risk
  sudo ...      → high risk
  curl | bash   → extreme risk
  git status    → low risk
  npm test      → low risk

Directory scope: target outside allowedDirectories → risk escalates one level
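Those three dimensions can be sketched as a classifier. The patterns and risk levels follow the text above; the function itself, its default, and the escalation rule's encoding are assumptions:

```typescript
// Sketch of the auto-mode risk classifier for BashTool commands.
type Risk = "low" | "medium" | "high" | "extreme";

function classifyBashRisk(command: string, insideAllowedDirs = true): Risk {
  const rules: Array<[RegExp, Risk]> = [
    [/\brm\s+-rf\b/, "extreme"],
    [/curl.*\|\s*(ba)?sh/, "extreme"], // piping a download into a shell
    [/^\s*sudo\b/, "high"],
    [/^\s*git\s+status\b/, "low"],
    [/^\s*npm\s+test\b/, "low"],
  ];
  let risk: Risk = "high"; // BashTool defaults to high risk
  for (const [pattern, level] of rules) {
    if (pattern.test(command)) { risk = level; break; }
  }
  // Target outside allowedDirectories → risk escalates one level.
  if (!insideAllowedDirs) {
    const order: Risk[] = ["low", "medium", "high", "extreme"];
    risk = order[Math.min(order.indexOf(risk) + 1, 3)];
  }
  return risk;
}
```
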

Permission System Decoupled from QueryEngine

Permission logic is injected via callback — QueryEngine doesn’t know the implementation:

// In QueryEngineConfig:
canUseTool: (tool, input) => Promise<boolean>

// This means:
// In tests, inject a mock that always returns true
// Different environments (CLI, IDE, CI) inject different permission logic

06 · Multi-Agent Coordination — coordinator/

Directory: src/coordinator/ · src/tools/AgentTool/ · src/tools/SendMessageTool/ Responsibilities: Sub-agent spawning, inter-agent communication, lifecycle orchestration

Architecture Overview

Root Agent
  QueryEngine main loop
  mutableMessages (main context)
  totalUsage (aggregates full-tree cost)

        │ calls AgentTool

┌──────────────── coordinator ─────────────────┐
│  AgentRegistry      registry: id → process   │
│  LifecycleManager   create / monitor / reap  │
│  MessageRouter      SendMessageTool routing   │
│  SharedStateProxy   read-only state sharing   │
└──────┬───────────────────────┬───────────────┘
       │                       │
       ▼                       ▼
Sub-agent A (Bun process)  Sub-agent B (Bun process)
Independent QueryEngine    Independent QueryEngine
Independent message history Independent message history
Restricted tool set        Restricted tool set

Why Processes Instead of Threads

Process approach (what Claude Code uses):
  Each sub-agent = independent Bun subprocess
  → Memory naturally isolated, no locking needed
  → Subprocess crash doesn't affect parent
  → OS can set CPU/memory limits per process
  → True parallelism across CPU cores

Tool Set Inheritance & Restriction

Sub-agent tool sets are explicitly injected by the parent, following least-privilege:

Parent creates sub-agent with:
  tools: ['GlobTool', 'GrepTool', 'FileReadTool']

Sub-agent:
  GlobTool ✓   GrepTool ✓   FileReadTool ✓
  BashTool ✗   FileWriteTool ✗   AgentTool ✗

Nesting depth limit: Max 3 levels of nesting. At level 3, AgentTool is forcibly removed to prevent infinite recursive process creation.
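Both rules (explicit injection and the depth cap) fit in one small function. `MAX_NESTING` and the AgentTool-removal rule follow the text; the names and error handling are assumptions:

```typescript
// Least-privilege tool injection with the 3-level nesting cap.
const MAX_NESTING = 3;

function toolsForSubAgent(requested: string[], depth: number): string[] {
  if (depth > MAX_NESTING) throw new Error("nesting depth exceeded");
  // At the deepest level, AgentTool is forcibly removed so the agent tree
  // cannot recurse indefinitely.
  return depth === MAX_NESTING
    ? requested.filter((t) => t !== "AgentTool")
    : requested;
}
```
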

Inter-Agent Communication: SendMessageTool

All communication goes through coordinator — agents don’t reference each other directly:

Agent A calls SendMessageTool(to='agent_B_id', message='...')
  → coordinator.MessageRouter receives
  → looks up AgentRegistry, finds agent_B's process handle
  → writes message to agent_B's stdin
  → agent_B's LLM processes, response streams back to coordinator
  → coordinator returns response as tool_result to agent A

Special routing targets:

  • to='__parent__' — send to parent agent
  • to='*' — broadcast to all sibling agents
  • to='agent_xxx_id' — precise point-to-point
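The three routing targets reduce to a small resolver inside the MessageRouter. The registry shape here is an assumption; the target conventions are from the list above:

```typescript
// Resolve a SendMessageTool "to" field into concrete recipient ids.
function resolveTargets(
  to: string,
  opts: { senderId: string; parentId: string; siblings: string[] },
): string[] {
  if (to === "__parent__") return [opts.parentId];
  if (to === "*") return opts.siblings.filter((id) => id !== opts.senderId); // broadcast
  return [to]; // precise point-to-point
}
```
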

Three Execution Patterns

Serial decomposition:

Root → sub-A (scan structure) → sub-B (analyze deps) → sub-C (generate report)

Parallel fan-out:

Root → sub-A (review module 1)  ┐
     → sub-B (review module 2)  ├→ aggregate results
     → sub-C (review module 3)  ┘
Total time ≈ slowest one, not sum of all three

Hierarchical delegation:

Root: "refactor entire codebase"
  → sub-A: "refactor frontend"
      → sub-sub-A1: "refactor components"
      → sub-sub-A2: "refactor styles"
  → sub-B: "refactor backend"

Full-Tree Token & Cost Tracking

/cost shows full-tree cost, not just root agent:

  Total cost: $0.127
    Root agent itself:        $0.031
    Sub-agent A (with tree):  $0.063
    Sub-agent B:              $0.033

Fault Isolation & Resilience

When a sub-agent crashes:
  LifecycleManager detects abnormal exit
  → AgentTool receives failure result
  → Returns error tool_result to root QueryEngine
  → Root LLM decides: retry / degrade / report to user
  → Root agent continues running (completely unaffected)

Heartbeat detection: pinged every 10 seconds; no response within 5 seconds → declared dead and force-cleaned, preventing zombie processes.