User types a message or pipes input through stdin
● claude-code
$ Find all TODO comments in src/▊
Keyboard input comes from Ink's TextInput component; in non-interactive mode, the prompt is read from piped stdin instead.
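Roughly what that mode switch might look like, as a minimal sketch: `resolvePrompt` is a hypothetical helper, but the TTY check and the stdin read are standard Node.js.

```ts
// Sketch (assumed shape, not the actual source): pick the input source by TTY.
import { text } from "node:stream/consumers";

async function resolvePrompt(interactiveInput: () => Promise<string>): Promise<string> {
  if (process.stdin.isTTY) {
    // Interactive mode: Ink's <TextInput> supplies the text when the user hits Enter.
    return interactiveInput();
  }
  // Non-interactive mode: `echo "..." | claude` reads everything piped on stdin.
  return (await text(process.stdin)).trim();
}
```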
Text gets parsed through processUserInput() — 18 parameters, 3 dispatch paths
const result = processUserInput({
  text: "Find all TODO comments in src/",
  images: [],
  abortController,
  options: {}
})
Slash commands get intercepted here. Images are resized. The output is always a standard UserMessage object.
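A hypothetical sketch of that dispatch, pieced together from the description above; the names, fields, and stubs are illustrative, not the real signatures.

```ts
// Illustrative dispatch: slash commands are handled locally, everything else
// is normalized into a UserMessage. Shapes and helpers are assumptions.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string };

interface UserMessage {
  role: "user";
  content: ContentBlock[];
}

// Stand-ins for internals not shown here.
const runSlashCommand = async (cmd: string): Promise<void> => { console.log(`(would dispatch ${cmd})`); };
const resizeImage = async (img: string): Promise<string> => img; // real code downscales large images

async function handleInput(text: string, images: string[]): Promise<UserMessage | null> {
  // Slash commands are intercepted locally and may never reach the API.
  if (text.startsWith("/")) {
    await runSlashCommand(text);
    return null;
  }
  // Everything else becomes a standard UserMessage, with images resized first.
  const resized = await Promise.all(images.map(resizeImage));
  return {
    role: "user",
    content: [
      { type: "text", text },
      ...resized.map((data): ContentBlock => ({ type: "image", data })),
    ],
  };
}
```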
Message gets pushed onto the in-memory conversation array
[user] Set up the project structure
[assistant] I'll create the directory layout...
[user] Now add the database models
[user] Find all TODO comments... NEW
The conversation history is just an array that grows over the session. It's what the context window manager trims later.
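In sketch form, assuming a simplified message shape (real messages carry structured content blocks, not bare strings):

```ts
// Assumed shape of the in-memory history: a flat array the loop appends to each turn.
type Role = "user" | "assistant";
interface ConversationMessage {
  role: Role;
  content: string; // simplified; real messages carry structured content blocks
}

const history: ConversationMessage[] = [];

function pushMessage(role: Role, content: string): void {
  history.push({ role, content }); // grows for the whole session; trimmed later by compaction
}
```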
Six-layer sandwich: 15K–34K tokens assembled every turn
🔒 Enterprise Policy ~2K
⚡ DYNAMIC_BOUNDARY — cache split —
📄 CLAUDE.md + Memory ~3K
🔧 Tool Descriptions (40) ~8K
📋 Base System Prompt ~6K
🧬 Immutable Core ~2K
Everything below DYNAMIC_BOUNDARY is prompt-cached (cache reads cost roughly a tenth of normal input tokens). Everything above it changes per turn.
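The mechanism behind that split is the Anthropic API's `cache_control` breakpoint: the stable layers form a prefix that is written to the cache once and then read cheaply on later turns. A minimal sketch, with placeholder layer text and a simplified ordering:

```ts
// Minimal sketch of prompt caching with a cache_control breakpoint.
// Layer contents and ordering are simplified placeholders for the sandwich above.
type SystemBlock = { type: "text"; text: string; cache_control?: { type: "ephemeral" } };

function buildSystem(stableLayers: string[], dynamicLayers: string[]): SystemBlock[] {
  const stable: SystemBlock[] = stableLayers.map((text) => ({ type: "text", text }));
  if (stable.length > 0) {
    // The breakpoint marks the end of the cacheable prefix; everything up to here
    // is written once (cache_write) and then served as cheap cache_read tokens.
    stable[stable.length - 1].cache_control = { type: "ephemeral" };
  }
  const dynamic: SystemBlock[] = dynamicLayers.map((text) => ({ type: "text", text }));
  return [...stable, ...dynamic];
}
```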
Stream to Claude API through the Anthropic SDK (server-sent events)
Your Machine
→ SSE →
● Claude API
data: {"type":"content_block_delta","delta":{"text":"I'll"}}
data: {"type":"content_block_delta","delta":{"text":" search"}}
data: {"type":"content_block_delta","delta":{"text":" for"}}
data: {"type":"content_block_delta","delta":{"text":" TODOs"}}
data: {"type":"content_block_delta","delta":{"text":" in"}}
Uses the SDK's streaming interface. Tokens arrive over SSE and get rendered as they land.
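A minimal sketch of that call using the official `@anthropic-ai/sdk`; the model name and prompt are placeholders:

```ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Stream one turn; each content_block_delta arrives as a "text" event.
const stream = client.messages.stream({
  model: "claude-sonnet-4-20250514", // placeholder: use whichever model you target
  max_tokens: 1024,
  messages: [{ role: "user", content: "Find all TODO comments in src/" }],
});

stream.on("text", (delta) => process.stdout.write(delta)); // render tokens as they land
const finalMessage = await stream.finalMessage(); // resolves once the turn ends
console.log("\nstop_reason:", finalMessage.stop_reason);
```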
Four token tiers: input → cache_write → cache_read → output
Real-time cost tracking. When the running token total crosses softLimit, context compression kicks in; crossing hardLimit forces it.
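A hypothetical tracker shaped like that description; `softLimit` and `hardLimit` come from above, the numbers are made up, and the usage field names match the Anthropic API's usage object:

```ts
// Hypothetical cost/usage tracker; the limit values are illustrative.
interface Usage {
  input_tokens: number;
  cache_creation_input_tokens: number;
  cache_read_input_tokens: number;
  output_tokens: number;
}

const softLimit = 150_000; // start compacting here (illustrative)
const hardLimit = 180_000; // force compaction past this (illustrative)

let runningTotal = 0;

function recordTurn(usage: Usage): "ok" | "compact" | "force_compact" {
  runningTotal +=
    usage.input_tokens +
    usage.cache_creation_input_tokens +
    usage.cache_read_input_tokens +
    usage.output_tokens;

  if (runningTotal >= hardLimit) return "force_compact";
  if (runningTotal >= softLimit) return "compact";
  return "ok";
}
```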
If tool_use blocks found: findToolByName() → canUseTool() → execute
I'll search for TODO comments...
Let me use the bash tool to find them.
tool_use:
  name: "bash"
  input: "grep -r TODO src/"
Permission check: allowed (pre-approved pattern)
The response can contain tool calls. Each gets resolved, permission-checked, and run. Multiple tools can execute in parallel.
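In sketch form, with `findToolByName` and `canUseTool` taken from the description above and everything else assumed:

```ts
// Sketch of resolving and running the tool_use blocks from one assistant response.
// Block and tool shapes are assumptions; independent calls run concurrently.
interface ToolUseBlock { type: "tool_use"; id: string; name: string; input: unknown }
interface ToolResultBlock { type: "tool_result"; tool_use_id: string; content: string }

interface Tool {
  name: string;
  run: (input: unknown) => Promise<string>;
}

async function runToolUses(
  blocks: ToolUseBlock[],
  findToolByName: (name: string) => Tool | undefined,
  canUseTool: (tool: Tool, input: unknown) => Promise<boolean>,
): Promise<ToolResultBlock[]> {
  return Promise.all(
    blocks.map(async (block) => {
      const tool = findToolByName(block.name);
      if (!tool) {
        return { type: "tool_result" as const, tool_use_id: block.id, content: `Unknown tool: ${block.name}` };
      }
      if (!(await canUseTool(tool, block.input))) {
        return { type: "tool_result" as const, tool_use_id: block.id, content: "Permission denied" };
      }
      const output = await tool.run(block.input);
      return { type: "tool_result" as const, tool_use_id: block.id, content: output };
    }),
  );
}
```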
6 exit conditions checked: end_turn | abort | token limit | error | max turns | tool stop
Has tool results to return?
✓ YES → back to step 5
append tool_result, call API again
✗ NO → continue to step 9
end_turn or stop_sequence hit
↺ This is why it's called a "loop"
Most conversations loop 2-5 times. Complex tasks with many file edits can loop 20+ times before end_turn.
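The control flow, reduced to an outline; `callApi` and `appendToolResults` are hypothetical stand-ins for "stream one assistant turn, running its tools" and "push tool_result blocks onto the conversation":

```ts
// Outline of the agentic loop and its exit conditions; helpers are hypothetical.
type ExitReason = "end_turn" | "abort" | "token_limit" | "error" | "max_turns";

async function agentLoop(
  callApi: () => Promise<{ toolResults: unknown[]; overTokenLimit: boolean }>,
  appendToolResults: (results: unknown[]) => void,
  signal: AbortSignal,
  maxTurns = 50,
): Promise<ExitReason> {
  for (let turn = 0; turn < maxTurns; turn++) {
    if (signal.aborted) return "abort";
    let response;
    try {
      response = await callApi(); // streams one assistant turn and runs its tool calls
    } catch {
      return "error";
    }
    if (response.overTokenLimit) return "token_limit";
    if (response.toolResults.length === 0) return "end_turn"; // nothing to send back: done
    appendToolResults(response.toolResults); // back to step 5: call the API again
  }
  return "max_turns";
}
```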
Ink's React reconciler maps AI output to the terminal: vDOM diff → Yoga layout → ANSI
React.createElement → reconciler
→ yogaLayout()
→ renderToString()
→ diffOutput()
→ process.stdout.write()
Found 16 TODO comments across the codebase:
src/query.ts:42 — Add retry logic
src/tools.ts:108 — Validate input
Streaming renders incrementally — text appears as tokens arrive, not after the full response.
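A minimal Ink example (not Claude Code's actual components) that exercises the same pipeline, React elements in and ANSI out, with `rerender` standing in for streaming updates:

```ts
// Minimal Ink example: each rerender goes through the reconciler, Yoga layout,
// and an ANSI diff of the previous frame.
import React from "react";
import { render, Text } from "ink";

const App = ({ output }: { output: string }) => React.createElement(Text, null, output);

const { rerender } = render(React.createElement(App, { output: "" }));

let buffer = "";
for (const chunk of ["Found ", "16 TODO ", "comments across the codebase"]) {
  buffer += chunk;
  rerender(React.createElement(App, { output: buffer })); // incremental, like streaming tokens
}
```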
Auto-compact if conversation is too long, extract memories, run dream mode
TODO Summary
Found **16 TODO comments** across the codebase:
- src/query.ts:42 — Add retry logic
- src/tools.ts:108 — Validate input
- src/context.ts:15 — Cache system prompt
After the response is done: trim the conversation if it's getting long, extract anything worth remembering, and optionally run dream mode.
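A hypothetical sketch of the compaction decision; the threshold, the number of turns kept verbatim, and `summarize` are all assumptions:

```ts
// Hypothetical housekeeping pass; threshold, keep-count, and summarize() are assumptions.
interface Msg { role: "user" | "assistant"; content: string }

async function maybeCompact(
  history: Msg[],
  estimateTokens: (msgs: Msg[]) => number,
  summarize: (msgs: Msg[]) => Promise<string>,
  threshold = 150_000,
): Promise<Msg[]> {
  if (estimateTokens(history) < threshold) return history;
  // Keep the most recent turns verbatim, fold everything older into a summary.
  const keep = history.slice(-10);
  const summary = await summarize(history.slice(0, -10));
  return [{ role: "user", content: `Summary of earlier conversation:\n${summary}` }, ...keep];
}
```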
Back to the REPL, waiting for the next message
● claude-code
$ ▊
The loop waits for your next message
While idle, Ctrl+C is handled gracefully without losing conversation history.