When we explain Triple Context to engineers we usually have to peel back the marketing layer. The phrase sells well; the mechanism is more interesting than the phrase. Here it is in 1500 words.
The three contexts
Before any reasoning step touches a PR, codingassist.bot builds a ReviewContext from three independent retrievals:
- Diff — the parsed AST diff (not a text diff). Hunks, renames, symbol moves, signature changes. Determinism budget: 5s.
- CodeGraph — vector retrieval over the rest of the repo, ranked by structural relevance to the symbols touched in the diff. Determinism budget: 8s.
- Ticket — the linked ticket, parsed for acceptance criteria and acceptance-criteria-shaped paragraphs. Determinism budget: 2s.
These three retrievals are pure functions of their input and of the index snapshot they run against. Same SHA, same ticket id, same indexes → same ReviewContext. The bundle is content-addressed by the SHA-256 of its serialised form; that hash is the cache key for everything downstream.
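To make the content-addressing step concrete, here is a minimal sketch. It assumes the serialised form is JSON with object keys sorted at every level; the post does not say which canonical form is actually used, and `canonicalise` and `contentHash` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical JSON-like value type; the real bundle fields are richer.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialise with object keys sorted at every level, so two structurally
// equal values produce the same string regardless of key insertion order.
// (Sorted-key JSON is an assumption, not the documented format.)
function canonicalise(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalise).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  const body = keys.map((k) => JSON.stringify(k) + ":" + canonicalise(value[k]));
  return "{" + body.join(",") + "}";
}

// SHA-256 over the canonical form, hex-encoded.
function contentHash(value: Json): string {
  return createHash("sha256").update(canonicalise(value), "utf8").digest("hex");
}
```

In a scheme like this, the `hash` field itself would have to be left out of the value being hashed, and volatile fields such as a build timestamp would either be excluded or pinned, otherwise two builds of the same inputs would never share a cache key.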
Why fuse before reasoning
The naive shape — "ask the LLM, give it the diff, then add context if the answer is bad" — looks like it would work and doesn't. The reasoning model anchors on whatever you sent first. Late-binding context tends to confirm rather than redirect.
What we want is for the reasoning step to never see the diff in isolation. It only ever receives a ReviewContext that has all three contexts pre-bound. The temptation to confabulate "what is this code probably doing" is gone, because the structural neighbours are already in the prompt.
What the bundle looks like
type ReviewContext = {
  hash: string; // sha256 of canonical serialisation
  diff: ParsedDiff;
  graph: GraphNeighbours; // ranked, with scores
  ticket: ParsedTicket;
  meta: { repo: string; sha: string; ticket_id: string; built_at: string };
};
This is the only thing the six reasoning planes see. They cannot fetch anything else. They cannot ask follow-up questions. Each plane runs against a frozen, hashed bundle and emits a typed signal. The orchestrator sees the six signals plus the bundle and produces a verdict.
That's the whole architecture of Stage 2.
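A rough sketch of that shape, under stated assumptions: the `ReviewContext` here is a simplified stand-in with string arrays in place of the parsed types, and `Signal`, `Verdict`, `deepFreeze`, `orchestrate` and the "any block means request changes" rule are all invented for illustration. The point is only that a plane's signature gives it nothing but the bundle.

```typescript
// Simplified stand-in for the bundle; string arrays replace the parsed types.
type ReviewContext = {
  readonly hash: string;
  readonly diff: readonly string[];
  readonly graph: readonly string[];
  readonly ticket: readonly string[];
};

// Hypothetical signal and verdict shapes.
type Signal = { plane: string; severity: "ok" | "warn" | "block"; note: string };
type Verdict = { contextHash: string; outcome: "approve" | "request_changes"; signals: Signal[] };

// A plane is a synchronous function of the bundle alone: no fetch handle,
// no callback, nothing it could use to pull in more context.
type Plane = (ctx: ReviewContext) => Signal;

// Recursively freeze the bundle (in place) so no plane can mutate what the next one sees.
function deepFreeze<T>(value: T): T {
  if (typeof value === "object" && value !== null && !Object.isFrozen(value)) {
    Object.freeze(value);
    Object.keys(value).forEach((key) => {
      deepFreeze((value as Record<string, unknown>)[key]);
    });
  }
  return value;
}

// Assumed combination rule: a single "block" signal is enough to request changes.
function orchestrate(ctx: ReviewContext, planes: Plane[]): Verdict {
  const frozen = deepFreeze(ctx);
  const signals = planes.map((plane) => plane(frozen));
  const blocked = signals.some((s) => s.severity === "block");
  return { contextHash: frozen.hash, outcome: blocked ? "request_changes" : "approve", signals };
}

// Example plane (invented): warns when no test file appears among the graph neighbours.
const testCoverage: Plane = (ctx) => {
  const hasTests = ctx.graph.some((path) => /\.test\./.test(path));
  return {
    plane: "tests",
    severity: hasTests ? "ok" : "warn",
    note: hasTests ? "tests found among neighbours" : "no tests among neighbours",
  };
};
```

Carrying the bundle hash into the verdict is what would let a later reader tie a verdict back to the exact context it was produced from.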
What this eliminates
The class of bugs we stopped seeing once Triple Context shipped:
- Plausible-but-wrong reviews. The LLM used to "understand" what a function did from its name and confabulate. With the call graph in the bundle, it stops guessing.
- Drift between description and diff. The ticket is right there. The reviewer can no longer drift away from what was asked.
- "Did anyone test this?" questions. Test files in the graph come along for the ride. The plane either sees them or emits a precise question for the author in its signal.
The class of bugs we did not stop seeing — and which is honestly the next research project:
- Bugs that depend on configuration the codebase doesn't reference statically. Feature flags read from environment, A/B test treatments allocated at runtime. The graph can't see these. The trace can.
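An illustrative example of the kind of code that falls into this gap. The flag name `PRICING_BANKERS_ROUNDING` and both functions are hypothetical; the environment is passed in as a parameter here only to keep the sketch testable.

```typescript
// Hypothetical environment shape, e.g. what process.env looks like in Node.
type Env = Record<string, string | undefined>;

// The flag's value lives in the deployment environment, not in the repo,
// so static retrieval sees both branches as equally live.
function roundingMode(env: Env): "bankers" | "half_up" {
  return env["PRICING_BANKERS_ROUNDING"] === "1" ? "bankers" : "half_up";
}

function roundCents(amount: number, env: Env): number {
  if (roundingMode(env) === "half_up") {
    return Math.round(amount);
  }
  // Banker's rounding: exact halves go to the nearest even integer.
  const floor = Math.floor(amount);
  if (amount - floor !== 0.5) {
    return Math.round(amount);
  }
  return floor % 2 === 0 ? floor : floor + 1;
}
```

A diff that changes the `bankers` branch is correct or incorrect depending on a value that no retrieval over the repository can supply, which is why this class of bug survives the bundle.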
Determinism handshake
A note on what guarantees this provides. ReviewContext is deterministic given the same retrievals. The retrievals are deterministic given the same indexes. The indexes drift as you push to main.
So: identical PR + identical main HEAD → identical ReviewContext → identical verdict. That's the strongest claim we make. The trace is replayable as long as the indexes are pinned.
In practice we keep an index snapshot per verdict for 30 days. After that, you can ask for a fresh re-evaluation against the current indexes (likely identical, but not guaranteed).
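The retention rule can be sketched as a small decision function. Only the 30-day figure comes from the text; `VerdictRecord`, `planReevaluation`, `replayMatches` and the idea of identifying a snapshot by an id string are assumptions made for illustration.

```typescript
// Hypothetical record kept alongside each verdict.
type VerdictRecord = {
  prSha: string;
  ticketId: string;
  indexSnapshotId: string; // identifies the pinned indexes the bundle was built from
  contextHash: string;     // hash of the ReviewContext that produced the verdict
  issuedAtMs: number;
};

// 30 days, per the retention policy described above.
const SNAPSHOT_RETENTION_MS = 30 * 24 * 60 * 60 * 1000;

type ReevaluationPlan =
  | { kind: "replay"; indexSnapshotId: string } // pinned indexes: same bundle expected
  | { kind: "fresh"; indexSnapshotId: string }; // current indexes: bundle may differ

// Inside the retention window, rebuild against the pinned snapshot;
// outside it, fall back to whatever the indexes look like now.
function planReevaluation(record: VerdictRecord, nowMs: number, currentSnapshotId: string): ReevaluationPlan {
  const snapshotAvailable = nowMs - record.issuedAtMs <= SNAPSHOT_RETENTION_MS;
  return snapshotAvailable
    ? { kind: "replay", indexSnapshotId: record.indexSnapshotId }
    : { kind: "fresh", indexSnapshotId: currentSnapshotId };
}

// A replay reproduces the original only if the rebuilt bundle has the stored hash.
function replayMatches(record: VerdictRecord, rebuiltContextHash: string): boolean {
  return record.contextHash === rebuiltContextHash;
}
```

Comparing the rebuilt hash with the stored one is a cheap way to tell a true replay from a re-evaluation that happened to run against drifted indexes.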
What's next
The next ratchet on Triple Context is Quadruple Context: add the previous N PRs by the same author against the same area as a fourth retrieval. Early data says it picks up another ~6% of intent-drift bugs. We'll write about the mechanism when it's stable.