Judge

The Judge is a chaos engineer whose job is to break the code before it ships. It does not ask “does this look correct?” It asks “how does this fail?” and “what did the Builder forget to handle?” The Judge assumes there is a bug somewhere and hunts for it across every new code path in the diff.

It is the last technical gate before QA. If a fault slips past the Judge, it ships. The Judge tests for attacker behavior, unexpected user input, and failure modes an ops engineer would recognize. When it genuinely finds nothing after working through every probe, that is a valid PASS.

Adversarial does not mean obstructive. The Judge defaults to PASS unless a finding clears the bar of a concrete, evidence-backed bug that produces a wrong outcome in the diff under review. Style preferences, theoretical attack vectors with no reachable path, and pre-existing issues are not blockers.

Model & backend

Property | Value
Class | adversarial
Model | External backend, configurable per role
Invocation | resolve-wrapper.sh --role judge --pr <number>, routed via JKZ_JUDGE_ENDPOINT / JKZ_JUDGE_MODEL
Endpoint | Required. There is no silent fallback to a native CLI; without an endpoint the review is skipped and you decide whether to continue
Access | Read-only; posts findings as PR comments
Can merge / push to main | No
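
The endpoint rule in the table can be sketched from the caller's side. This is a minimal illustration, not the wrapper's actual logic: the helper name build_judge_command and the ./resolve-wrapper.sh path are assumptions, and only the flags and the JKZ_JUDGE_ENDPOINT variable come from this page.

```python
import os
from typing import Mapping, Optional


def build_judge_command(
    pr_number: int, env: Mapping[str, str] = os.environ
) -> Optional[list[str]]:
    """Return the wrapper invocation for the Judge, or None when no endpoint is set.

    Hypothetical helper: the script path is an assumption; the flags and the
    JKZ_JUDGE_ENDPOINT variable are the ones named in the table.
    """
    if not env.get("JKZ_JUDGE_ENDPOINT"):
        # No silent fallback to a native CLI: report "skipped" and let the caller decide.
        return None
    return ["./resolve-wrapper.sh", "--role", "judge", "--pr", str(pr_number)]


if __name__ == "__main__":
    command = build_judge_command(123)  # 123 is a placeholder PR number
    if command is None:
        print("Judge review skipped: JKZ_JUDGE_ENDPOINT is not set")
    else:
        print("Would run:", " ".join(command))
```

Returning None rather than raising keeps the "you decide whether to continue" choice with the caller.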

The Judge runs on a different model than the Builder that wrote the code. That diversity is the point: a single model’s blind spot cannot pass unchallenged.

Inputs

  • PR diff — all changes in the pull request; the Judge reviews the diff, not the whole codebase.
  • Approved plan — what the Builder was supposed to follow, for plan-compliance checking.
  • Codebase context — surrounding code for the changed files.
  • Builder’s notes — any deviations or decisions the Builder documented.
  • CodeRabbit pre-scan (optional) — automated results for enrichment; the verdict is never anchored to it.
  • Pre-validated checks (optional) — deterministic validator results (secrets, leftover debug, capability invariants), treated as Level 1 evidence and not re-flagged.
  • Threat model / ADR (optional) — open threats to verify, and architectural decisions to confirm the implementation honors.

Outputs

A Markdown review posted as a PR comment (and a jkz:<role> Check Run when the token allows), with a structured jkz:verdict-json block. The review contains:

  • TL;DR — the verdict in 2–4 bullets.
  • Issues Found — each with severity, category, root-cause classification, file:line, evidence, and a specific fix.
  • Fault Injection Checklist — mandatory for every new code path: what fails, whether the error is handled, whether failure is silent, whether a test exists.
  • Plan Compliance — which steps were implemented correctly, incorrectly, or missed.
  • Verdict: PASS (ready for QA) or FAIL (needs fixes).
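
One way to picture a Fault Injection Checklist row is as a small record holding the four answers listed above. The sketch below is illustrative only: the class, its field names, and the open_questions helper are assumptions, not the Judge's real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistEntry:
    """One Fault Injection Checklist row; field names are illustrative."""

    code_path: str        # the new code path being probed
    what_fails: str       # the injected fault
    error_handled: bool   # is the error handled?
    fails_silently: bool  # does the failure go unnoticed?
    has_test: bool        # does a test cover this failure?

    def open_questions(self) -> list[str]:
        """List the checklist answers that point at a gap."""
        gaps = []
        if not self.error_handled:
            gaps.append("error not handled")
        if self.fails_silently:
            gaps.append("failure is silent")
        if not self.has_test:
            gaps.append("no test")
        return gaps


entry = ChecklistEntry(
    code_path="upload handler",  # hypothetical example path
    what_fails="storage backend times out",
    error_handled=True,
    fails_silently=False,
    has_test=False,
)
print(entry.open_questions())  # prints ['no test']
```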

Severity maps directly to the verdict: P1 (CRITICAL/HIGH) → FAIL; P2/P3 (MEDIUM/LOW) → PASS with notes. A review whose highest finding is P2 or P3 must PASS.
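
The severity rule can be written as a single function. This is a minimal sketch of the mapping as stated here; the Severity enum and the verdict function are illustrative names, not part of jkz.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "CRITICAL"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


# P1 covers CRITICAL and HIGH; MEDIUM and LOW are P2/P3.
P1 = {Severity.CRITICAL, Severity.HIGH}


def verdict(severities: list[Severity]) -> str:
    """Return "FAIL" if any finding is P1, otherwise "PASS"."""
    return "FAIL" if any(s in P1 for s in severities) else "PASS"


print(verdict([]))                               # prints PASS
print(verdict([Severity.MEDIUM, Severity.LOW]))  # prints PASS
print(verdict([Severity.LOW, Severity.HIGH]))    # prints FAIL
```

Only the highest severity matters: any number of P2/P3 findings still yields PASS with notes, while a single P1 yields FAIL.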

Iteration limits

The Judge is iteration-aware. On iteration 1 it runs a full review — every changed file, the complete Fault Injection Checklist, full plan compliance. On iteration 2+ it verifies that the Doctor’s fixes resolved the previous findings, runs fault injection only on the new code paths from the fix, and checks whether the fix addressed the root cause or just the symptom. It does not re-flag issues that were already fixed. The build loop allows up to three fix cycles before escalation.
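
The iteration behavior can be sketched as a loop. The code below is a rough model under stated assumptions: judge and doctor are hypothetical stand-ins for the real roles, and fix cycles are counted after the first full review, so the Judge runs at most four times before escalation. The page does not spell out that counting, so treat it as one possible reading.

```python
from typing import Callable

MAX_FIX_CYCLES = 3  # from the text: up to three fix cycles before escalation


def review_scope(iteration: int) -> str:
    """Iteration 1 is a full review; later iterations only verify the fixes."""
    return "full" if iteration == 1 else "verify-fixes"


def build_loop(judge: Callable[[int, str], str], doctor: Callable[[], None]) -> str:
    """Alternate Judge and Doctor; return "PASS" or "ESCALATE".

    judge(iteration, scope) returns "PASS" or "FAIL"; doctor() applies fixes.
    Both callables are hypothetical stand-ins, not real jkz APIs.
    """
    iteration = 1
    fix_cycles = 0
    while True:
        if judge(iteration, review_scope(iteration)) == "PASS":
            return "PASS"
        if fix_cycles == MAX_FIX_CYCLES:
            return "ESCALATE"  # assumption: escalate once the fix budget is spent
        doctor()
        fix_cycles += 1
        iteration += 1


calls = []


def fake_judge(iteration: int, scope: str) -> str:
    """Toy Judge that fails twice, then passes on iteration 3."""
    calls.append((iteration, scope))
    return "PASS" if iteration == 3 else "FAIL"


print(build_loop(fake_judge, lambda: None))  # prints PASS
print(calls)  # prints [(1, 'full'), (2, 'verify-fixes'), (3, 'verify-fixes')]
```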

See also

  • Inspector — the precision filter that calibrates the Judge’s findings and exposes false positives.
  • Builder — produces the diff the Judge reviews.
  • Doctor — fixes the findings on a FAIL.
  • How jkz works — the Build (review) phase in context (the dedicated /jkz:pipeline end-to-end page lands in a later wiki pass).
  • CLI / commands: /jkz:review and /jkz:quick, the commands that dispatch the Judge.