
Pipeline roles

kj run is a pipeline of roles. A role is a logical task (“review the diff”, “scan with Sonar”, “arbitrate between coder and reviewer”) backed by an AI agent, a deterministic check, or a managed subprocess.

This page documents every role with the same shape:

  • Default — is it on, opt-in (--enable-X), or config-driven.
  • Phase — pre-loop (runs once before iteration starts) / iteration (runs every loop) / post-loop (runs once after approval).
  • What it does, Why it exists, When it activates, When it pays off, When it doesn’t, Example.

Use the sidebar TOC on the right (or Ctrl-F) to jump to a role. The roles below are grouped by phase.


Pre-loop roles

These run once at the start of kj run. Their job is to set up the context the iteration loop will consume.

Intent

Default: on (always runs). Phase: pre-loop.

Reads the task description and infers the task type: sw (software change), infra (devops/CI/config), doc (documentation), add-tests (test-only change), refactor, or audit (read-only analysis). The inferred type drives downstream decisions: which roles auto-activate, whether the coder loop runs at all, what stages can be skipped.

Without intent, every task would go through the full software-change pipeline (coder + sonar + reviewer + iteration). For a doc change or a Python config tweak, that’s wasted tokens and time. Intent lets Karajan pick the right shape automatically.

Always, first thing in kj run. The detection is deterministic (keyword + structural heuristics on the task text), not LLM-based — so it’s free.

  • Doc-only tasks (“Update README’s install section”) — intent classifies as doc, sonar/tdd/reviewer skip, you save 80% of the pipeline.
  • Test-only tasks (“Add coverage for the auth middleware”) — intent classifies as add-tests, the coder is prompted differently.

Never costs anything. If you disagree with its classification, override with --task-type sw (or whichever).

Terminal window
kj run "Add a section about troubleshooting to docs/getting-started.md"
# Intent → "doc" → skips coder loop, runs only writer + reviewer on doc style.
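
The keyword-based detection described above could be sketched as follows. This is a minimal illustration, not Karajan's actual rule set: the keyword lists, the rule order, and the name `infer_task_type` are assumptions. Only the six task-type names come from this page.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
# Only the task-type names (sw, infra, doc, add-tests, refactor, audit)
# are taken from this page; the keywords are illustrative assumptions.
RULES = [
    ("audit", re.compile(r"\b(audit|analy[sz]e)\b")),
    ("add-tests", re.compile(r"\b(add (tests|coverage)|test coverage)\b")),
    ("doc", re.compile(r"\breadme\b|\bdocs/|\.md\b|\bdocumentation\b")),
    ("infra", re.compile(r"\b(ci|dockerfile|workflow|terraform)\b")),
    ("refactor", re.compile(r"\b(refactor|rename|extract)\b")),
]


def infer_task_type(task: str) -> str:
    """Classify a task description without calling an LLM."""
    text = task.lower()
    for task_type, pattern in RULES:
        if pattern.search(text):
            return task_type
    return "sw"  # fall back to the full software-change pipeline


print(infer_task_type("Add a section about troubleshooting to docs/getting-started.md"))
print(infer_task_type("Add input validation to POST /api/users"))
```

Because the rules are plain regular expressions, classification costs no tokens, which matches the "free" property claimed above.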

Triage

Default: off; enable with --enable-triage (auto-on under --mode paranoid). Phase: pre-loop.

Classifies the task complexity (trivial / simple / medium / complex) and picks a model tier for each role: haiku for trivial, sonnet for medium, opus for complex. With --smart-models (default when triage is on), expensive roles use cheaper models when the task allows.

Most tasks are not complex. Using Opus/GPT-4o-class models for “fix a typo” is overpaying. Triage lets Karajan match the model to the task.

Only with --enable-triage or --mode paranoid.

When it pays off:

  • Codebases with mixed task complexity. The cost savings on the long tail of trivial tasks compound.
  • CI runs where you’re billed per token.

When it doesn’t:

  • Tasks where you’ve already pinned the model with --coder-model. Triage is bypassed.
  • Single-task runs where the overhead of running triage (one LLM call) doesn’t amortise.

Terminal window
kj run "Add an alias for --enable-tester to be -t" --enable-triage
# Triage → "trivial" → coder uses haiku, saves ~80% in coder tokens.
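
The complexity-to-tier mapping can be pictured as a simple lookup. The sketch below is hypothetical: `pick_coder_model` is an invented name, and the tier for "simple" is an assumption, since this page only names tiers for trivial, medium, and complex.

```python
from typing import Optional

# Tiers for trivial / medium / complex are the ones named on this page.
# The page does not say which tier "simple" gets; "haiku" is an ASSUMPTION.
TIER_BY_COMPLEXITY = {
    "trivial": "haiku",
    "simple": "haiku",  # assumption, not documented
    "medium": "sonnet",
    "complex": "opus",
}


def pick_coder_model(complexity: str, pinned_model: Optional[str] = None) -> str:
    """Return the model tier for the coder; a pinned model bypasses triage."""
    if pinned_model is not None:
        return pinned_model
    return TIER_BY_COMPLEXITY[complexity]


print(pick_coder_model("trivial"))
print(pick_coder_model("complex", pinned_model="my-pinned-model"))
```

The early return mirrors the note above that a model pinned with --coder-model bypasses triage.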

Discover

Default: off; enable with --enable-discover. Phase: pre-loop.

Searches the codebase for code patterns, files, and modules related to the task. Produces a structured summary (paths, signatures, related tests) that gets injected into the coder’s prompt as pre-resolved context.

Modern coder agents (Claude Code, Codex CLI) do their own discovery internally on every iteration: they grep, glob, and Read. That’s expensive when the iteration loop runs 3-5 times. Discover does it once, and the result feeds every subsequent iteration at no extra cost.

Only with --enable-discover.

When it pays off:

  • Codebases >20k LOC where exploration costs tokens.
  • Tasks touching already-implemented code (refactors, extensions, bug fixes in old features).
  • Multi-iteration runs (--max-iterations > 1).

When it doesn’t:

  • Greenfield projects: nothing to discover.
  • Single-file tasks.
  • --max-iterations 1 runs: the savings don’t amortise.

Terminal window
kj run "Replace bcrypt with argon2 in src/auth/*" --enable-discover
# Discover finds: src/auth/hash.js, src/auth/verify.js, 3 tests, package.json has bcrypt@5.
# Coder receives this context in iter 1 and every iter after.

Researcher

Default: off; enable with --enable-researcher. Phase: pre-loop.

Investigates external information: library options, design patterns, best practices for the technology stack detected. Produces a research brief that informs the architect/coder.

Decisions like “what caching strategy?” or “which JWT library?” benefit from a single, focused research pass rather than the coder making the call mid-iteration.

Only with --enable-researcher.

When it pays off:

  • Tasks where the tech is undecided (“Add caching”: Redis? In-memory? CDN?).
  • Tasks introducing new dependencies.
  • When paired with --enable-architect: gives architect concrete options to choose from.

When it doesn’t:

  • Bug fixes (the fix is the fix).
  • Tasks with prescribed tech (“Use Redis”, “Use jsonwebtoken”).
  • Mechanical work (rename, move, format).

Terminal window
kj run "Implement rate limiting on /api/login" --enable-researcher
# Researcher brief: leaky bucket vs token bucket, in-memory vs Redis, npm libs (rate-limiter-flexible vs express-rate-limit).
# Coder picks one with context.

Architect

Default: off; enable with --enable-architect. Phase: pre-loop (after researcher). Pairs with: researcher.

Designs the solution shape before the coder writes anything: data model changes, API contracts, module boundaries, dependency graph. Output is a markdown design doc that goes into the coder’s prompt.

For non-trivial changes, having the architecture decided upfront prevents the coder from getting stuck in local optima (“I’ll just add another field to this table” instead of normalising properly).

Only with --enable-architect.

When it pays off:

  • New features touching ≥3 modules.
  • Changes with database schema or API contract impact.
  • When the task description is high-level (“add a permissions system”): architect turns it into concrete decisions.

When it doesn’t:

  • Tasks ≤1 file.
  • When you’ve already designed it yourself (pass the design via --task-file or --domain instead).

Terminal window
kj run "Add organization-level permissions to the user model" \
--enable-researcher --enable-architect
# Architect produces a design covering: schema changes, middleware order, migration strategy, API breakage.

Planner

Default: off; enable with --enable-planner. Auto-on when --plan <id> is given (the plan was already generated). Phase: pre-loop.

Decomposes the task into HUs (Historias de Usuario — user stories), each with acceptance_criteria, dependencies, and complexity points. Output is stored as a plan; the coder loop then runs each HU.

Karajan’s iteration loop converges best on focused tasks. A vague task like “implement a CMS” doesn’t converge — every iteration produces something different. Planner forces the task into atomic HUs first.

When it activates:

  • Explicit: --enable-planner.
  • Implicit: --plan <id> (loads an already-generated plan).

When it pays off:

  • Tasks with ≥3 distinct deliverables.
  • Tasks where you want to track per-HU progress (HU Board).
  • When the task description was written by a non-developer (spec → HUs translates intent into actionable units).

When it doesn’t:

  • Single-deliverable tasks.
  • Plan already exists: use --plan <id> instead.
  • Spike work where you don’t yet know what to plan.

Terminal window
kj run "Implement OAuth login with Google + GitHub" --enable-planner
# Planner generates: HU-001 (Google), HU-002 (GitHub), HU-003 (session callback), HU-004 (logout).

HU-reviewer

Default: off; enable with --enable-hu-reviewer. Phase: pre-loop (after planner or when loading --plan).

Reviews the plan (not the code) before any coder runs. Detects six classes of antipatterns: dependency cycles, scope creep (an HU touches more than it should), missing acceptance_criteria, async-observer dependencies, orphan references, and structural inconsistencies. It also runs a self-fix loop: if there are findings, it re-invokes the planner with structured feedback, for up to 5 iterations.

A bad plan produces bad code 100% of the time. Catching plan issues before the coder spends tokens is cheap; catching them after is expensive.

Only with --enable-hu-reviewer.

When it pays off:

  • Plans from --task-file written by humans.
  • Plans for codebases with strict architecture rules (impeccable / layered).
  • Always, on plans with ≥5 HUs: at that size, dependency mistakes are common.

When it doesn’t:

  • Single-HU “plans”.
  • Plans you’ve already reviewed manually.

Terminal window
kj run --task-file spec.md --enable-planner --enable-hu-reviewer
# hu-reviewer finds: HU-003 depends on HU-005 which doesn't exist (typo); HU-007 scope creep.
# Re-invokes planner with feedback; second iteration clean.
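
Two of the six checks, orphan references and dependency cycles, are plain graph problems. The sketch below shows one way to detect them, assuming the plan is reduced to a mapping from HU id to the ids it depends on. That representation and the function names are assumptions for illustration, not Karajan's plan format.

```python
from typing import Dict, List

Plan = Dict[str, List[str]]  # hypothetical shape: HU id -> ids it depends on


def find_orphans(plan: Plan) -> List[str]:
    """List references to HUs that do not exist in the plan."""
    return [f"{hu} -> {dep}" for hu, deps in plan.items()
            for dep in deps if dep not in plan]


def has_cycle(plan: Plan) -> bool:
    """Depth-first search with three colours; a grey-to-grey edge is a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = {hu: WHITE for hu in plan}

    def visit(hu: str) -> bool:
        state[hu] = GREY
        for dep in plan[hu]:
            if dep not in plan:
                continue  # orphan reference, reported by find_orphans
            if state[dep] == GREY:
                return True
            if state[dep] == WHITE and visit(dep):
                return True
        state[hu] = BLACK
        return False

    return any(state[hu] == WHITE and visit(hu) for hu in plan)


plan = {"HU-001": [], "HU-002": ["HU-001"], "HU-003": ["HU-005"]}
print(find_orphans(plan))
print(has_cycle(plan))
```

Checks like these are deterministic, so they can run before any planner feedback is generated.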

Skills

Default: on (always runs). Phase: pre-loop.

Detects the project’s tech stack (Astro, Lit, Vitest, Playwright, Express, FastAPI, Django, …) and loads the corresponding skills: focused expertise modules with patterns, conventions, and gotchas for each tech. The skill content is injected into the coder/reviewer prompts.

A generic coder prompt produces generic code. With Astro-specific skills loaded, the coder knows about client:idle, content collections, and Astro’s hydration model.

Always. Mode controlled by --skills-mode <auto|regex|semantic|none>.

When it pays off: every run on a non-trivial stack.

When it doesn’t:

  • --skills-mode none for tasks where skills make the prompt noisy (e.g. a pure-JSON config change).
  • Greenfield projects with no detectable stack.

Terminal window
kj run "Convert the home page to islands architecture"
# Skills detect: astro, lit, vite. Loads astro-islands.md, lit-elements.md.
# Coder produces idiomatic Astro/Lit code instead of plain JS.

Domain curator

Default: off; enable with domain.curator: true in the config, or with --domain <text-or-path>. Phase: pre-loop.

Injects project-specific domain knowledge into the coder’s context: ADRs from ~/.karajan/domains/<project>/, project conventions, business rules. This is project knowledge, not technical skills.

The coder doesn’t know your business rules (“orders can only be cancelled within 24h”), your ADRs (“we settled on event sourcing for orders, not CRUD”), or your conventions (“we prefix all API routes with /api/v2”). Domain curator gives it that context.

When config.domain.curator: true, or when --domain <text-or-path> is given for the run.

When it pays off:

  • Codebases with non-obvious business rules.
  • Projects with documented ADRs that the coder should respect.

When it doesn’t:

  • Greenfield projects with no domain yet.
  • Tasks unrelated to business logic (purely technical).

Terminal window
kj run "Add cancel-order endpoint" --domain ~/.karajan/domains/shop/order-rules.md
# Coder respects the 24h-cancellation rule found in order-rules.md.

Acceptance

Default: on (when the task has acceptance_criteria). Phase: pre-loop.

Synthesises acceptance tests from the structured acceptance_criteria of a task or HU. If criteria are in Gherkin (Given/When/Then), the tests are Playwright/Vitest skeletons that match each scenario.

Acceptance criteria written for humans aren’t directly testable. Translating them to tests up front lets the TDD gate verify the coder met the spec, not just made the tests pass.

Always, when the task has acceptance_criteria. No-op when criteria are empty.

When it pays off:

  • Tasks with structured criteria (from kj plan or kj run --task-file with Given/When/Then).
  • TDD methodology runs.

When it doesn’t:

  • Tasks with no acceptance criteria (free-form one-liners).
  • --methodology standard runs where TDD gate is off.

Terminal window
# task.md contains:
#   acceptance_criteria:
#     - given: a user with role=admin
#       when: they DELETE /api/users/:id
#       then: the response is 204 and the user row is soft-deleted
kj run --task-file task.md
# acceptance role generates: tests/api/users-delete.test.js with a failing spec matching the criterion.
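
To make the criteria-to-test translation concrete, here is a small sketch that turns one given/when/then criterion into the text of a failing JavaScript test skeleton. The function name and the exact skeleton layout are assumptions; the real role may generate something quite different.

```python
import json


def scenario_to_skeleton(criterion: dict) -> str:
    """Render one given/when/then criterion as a failing test skeleton (JS text)."""
    title = (f"given {criterion['given']}, "
             f"when {criterion['when']}, "
             f"then {criterion['then']}")
    # json.dumps produces a safely quoted string literal for the test title.
    return (
        f"it({json.dumps(title)}, () => {{\n"
        "  // Fails until the coder implements the behaviour.\n"
        '  throw new Error("not implemented");\n'
        "});\n"
    )


print(scenario_to_skeleton({
    "given": "a user with role=admin",
    "when": "they DELETE /api/users/:id",
    "then": "the response is 204 and the user row is soft-deleted",
}))
```

The skeleton fails by construction, which is what lets a later gate distinguish "spec not yet met" from "no test exists".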

Iteration roles

These run on every iteration (up to --max-iterations). One iteration = one cycle of coder → checks → reviewer.

Coder

Default: on (always runs when there’s code work). Phase: iteration.

Writes / modifies code to implement the task. Uses the agent specified by --coder (default Claude). Sees: task description, accumulated feedback from previous iterations, discover/researcher/architect context if those ran, skills, domain, acceptance tests (if TDD).

Self-evident — it’s the role that actually writes code. The interesting design decisions are around what it sees as context.

When it activates: every iteration of kj run, except in analysis-only flows (taskType=audit/doc/infra where applicable).

When it pays off: always. It’s the work.

When it doesn’t:

  • Analysis-only runs (intent classifies the task as not requiring code changes).
  • You want only review/audit: use kj review / kj audit instead.

Terminal window
kj run "Add input validation to POST /api/users"
# Coder reads existing src/routes/users.js, modifies it to add zod schema validation, writes a test.

Refactorer

Default: off; enable with --enable-refactorer. Phase: iteration (after coder).

Cleanup pass on the coder’s output: extract long methods, deduplicate, rename for clarity. Doesn’t change behaviour — tests should still pass after refactorer.

The coder is optimised for “produce working code”. Refactorer is optimised for “produce clean working code”. Separating them lets each role be focused.

Only with --enable-refactorer. Costs one extra LLM call per iteration.

When it pays off:

  • Tasks that you expect the coder to “make work but ugly” (complex algorithms, awkward integrations).
  • Code that will see human review.

When it doesn’t:

  • Simple tasks where the coder’s output is already clean.
  • Cost-sensitive runs.

Terminal window
kj run "Implement merge-sort with type hints" --enable-refactorer
# Coder produces a working but procedural impl; refactorer pulls helpers, adds JSDoc, simplifies edge cases.

Output-guard

Default: on (always runs). Phase: iteration (after coder, deterministic).

Scans the coder’s diff for 15 credential patterns (AWS keys, GitHub/npm/PyPI/Slack tokens, JWTs, generic secrets, private keys), filesystem leaks (rm -rf on host paths, modifications to .env / serviceAccountKey.json), and destructive operations. Blocking on critical by default.

Without it, a coder hallucination (“I’ll just rm -rf ~”) or a copy-paste of a real key into a test fixture goes to production. Output-guard is the deterministic backstop the LLM can’t talk its way out of.

When it activates: always. Deterministic, no LLM cost.

When it pays off: every run.

When it doesn’t: never, since it is always cheap. If you’re hitting false positives, configure guards.output.protected_files and guards.output.patterns in kj.config.yml.

Terminal window
# Coder hallucinates: "to test I'll commit a test API key to .env"
# Output-guard sees AWS_ACCESS_KEY_ID=AKIA... in the diff → blocks, sends feedback to coder.
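
A deterministic credential scan of this kind boils down to running regular expressions over the lines a diff adds. The sketch below shows three widely known token shapes. It is an illustration only: the pattern names, the selection of patterns, and `scan_diff` are assumptions, not the 15 patterns Output-guard actually ships.

```python
import re
from typing import List, Tuple

# Illustrative subset; the names and the choice of patterns are assumptions.
PATTERNS = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github-token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private-key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_diff(diff: str) -> List[Tuple[int, str]]:
    """Return (diff line number, pattern name) for every added line that matches."""
    findings = []
    for lineno, line in enumerate(diff.splitlines(), start=1):
        # Only inspect added lines; skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


diff = "+++ b/.env\n+AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n-OLD=1\n+DEBUG=1"
print(scan_diff(diff))
```

Scanning only added lines keeps the check focused on what the coder introduced, rather than on secrets that may already exist elsewhere in the repository.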

Perf-guard

Default: on (advisory, configurable to block). Phase: iteration (after coder, deterministic).

Detects frontend performance antipatterns in the diff: <img> without width/height/loading=lazy, render-blocking scripts, missing font-display: swap, document.write, heavy deps (moment, lodash, jquery as global imports).

These are well-known regressions you don’t want a coder to introduce silently. Catching them at iteration-time prevents the perf role (post-loop) from finding the same thing 4 iterations later.

When it activates: always. Deterministic.

When it pays off: frontend projects (auto-detected from stack).

When it doesn’t: backend-only projects, where the patterns won’t match anyway.

Terminal window
# Coder adds <img src="hero.jpg"> without dimensions
# Perf-guard flags it advisory → feedback to coder → next iter has width/height.
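
One of the listed antipatterns, an <img> tag without explicit dimensions, can be detected with a pair of regular expressions. The sketch below is a simplified assumption of how such a check might work. It ignores loading=lazy and uses regexes rather than a real HTML parser, and the function name is hypothetical.

```python
import re
from typing import List

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
WIDTH_ATTR = re.compile(r"\bwidth\s*=", re.IGNORECASE)
HEIGHT_ATTR = re.compile(r"\bheight\s*=", re.IGNORECASE)


def imgs_missing_dimensions(html: str) -> List[str]:
    """Return every <img> tag that lacks a width or a height attribute."""
    flagged = []
    for match in IMG_TAG.finditer(html):
        tag = match.group(0)
        if not (WIDTH_ATTR.search(tag) and HEIGHT_ATTR.search(tag)):
            flagged.append(tag)
    return flagged


print(imgs_missing_dimensions('<img src="hero.jpg">'))
print(imgs_missing_dimensions('<img src="hero.jpg" width="1200" height="600">'))
```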

Sonar

Default: on if SonarQube is reachable. Phase: iteration (after coder, between guards and reviewer).

Runs a SonarQube scan on the changed files, fetches findings (filtered through the audit FP filter), and injects critical / major issues into the reviewer’s context. The reviewer then evaluates the diff and the Sonar findings together.

Sonar catches rules the LLM doesn’t reliably enforce: cognitive complexity thresholds, cyclomatic complexity, exact code smells (S3776, S1192, …), specific security hotspots. The combination “LLM + Sonar” catches more than either alone.

Always when SonarQube is running locally. --no-sonar disables. --enable-sonarcloud adds SonarCloud as a complement.

When it pays off:

  • Codebases that already enforce Sonar rules.
  • Refactors where complexity / duplication matter.

When it doesn’t:

  • New / greenfield projects without a Sonar project key yet.
  • Sonar is unavailable (container stopped): Karajan auto-skips with a warning.

Terminal window
# Coder adds a 30-line method with cognitive-complexity=22
# Sonar flags S3776 critical → reviewer sees it in feedback → loop with "refactor for cog-complexity ≤15".

TDD gate

Default: on when --methodology=tdd (default methodology). Phase: iteration (after sonar, before reviewer).

Fail-fast gate: verifies that acceptance tests for this task/HU already exist and are currently failing (because the implementation didn’t exist before the coder ran). If tests pass before the coder’s work, that’s suspicious (no real test for the new functionality). If tests don’t exist, fail-fast to Solomon.

Without TDD gate, a coder can write code that doesn’t actually map to tests. The gate enforces “tests come first” structurally.

When config.development.methodology=tdd (default) or --methodology tdd. Disabled with --methodology standard.

When it pays off:

  • Teams already practising TDD.
  • High-trust scenarios where tests are the contract.

When it doesn’t:

  • Spike code where tests don’t exist yet by design.
  • Refactors of code without tests (use --methodology standard for those).

Terminal window
# Acceptance role generated tests/api/cancel-order.test.js (failing).
# Coder implements cancel-order.
# TDD gate: tests now pass → ok. If they still fail → loop with feedback "your impl doesn't satisfy the spec".

Reviewer

Default: on (always runs). Phase: iteration (last step).

Reads the diff and evaluates it against the task / acceptance criteria. Returns either approved (loop ends) or rejected with structured feedback. By default the reviewer runs on a different provider from the coder (claude↔codex), so two different LLM perspectives evaluate the work.

The fundamental quality gate. Without a reviewer, the coder is its own judge — that converges on local optima, not “what the user actually asked for”.

When it activates: every iteration.

When it pays off: always.

When it doesn’t:

  • kj code (coder-only, no reviewer): when you trust the coder and want speed.
  • --mode trivial or --auto-simplify may skip it for one-line tasks.

Terminal window
# Reviewer reads the diff, compares to "add input validation".
# Approves: "Validation added, all required fields covered. Test exercises happy + error paths."

Solomon

Default: on (always available, fires only when needed). Phase: iteration (after reviewer rejection).

The arbiter. When the reviewer rejects, Solomon classifies the rejection into: structural (real bug, design issue), style-only (formatting, naming, no impact), or mixed. For pure style-only rejections, Solomon can override and approve the iteration — preventing infinite loops where coder and reviewer disagree on tabs vs spaces.

Reviewer rejections aren’t all equal. A “your method name should be more descriptive” rejection doesn’t justify another full iteration. Solomon prevents iteration loops on cosmetic disagreements.

When it activates: only when the reviewer rejects. It doesn’t fire if the reviewer approves.

When it pays off:

  • Every run where coder and reviewer have different style preferences (cross-provider review).
  • Long-iteration runs where you want to converge.

When it doesn’t: it is never overhead, since it only fires when needed.

Terminal window
# Reviewer rejects: "rename `handleErr` to `handleError`".
# Solomon classifies: style-only → overrides, approves the iteration.
# Saves one full coder+reviewer iteration.
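
The three-way classification can be sketched as a small decision over the categories of the reviewer's feedback items. The category labels and function names below are hypothetical. How Solomon actually labels feedback is not described on this page.

```python
from typing import List

# Hypothetical labels for cosmetic feedback; not taken from Karajan.
STYLE_CATEGORIES = {"naming", "formatting", "style"}


def classify_rejection(categories: List[str]) -> str:
    """Classify a rejection as style-only, structural, or mixed."""
    is_style = [c in STYLE_CATEGORIES for c in categories]
    if is_style and all(is_style):
        return "style-only"
    if not any(is_style):
        return "structural"  # also covers empty feedback: never override blindly
    return "mixed"


def should_override(categories: List[str]) -> bool:
    """Only a purely cosmetic rejection may be overridden and approved."""
    return classify_rejection(categories) == "style-only"


print(classify_rejection(["naming"]))
print(classify_rejection(["bug", "naming"]))
print(should_override(["bug"]))
```

Treating empty or unrecognised feedback as structural is a deliberately conservative choice in this sketch: an override should require positive evidence that the disagreement is cosmetic.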

Brain

Default: on; disable with --brain off. Phase: iteration (wraps every agent call).

The universal error recovery layer. Every LLM call goes through Brain. When an agent fails (rate limit, network timeout, quota exhausted, silenced response), Brain classifies the error and decides: retry now (transient), standby (wait minutes, in-process), hibernate (persist + exit, resume at cooldown), or fallback (switch provider). Configurable per-role fallback chain.

Without Brain, every transient error aborts the run. With Brain, runs survive Anthropic rate limits, OpenAI 5xx errors, and network blips. Critical for long pipelines and CI.

Every agent call, transparently. Visible in logs as brain stage events.

When it pays off:

  • Multi-hour runs that span rate-limit windows.
  • CI on flaky networks.
  • Runs close to the Anthropic $200/month Agent SDK cap: Brain switches to Codex when the Anthropic quota is exhausted.

When it doesn’t:

  • Debugging routing decisions: use --brain off to see raw provider errors.

Terminal window
# Iteration 3: Claude returns 429 with retry-after 45s.
# Brain: classify=RATE_LIMIT_SHORT → standby 45s → retry → iteration continues seamlessly.
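
The classify-then-decide step can be sketched as a function from an error summary to one of the four actions named above. Everything specific here is an assumption: the `AgentError` shape, the 600-second standby limit, and the rule that unknown errors fall back to another provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentError:
    """Hypothetical summary of a failed agent call."""
    status: Optional[int] = None           # HTTP status; None for network failures
    retry_after_s: Optional[float] = None  # provider's retry-after hint, if any
    quota_exhausted: bool = False


def decide(error: AgentError, standby_limit_s: float = 600.0) -> str:
    """Map an error to retry, standby, hibernate, or fallback."""
    if error.quota_exhausted:
        return "fallback"  # switch provider
    if error.status == 429 and error.retry_after_s is not None:
        # Short cooldowns are waited out in-process; long ones persist and exit.
        return "standby" if error.retry_after_s <= standby_limit_s else "hibernate"
    if error.status is None or error.status >= 500 or error.status == 429:
        return "retry"  # transient: timeout, 5xx, or 429 without a hint
    return "fallback"  # assumption: other errors mean this provider cannot serve the call


print(decide(AgentError(status=429, retry_after_s=45)))
print(decide(AgentError(status=429, retry_after_s=3600)))
print(decide(AgentError(status=503)))
```

With these assumed thresholds, the 429 with retry-after 45s from the example above maps to standby, matching the behaviour described there.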

Post-loop roles

These run once, after the iteration loop approves. They add extra layers of quality verification on top.

Tester

Default: off; enable with --enable-tester. Phase: post-loop.

Executes the project’s test suite (Vitest, Jest, Playwright, pytest — auto-detected) against the final state of the code. Reports pass/fail, coverage delta if available.

The TDD gate verifies tests pass at iteration time. Tester verifies they still pass on the final state, including any test the coder didn’t touch. Catches regressions in unrelated areas.

Only with --enable-tester.

When it pays off:

  • Codebases with mature test suites.
  • Refactors where the coder might break unrelated tests.

When it doesn’t:

  • Projects without tests.
  • When you trust the TDD gate.

Terminal window
kj run "Refactor src/utils/date.js for readability" --enable-tester
# Tester runs full suite: 4872/4872 pass. Confidence the refactor didn't break anything.

Security

Default: off; enable with --enable-security. Phase: post-loop.

LLM-driven security review of the final diff: OWASP Top 10 (injection, broken auth, XSS, CSRF, …), insecure crypto, token handling, file uploads, deserialisation. Complements Semgrep (deterministic SAST) with reasoning-based analysis.

Some vulnerabilities require reasoning about flow (“this user input reaches this DB query”). Semgrep matches patterns; security role connects them.

Only with --enable-security or --mode paranoid.

When it pays off:

  • Auth, payment, file-upload, API gateway changes.
  • Code that handles untrusted input.

When it doesn’t:

  • Internal-only utility changes with no input boundary.
  • Cost-sensitive runs.

Terminal window
kj run "Add file upload for user avatars" --enable-security
# Security role flags: missing MIME type validation, no max file size, path traversal risk in filename.

Perf

Default: off; enable with --enable-perf. Phase: post-loop.

LLM pass focused on performance: algorithm complexity, render-blocking resources, bundle size impact, N+1 queries. Activates Lighthouse automatically if the stack is frontend and lighthouse is available.

Performance is hard to verify deterministically — needs reasoning (“this loop is O(n²) where O(n) is achievable”). Perf-guard catches the obvious patterns; perf role catches the structural ones.

Only with --enable-perf or --mode paranoid.

When it pays off:

  • Frontend changes affecting user-perceived speed.
  • Backend changes touching hot paths.

When it doesn’t:

  • Doc / config / non-functional changes.

Terminal window
kj run "Add product search to /products page" --enable-perf
# Perf flags: search runs unindexed query (suggests adding index), client bundle includes all 12k products on initial load.

Impeccable

Default: off; enable with --enable-impeccable. Phase: post-loop.

“Final polish” pass — design audit for accessibility, performance, theming, responsive, anti-patterns. Includes WebPerf Quality Gate (Core Web Vitals via Chrome DevTools MCP). By default read-only (flags issues); --design lets it apply fixes.

UI work has many small “is this right?” decisions. Impeccable systematises them. Useful before promoting a UI change to design review.

Only with --enable-impeccable or --mode paranoid.

When it pays off:

  • Frontend changes going to design review / senior approval.
  • High-stakes UI (landing pages, conversion flows).

When it doesn’t:

  • Backend changes.
  • Throwaway / spike UI.

Terminal window
kj run "Add hero section to landing page" --enable-impeccable
# Impeccable: 'h1 contrast 3.2:1 fails WCAG AA; image lacks alt; CLS likely >0.1 due to font-display'.

Audit

Default: off; enable with --enable-audit (also available as kj audit standalone). Phase: post-loop.

Runs kj audit integrated as a final stage: deterministic collectors (Sonar / OSV / Semgrep / madge / knip) + LLM dimension evaluation (security, codeQuality, performance, architecture, testing, accessibility). Loops the coder back to fix critical / high findings if any are found.

Reviewer signs off on the diff; audit signs off on the resulting state. Different lenses.

Only with --enable-audit.

When it pays off:

  • High-stakes runs going straight to production.
  • Long-iteration runs where you want a final consolidated quality report.

When it doesn’t:

  • Run-of-the-mill changes where reviewer approval is enough.
  • When you’ll run kj audit manually later anyway.

Terminal window
kj run "Implement payment refund flow" --enable-audit
# Audit: 0 critical, 1 high (idempotency key not enforced) → loops coder back → second pass clean.

  • Each role’s behaviour, when activated and when not, with reasoning — that’s this page.
  • Each flag of kj run (including --enable-X for each role here) — see kj run.
  • The internal architecture (driver modules, how iteration is wired) — see kj run → How it works internally.
  • What audit dimensions and external collectors do — see Audit dimensions and External tools.