kj run
`kj run` is the command. Everything else in Karajan either supports it (planning, auditing, doctor) or is a subset of it (`kj code`, `kj review`, `kj scan`). Get this one right and you’ve got the system.
What it does
`kj run` orchestrates a multi-agent pipeline that takes a task description as input and produces working code as output, passing through any subset of 24 specialised roles (planner, coder, reviewer, sonar, tester, security…). You pick the AI agent backing each role (Claude, Codex, Gemini, Aider, OpenCode); the pipeline shape doesn’t change.
The pipeline runs in three phases:
- Pre-loop (once): intent detection → optional planning/research/architecture → skills loading → acceptance test synthesis. Sets up the context the iteration loop will use.
- Iteration loop (1 to `--max-iterations` times): coder writes code → deterministic guards check the diff → Sonar scans → TDD gate verifies tests fail first → reviewer evaluates. If the reviewer rejects with structural issues, the loop runs again with that feedback; if it rejects with style-only issues, Solomon (the arbiter) can override.
- Post-loop (once, after approval): optional tester, security, performance, impeccable, and audit passes, then optional git commit / push / PR.
The default minimum is intentionally lean: intent → skills → acceptance → coder → guards → sonar → tdd → reviewer → solomon → brain. Everything else is opt-in via `--enable-X` or activated by `--mode paranoid`. The reasoning: every added role costs tokens and time, and most tasks don’t need a researcher or an architect.
By the end, you get: code written, tests passing (if TDD or `--enable-tester`), a green Sonar quality gate, reviewer approval (possibly after several iterations), and optionally a PR open against your base branch.
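As an illustration of the opt-in model, here is a minimal JavaScript sketch of how the default role set, `--enable-X` flags and `--mode paranoid` could combine. It is not Karajan’s code: `resolveRoles` and its `{ mode, enable }` options object are hypothetical, and the result is a set of role names, not the execution order.

```javascript
// Hypothetical sketch, not Karajan's implementation.
// The default minimum pipeline, as listed on this page.
const DEFAULT_ROLES = [
  "intent", "skills", "acceptance", "coder", "guards",
  "sonar", "tdd", "reviewer", "solomon", "brain",
];

// Roles that --mode paranoid turns on in one go.
const PARANOID_ROLES = [
  "triage", "planner", "hu-reviewer", "tester",
  "security", "perf", "impeccable",
];

/**
 * @param {{ mode?: string, enable?: string[] }} opts
 *   `enable` holds role names taken from --enable-X flags, e.g. "security".
 * @returns {string[]} the active role names (a set, not an execution order)
 */
function resolveRoles({ mode = "standard", enable = [] } = {}) {
  const roles = new Set(DEFAULT_ROLES);
  if (mode === "paranoid") PARANOID_ROLES.forEach((r) => roles.add(r));
  // A Set makes redundant flags harmless, e.g. --mode paranoid --enable-perf.
  enable.forEach((r) => roles.add(r));
  return [...roles];
}

console.log(resolveRoles().length); // 10
console.log(resolveRoles({ enable: ["security"] }).length); // 11
console.log(resolveRoles({ mode: "paranoid", enable: ["perf"] }).length); // 17
```

Using a set is a deliberate choice in this sketch: it is what makes the redundant `--enable-perf` in the paranoid example further down harmless.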
When to use
- Implementing a feature or fix, the typical case: `kj run "add a logout button to the navbar"`.
- Executing an approved plan: `kj run --plan PLN-001` runs every HU in the plan.
- Re-running a single HU: `kj run --plan PLN-001 --hu HU-003` after a failure, instead of restarting the whole plan.
- CI-driven implementation: `kj run "<task>" --yes --max-iterations 5 --auto-pr` in a GitHub Action that’s reacting to an issue label.
- Paranoid mode for sensitive code: `kj run "<task>" --mode paranoid` activates tester + security + perf + impeccable + planner + triage + hu-reviewer in one shot.
When NOT to use
- Exploration / one-shot prompts: if you’re just asking “how would you approach X?”, use the underlying agent CLI directly. `kj run` is heavy: full guards, Sonar, reviewer, possibly iteration.
- Read-only analysis: to evaluate code quality without modifying anything, use `kj audit`, not `kj run`.
- Running just the coder, no review: use `kj code`. It skips reviewer/Sonar/iteration.
- Just wanting a Sonar scan: use `kj scan`.
- Tasks you can’t describe well: `kj run` is only as good as your task description. Vague tasks produce vague code; fix the task first.
The task argument
`kj run` requires a task. Three ways to provide it:
```bash
kj run "Fix the login redirect on Safari"      # inline string
kj run --task-file ./specs/login-redirect.md   # markdown file
kj run --plan PLN-001                          # execute all HUs in a plan
```
For a Planning Game integration, pass the card ID with `--pg-task KJC-TSK-0042` (a task description is still required from somewhere, usually the card body).
Options
`kj run` has 64 flags. They group naturally: read just the section that matches what you’re trying to do, not the whole table.
Agent selection
Which AI agent backs each role. Defaults come from your `karajan.config.yml` (`roles.<role>.provider`).
| Flag | Default | When to flip it |
|---|---|---|
| `--coder <name>` | `claude` (config) | Use a different agent for this run only: `--coder codex`. Useful for A/B testing or when Claude’s API is rate-limited. |
| `--reviewer <name>` | cross-provider of coder | Force a specific reviewer: `--reviewer gemini`. Default is “the other major provider” (claude↔codex), which gives two perspectives. |
| `--planner <name>` | config | Only matters with `--enable-planner`. |
| `--refactorer <name>` | config | Only matters with `--enable-refactorer`. |
| `--coder-fallback <name>` | from `brain.fallback` config | What to switch to when the primary coder hits `QUOTA_EXHAUSTED_*` with `retryAfter` > 12h. Useful before known Anthropic limit periods. |
| `--reviewer-fallback <name>` | from config | Same idea for the reviewer. |
| `--coder-model <name>` | tier-driven | Pin a specific model: `--coder-model claude-opus-4-7`. Bypasses the triage tier picker. |
| `--reviewer-model <name>` | tier-driven | Pin the reviewer model. |
| `--planner-model <name>` / `--refactorer-model <name>` | tier-driven | Pin per-role models. |
| `--smart-models` / `--no-smart-models` | on if `--enable-triage` | Lets triage pick a cheaper model (haiku) for trivial tasks. Turn off when you want a consistent model choice across the run. |
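
The defaults in this table can be read as a small precedence rule. The JavaScript below is a hypothetical sketch, not Karajan’s API: `resolveProvider`, `shouldUseFallback`, the config object’s shape and the error shape `{ code, retryAfterMs }` are assumptions, as is the exact precedence between the cross-provider rule and the config file. Only the claude↔codex pairing and the “`QUOTA_EXHAUSTED_*` with `retryAfter` > 12h” trigger come from this page.

```javascript
// Hypothetical sketch of agent selection; names and shapes are illustrative.
const CROSS_PROVIDER = { claude: "codex", codex: "claude" };
const TWELVE_HOURS_MS = 12 * 60 * 60 * 1000;

// Assumed precedence: CLI flag, then the claude<->codex pairing for the
// reviewer, then roles.<role>.provider from karajan.config.yml.
function resolveProvider(role, cliFlags, config, coderProvider) {
  if (cliFlags[role]) return cliFlags[role];
  if (role === "reviewer" && CROSS_PROVIDER[coderProvider]) {
    return CROSS_PROVIDER[coderProvider];
  }
  return config.roles?.[role]?.provider;
}

// Switch to the fallback agent only for long quota outages (> 12h).
// The { code, retryAfterMs } error shape is an assumption.
function shouldUseFallback(error, fallback) {
  return Boolean(
    fallback &&
      typeof error.code === "string" &&
      error.code.startsWith("QUOTA_EXHAUSTED_") &&
      error.retryAfterMs > TWELVE_HOURS_MS
  );
}

const config = { roles: { coder: { provider: "claude" } } };
const coder = resolveProvider("coder", {}, config);
console.log(coder); // claude
console.log(resolveProvider("reviewer", {}, config, coder)); // codex
console.log(resolveProvider("reviewer", { reviewer: "gemini" }, config, coder)); // gemini
// "QUOTA_EXHAUSTED_EXAMPLE" is a made-up code matching QUOTA_EXHAUSTED_*.
console.log(shouldUseFallback({ code: "QUOTA_EXHAUSTED_EXAMPLE", retryAfterMs: 13 * 3600e3 }, "codex")); // true
```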
Pipeline toggles (which roles run)
| Flag | Default | When to enable it |
|---|---|---|
| `--enable-triage` | off | Codebase >10k LOC or task ambiguity: triage saves tokens by picking the right tier. |
| `--enable-discover` | off | Codebase >20k LOC and the task touches code already implemented. Discover does the grep/Read pass once; the coder benefits in every iteration. See discover →. |
| `--enable-researcher` | off | Decision-heavy tasks (“add caching”: what flavour?). See researcher →. |
| `--enable-architect` | off | Tasks with non-trivial architecture impact. Pairs naturally with `--enable-researcher`. |
| `--enable-planner` | off | Long tasks that need to be split into HUs first. Implicit when `--plan` is passed. |
| `--enable-hu-reviewer` | off | When the plan came from a spec the user wrote: runs review on the plan before coding. |
| `--enable-refactorer` | off | Tasks where you expect the coder to produce code that needs cleanup. Costs an extra LLM round. |
| `--enable-tester` | off | When you don’t fully trust the TDD gate or want the test suite as a post-step. |
| `--enable-security` | off | High-stakes tasks (auth, payment, file uploads). Adds an LLM pass focused on OWASP. |
| `--enable-perf` | off | Frontend tasks where a regression matters. Activates Lighthouse if available. |
| `--enable-impeccable` | off | “Final polish” pass for code that will be reviewed by senior humans. |
| `--enable-ci` | off | Reacts to PR comments / labels during the run; useful for human-in-the-loop CI. |
| `--enable-sonarcloud` / `--no-sonarcloud` | off | Use SonarCloud (hosted) alongside or instead of local Sonar. |
| `--no-sonar` | Sonar on | Disable Sonar for this run. Useful when the SonarQube container is being restarted. |
| `--no-auto-rebase` | auto-rebase on | Don’t rebase the working branch onto base before starting. Use when you’re certain the base is in sync. |
| `--mode <name>` | `standard` | `paranoid` enables triage + planner + hu-reviewer + tester + security + perf + impeccable in one go. Use for code that ships to production critical paths. |
| `--methodology <tdd\|standard>` | `tdd` | `tdd` requires tests to fail before coding; `standard` lets the coder write tests and code together. Pick based on team practice. |
| `--auto-simplify` / `--no-auto-simplify` | auto-simplify on | Lets triage downgrade the pipeline for trivial tasks (e.g. skip the reviewer for typo fixes). Disable to force the full pipeline. |
| `--design` | off | Lets the impeccable role write design changes (not just suggest them). Use only when the task is a design pass. |
| `--brain <on\|off>` | `on` | The Brain decisor (universal error recovery + smart routing). Only turn off when debugging routing decisions. |
| `--enable-serena` | off | Activates the optional Serena MCP for extra exploration capability. Requires Serena to be installed. |
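
The difference between the two `--methodology` values comes down to one extra check, the classic red-green test. A minimal JavaScript sketch, assuming the gate already knows whether the new tests failed against the code before the coder’s change and whether they pass after it; `tddGate` and its inputs are hypothetical, and treating `standard` as “only the after-run matters” is an assumption.

```javascript
// Hypothetical red-green check; not Karajan's TDD gate.
// failedBefore: did the new tests fail against the pre-change code?
// passedAfter:  do they pass against the coder's change?
function tddGate({ methodology = "tdd", failedBefore, passedAfter }) {
  if (!passedAfter) {
    return { ok: false, reason: "tests do not pass after the change" };
  }
  if (methodology === "tdd" && !failedBefore) {
    // A test that already passed may not exercise the new behaviour at all.
    return { ok: false, reason: "tests did not fail before the change" };
  }
  return { ok: true };
}

console.log(tddGate({ failedBefore: true, passedAfter: true }).ok); // true
console.log(tddGate({ failedBefore: false, passedAfter: true }).ok); // false
console.log(tddGate({ methodology: "standard", failedBefore: false, passedAfter: true }).ok); // true
```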
Iteration control
How long Karajan keeps trying when the reviewer rejects.

| Flag | Default | When to flip it |
|---|---|---|
| `--max-iterations <n>` | 5 | Lower for spike work (`--max-iterations 1`). Higher for tough refactors where you’re willing to spend more (10+). Runs beyond 10 iterations rarely converge. |
| `--max-iteration-minutes <n>` | 15 | Per-iteration wall time. Lower it for cheap tasks. |
| `--max-total-minutes <n>` | 90 | Whole-run wall time. The run aborts if this is exceeded, regardless of iteration count. |
| `--reviewer-retries <n>` | 2 | How many times to re-invoke the reviewer when it errors (network blip, etc.). |
| `--checkpoint-interval <n>` | 5 (minutes) | How often Karajan pauses for the user’s confirmation in interactive runs. Set it higher in CI; ignored with `--yes`. |
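
How the three limits interact can be sketched as a single check run between iterations. This is a hypothetical JavaScript illustration using the defaults from the table; `budgetVerdict`, its state object and the order of the checks are assumptions, except that the whole-run limit applies regardless of iteration count, as stated above.

```javascript
// Hypothetical budget check; defaults mirror the table above.
const DEFAULT_LIMITS = { maxIterations: 5, maxIterationMinutes: 15, maxTotalMinutes: 90 };

/**
 * @param {{ completedIterations: number, iterationMinutes: number, totalMinutes: number }} state
 * @returns {string|null} why the run should stop, or null to keep going
 */
function budgetVerdict(state, limits = DEFAULT_LIMITS) {
  // Whole-run wall time applies regardless of iteration count.
  if (state.totalMinutes > limits.maxTotalMinutes) return "total-time-exceeded";
  if (state.iterationMinutes > limits.maxIterationMinutes) return "iteration-time-exceeded";
  if (state.completedIterations >= limits.maxIterations) return "max-iterations-reached";
  return null;
}

console.log(budgetVerdict({ completedIterations: 2, iterationMinutes: 10, totalMinutes: 95 })); // total-time-exceeded
console.log(budgetVerdict({ completedIterations: 5, iterationMinutes: 3, totalMinutes: 40 })); // max-iterations-reached
console.log(budgetVerdict({ completedIterations: 1, iterationMinutes: 3, totalMinutes: 4 })); // null
```

A spike run with `--max-iterations 1` corresponds to passing `{ ...DEFAULT_LIMITS, maxIterations: 1 }`, which stops after the first completed iteration.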
Plan / HU selection
| Flag | When |
|---|---|
| `--plan <planId>` | Execute an existing plan (every HU in order). Skips researcher/architect/planner, since the plan already has the context. |
| `--hu <huIds>` | Run only specific HUs from a plan: `--plan PLN-001 --hu HU-003,HU-005`. Requires `--plan`. The board uses this for the per-HU ▶ button. |
| `--hu-file <path>` | YAML file with HU stories to certify before coding. Manual HU input without going through `kj plan`. |
| `--task-type <type>` | Override the intent detector: `sw`, `infra`, `doc`, `add-tests`, or `refactor`. Useful when the inferred type would skip stages you want (e.g. force `sw` for an infra-looking task). |
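
The `--hu` filter amounts to splitting a comma-separated list and intersecting it with the plan. A hypothetical JavaScript sketch: the plan shape `{ id, hus: [{ id }] }`, rejecting unknown IDs, and keeping plan order rather than command-line order are all assumptions, not documented Karajan behaviour.

```javascript
// Hypothetical sketch of --hu selection; the plan shape is assumed.
function selectHus(plan, huFlag) {
  if (huFlag && !plan) throw new Error("--hu requires --plan");
  if (!huFlag) return plan ? plan.hus : []; // no --hu: every HU, in plan order

  const wanted = new Set(
    huFlag.split(",").map((s) => s.trim()).filter(Boolean)
  );
  const known = new Set(plan.hus.map((hu) => hu.id));
  const unknown = [...wanted].filter((id) => !known.has(id));
  if (unknown.length > 0) {
    throw new Error(`Unknown HU(s) in ${plan.id}: ${unknown.join(", ")}`);
  }
  // Keep the plan's order, not the order given on the command line.
  return plan.hus.filter((hu) => wanted.has(hu.id));
}

const plan = { id: "PLN-001", hus: [{ id: "HU-001" }, { id: "HU-003" }, { id: "HU-005" }] };
console.log(selectHus(plan, "HU-005,HU-003").map((hu) => hu.id).join(",")); // HU-003,HU-005
```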
Git / CI integration
| Flag | Default | When to enable |
|---|---|---|
| `--auto-commit` | off | Karajan commits per iteration. Useful with `--auto-push` for visible CI progress. |
| `--auto-push` | off | Push commits as they’re made. Requires a git remote + auth. |
| `--auto-pr` | off | Open a PR against `--base-branch` when the run completes successfully. CI mode. |
| `--base-branch <name>` | `main` | Override the PR target. |
| `--base-ref <ref>` | (computed) | The commit ref the new branch is based on. Defaults to `origin/<base-branch>`. |
| `--branch-prefix <prefix>` | `feat/KJC-` | Branch naming. |
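
Put together, the git flags resolve to a base ref, a branch name and an optional PR target. A hypothetical JavaScript sketch using the defaults from the table: `gitPlan` is illustrative, and the suffix appended to `--branch-prefix` (here a caller-supplied string) is an assumption, since this page only documents the prefix.

```javascript
// Hypothetical sketch; defaults mirror the table above.
function gitPlan(
  { baseBranch = "main", baseRef, branchPrefix = "feat/KJC-", autoPr = false } = {},
  suffix, // hypothetical: whatever identifies the run or card
  succeeded
) {
  return {
    // --base-ref falls back to origin/<base-branch>.
    baseRef: baseRef ?? `origin/${baseBranch}`,
    branch: `${branchPrefix}${suffix}`,
    // --auto-pr only opens a PR when the run completed successfully.
    prTarget: autoPr && succeeded ? baseBranch : null,
  };
}

console.log(gitPlan({}, "0042", true).baseRef); // origin/main
console.log(gitPlan({ autoPr: true, baseBranch: "develop" }, "0042", true).prTarget); // develop
```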
Planning Game integration
| Flag | Effect |
|---|---|
| `--pg-task <cardId>` | Link the run to a card; updates progress automatically. Requires `--pg-project` or config. |
| `--pg-project <projectId>` | Target project for `--pg-task`. |
Role overrides
| Flag | Effect |
|---|---|
| `--skip-role <role...>` | Force a role OFF regardless of triage: `--skip-role tester security`. |
| `--force-role <role...>` | Force a role ON regardless of triage: `--force-role planner`. |
| `--domain <text-or-path>` | Inject domain knowledge: inline text or a path to a `.md` file. Picked up by the domain-curator role. |
| `--skills-mode <mode>` | `auto`, `regex`, `semantic`, or `none`. `auto` is fine 99% of the time; use `none` for tasks where skills make the prompt too noisy. |
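
The two role flags sit on top of whatever triage decided. A hypothetical JavaScript sketch: `applyRoleOverrides` is illustrative, and because this page does not say what happens when a role is both forced and skipped, the sketch simply rejects that combination.

```javascript
// Hypothetical sketch: triage proposes roles, then the CLI overrides apply.
function applyRoleOverrides(triageRoles, { forceRole = [], skipRole = [] } = {}) {
  const clash = forceRole.filter((r) => skipRole.includes(r));
  if (clash.length > 0) {
    // Undocumented case; this sketch refuses to guess.
    throw new Error(`Role both forced and skipped: ${clash.join(", ")}`);
  }
  const roles = new Set(triageRoles);
  forceRole.forEach((r) => roles.add(r)); // --force-role: ON regardless of triage
  skipRole.forEach((r) => roles.delete(r)); // --skip-role: OFF regardless of triage
  return [...roles];
}

const triaged = ["coder", "reviewer", "tester", "security"];
console.log(
  applyRoleOverrides(triaged, { skipRole: ["tester", "security"], forceRole: ["planner"] }).join(" ")
); // coder reviewer planner
```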
Output
| Flag | When |
|---|---|
| `-y, --yes` | Skip all confirmations (CI mode). |
| `--dry-run` | Print what would be executed, don’t actually run. |
| `--json` | JSON output only, no styled console. For programmatic consumption. |
| `-q, --quiet` | Stage status lines only, no raw agent output. Default. |
| `-v, --verbose` | Full agent output (stream-json, raw lines). For debugging. |
Examples
Typical interactive run
```bash
kj run "Implement password-reset email flow"
```
Karajan prompts for confirmation of the cwd, then runs intent → skills → acceptance → coder → guards → sonar → tdd → reviewer (loop). When the reviewer approves, the run ends. You see stage transitions in the terminal; full logs are in `~/.karajan/sessions/<id>/`.
Paranoid run for security-critical code
```bash
kj run "Add JWT refresh-token rotation to /api/auth/*" --mode paranoid --enable-perf
```
Adds triage, planner, hu-reviewer, tester, security, perf, and impeccable on top of the default. This doubles the wall time, but every dimension gets scrutiny. The `--enable-perf` is redundant here (paranoid already enables it); it’s included for clarity in CI scripts.
CI / automation
```bash
kj run --task-file .github/tasks/migrate-to-pg17.md \
  --yes \
  --max-iterations 8 \
  --max-total-minutes 180 \
  --auto-commit --auto-push --auto-pr \
  --base-branch main \
  --json > run.json
```
Non-interactive, longer iteration budget, opens a PR at the end, machine-readable output. Run it inside a GitHub Action; downstream steps read `run.json`.
Re-running a single failed HU
```bash
kj run --plan PLN-001 --hu HU-005
```
The plan already exists, HU-005 failed in a previous run, and you’ve manually fixed whatever caused it. This re-runs just HU-005 and leaves the rest of the plan untouched. The HU Board’s per-HU ▶ button uses exactly this.
A/B testing two coders
```bash
kj run "Refactor src/payments/* for testability" --coder claude --max-iterations 3
# inspect result, undo if needed
kj undo
kj run "Refactor src/payments/* for testability" --coder codex --max-iterations 3
```
Useful for picking a default. Pair with `kj report` to compare token spend and time.
How it works internally
`kj run` lives in `src/orchestrator/flow-runner.js`, but most of the logic is in drivers under `src/orchestrator/drivers/`: `pre-loop.js`, `iteration-loop.js`, `post-loop.js`. The reason for this split is that the same drivers power three slightly different flows: the standard single-task pipeline, the `--plan` plan-execution flow (which loops over HUs), and the analysis-only flow for `taskType=audit/doc/infra` (which skips the coder loop entirely). Each driver is independently testable.
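The value of the split is that each flow is just a different composition of the same three drivers. Here is a hypothetical JavaScript sketch: the driver names echo `pre-loop.js`, `iteration-loop.js` and `post-loop.js`, but the signatures are illustrative rather than Karajan’s actual exports, and running the post-loop for analysis-only tasks is an assumption. Injecting the drivers is also what lets each one be tested with stubs.

```javascript
// Hypothetical composition of the three drivers; not Karajan's exports.
const ANALYSIS_ONLY = new Set(["audit", "doc", "infra"]);

async function runTask(task, drivers) {
  const ctx = await drivers.preLoop(task); // once per task
  if (!ANALYSIS_ONLY.has(task.taskType)) {
    await drivers.iterationLoop(ctx); // coder/review loop
  }
  return drivers.postLoop(ctx); // once, at the end
}

async function runPlan(plan, drivers) {
  const results = [];
  for (const hu of plan.hus) {
    // Each HU reuses the same drivers as a standalone task, sequentially.
    results.push(await runTask(hu, drivers));
  }
  return results;
}

// Stub drivers that only record calls, the way a unit test would use them.
function recordingDrivers(log) {
  return {
    preLoop: async (task) => { log.push(`pre:${task.id}`); return { task }; },
    iterationLoop: async (ctx) => { log.push(`loop:${ctx.task.id}`); },
    postLoop: async (ctx) => { log.push(`post:${ctx.task.id}`); return ctx.task.id; },
  };
}

const calls = [];
const demoPlan = { hus: [{ id: "HU-001", taskType: "sw" }, { id: "HU-002", taskType: "doc" }] };
runPlan(demoPlan, recordingDrivers(calls)).then(() => console.log(calls.join(" ")));
// pre:HU-001 loop:HU-001 post:HU-001 pre:HU-002 post:HU-002
```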
The iteration loop is the heart of the design. The conventional way to drive an agent would be “ask the AI to write code, ship it.” Karajan instead frames every iteration as a negotiation: the coder proposes, the deterministic guards veto on hard rules (filesystem leaks, credentials), Sonar adds external truth, the TDD gate enforces “tests fail first”, the reviewer evaluates structurally, and Solomon arbitrates if coder and reviewer disagree on style versus structure. The Brain layer (universal error recovery) handles rate limits / quota exhaustion / network issues underneath — invisible until it needs to be visible. This stack is why a 5-iteration limit converges so often: each iteration has multiple independent signals telling the coder what to fix, rather than a single “try again” loop.
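The negotiation can be sketched as a loop in which several independent signals feed back into the next coder attempt. This is a hypothetical JavaScript illustration, not `iteration-loop.js`: the stage functions are injected stubs, the verdict shapes are invented, and skipping straight to the next iteration whenever a gate fails (so the reviewer only sees diffs that passed guards, Sonar and TDD) is an assumption.

```javascript
// Hypothetical sketch of the coder/reviewer negotiation; shapes are assumed.
async function iterate(task, stages, maxIterations = 5) {
  let feedback = [];
  for (let i = 1; i <= maxIterations; i++) {
    const diff = await stages.coder(task, feedback);

    // Independent signals, each able to veto this attempt.
    const gates = [
      await stages.guards(diff), // hard rules: filesystem leaks, credentials
      await stages.sonar(diff), // external static analysis
      await stages.tdd(diff), // tests must fail first
    ];
    const failed = gates.filter((g) => !g.ok);
    if (failed.length > 0) {
      feedback = failed.flatMap((g) => g.issues);
      continue;
    }

    const review = await stages.reviewer(diff);
    if (review.approved) return { approved: true, iterations: i };
    // Style-only rejections go to Solomon, who may override the reviewer.
    if (review.styleOnly && (await stages.solomon(diff, review)).override) {
      return { approved: true, iterations: i, overriddenBy: "solomon" };
    }
    feedback = review.issues; // structural issues: try again with feedback
  }
  return { approved: false, iterations: maxIterations };
}

// Stubs: the reviewer rejects the first attempt structurally, approves the second.
let attempt = 0;
const pass = async () => ({ ok: true, issues: [] });
const stubs = {
  coder: async () => ({ attempt: ++attempt }),
  guards: pass,
  sonar: pass,
  tdd: pass,
  reviewer: async (diff) =>
    diff.attempt < 2
      ? { approved: false, styleOnly: false, issues: ["split the handler"] }
      : { approved: true },
  solomon: async () => ({ override: false }),
};
iterate("add logout button", stubs).then((r) => console.log(r));
// { approved: true, iterations: 2 }
```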
The pre-loop and post-loop phases exist so that expensive work runs once rather than once per iteration: research, architecture, and planning happen a single time, up front, with the full task context. Without this split, a 5-iteration run would redo the research five times. The opposite extreme (putting everything pre-loop) would mean you couldn’t react to discoveries made during coding. The current shape (context once, code-review loop, audit once) is the sweet spot.
Related
- Pipeline roles: what each of the 24 roles does, when each activates, when it’s overhead.
- `kj plan`: generate the plan that `--plan` will execute.
- `kj code`: coder-only, no review, no iteration. The lightweight cousin.
- `kj review`: reviewer-only on an existing diff.
- `kj audit`: read-only quality evaluation, no code changes.
- `kj report`: post-mortem of a completed run (timings, tokens, decisions).