Skip to content

Lift superpowers:code-reviewer agent into the requesting-code-review skill#1299

Merged
arittr merged 1 commit into
devfrom
lift-code-reviewer-agent
Apr 30, 2026
Merged

Lift superpowers:code-reviewer agent into the requesting-code-review skill#1299
arittr merged 1 commit into
devfrom
lift-code-reviewer-agent

Conversation

@obra

@obra obra commented Apr 28, 2026

Copy link
Copy Markdown
Owner

What problem are you trying to solve?

The plugin had exactly one named agent (agents/code-reviewer.md, dispatched as superpowers:code-reviewer) used by two skills, while every other reviewer/implementer subagent in the repo is dispatched as general-purpose with the prompt template living alongside the skill that uses it.

That asymmetry caused real friction:

  1. Two sources of truth, drifting independently. Both agents/code-reviewer.md (persona + checklist) and skills/requesting-code-review/code-reviewer.md (placeholder template) defined how a code review should look. Whenever someone updated one, the other rotted. PR fix(code-review): fix placeholder mismatch in reviewer template #915 is currently open fixing a placeholder-name mismatch ({PLAN_REFERENCE} vs {PLAN_OR_REQUIREMENTS}) that exists only because the template-shape definition is split across two files. My refactor folds the two into one self-contained template; the bug fix(code-review): fix placeholder mismatch in reviewer template #915 is fixing disappears as a side effect.

  2. Codex users couldn't dispatch the named agent. skills/using-superpowers/references/codex-tools.md had a "Named agent dispatch" workaround section explaining how Codex users had to flatten the named agent into a worker dispatch — a section that documented behavior superpowers itself was the only consumer of.

  3. PR fix: distinguish Skills from Agent types in subagent-driven-development Integration section #1078 is open trying to disambiguate Skills from Agent types in the SDD skill docs because Claude was getting confused about whether superpowers:code-reviewer was a skill or an agent. The named agent's existence is what created that confusion.

The thing that prompted the audit: I was re-reading the SDD skill prompt templates and noticed subagent-driven-development/code-quality-reviewer-prompt.md says Task tool (superpowers:code-reviewer) while every sibling file says Task tool (general-purpose). I asked "is there a reason for this asymmetry?" and there wasn't one — the named agent was added in v3.2.1 (2025-10-20) so users wouldn't need personal agent config, but the same goal is achieved by inlining the prompt the way every other reviewer in this repo already does.

What does this PR change?

Merges the system-prompt content of agents/code-reviewer.md into skills/requesting-code-review/code-reviewer.md to produce a single, self-contained Task-dispatch template, switches both call sites to dispatch general-purpose, removes the now-vestigial named-agent dispatch instructions from the platform tool-mapping references, and deletes the empty agents/ directory. Adds a Tier 3 behavioral test that verifies the dispatched reviewer actually catches real planted bugs.

Is this change appropriate for the core library?

Yes. It's a structural cleanup of core skill infrastructure — every user who runs requesting-code-review or subagent-driven-development is affected, and the change reduces complexity and surface area. No third-party dependencies introduced, nothing project-specific or domain-specific.

What alternatives did you consider?

  1. Promote the other reviewers to named agents. Make spec-reviewer, plan-reviewer, etc. all into agents/*.md. Rejected: more moving parts, more YAML metadata to maintain, more cross-platform compatibility headaches (Codex can't dispatch named agents at all without the manual workaround).

  2. Leave a stub agents/code-reviewer.md that re-exports the prompt template, for backward-compat with anyone wired against superpowers:code-reviewer outside the repo. Rejected: git grep shows zero external uses inside this repo, and named-agent registries are local to each user's harness install — there's no public API to break here.

  3. Just fix the placeholder bug from fix(code-review): fix placeholder mismatch in reviewer template #915 in place. Rejected: that fixes the symptom while leaving the root cause (split template definition) intact. Drift will reintroduce a similar bug in six months.

Does this PR contain multiple unrelated changes?

No. Everything in the diff is in service of a single change: collapse the named code-reviewer agent into the skill that dispatches it, then prove with a Tier 3 test that the lifted template still produces effective reviewers. The SDD prompt-template update, the platform-doc cleanup, and the new behavioral test are all consequences of the lift.

A separate PR (#TBD-this-test-fix-already-merged-to-dev: e795530) fixed three pre-existing bugs in test-subagent-driven-development-integration.sh that prevented the SDD integration test from running its assertions at all. That PR landed on dev before this one because validating this refactor required a working integration test on the SDD pipeline.

Existing PRs

Environment tested

Harness (e.g. Claude Code, Cursor) Harness version Model Model version/ID
Claude Code (CLI, headless claude -p) 2.1.119 Claude claude-opus-4-7 (Opus 4.7)

Evaluation

Initial prompt that started the session: "we currently use a ton of subagents as part of superpowers, but only one of those agent types is an actual agent in 'agents' — i don't think there's a good reason for that? can you study the code base and let me know? would it make sense to lift that agent prompt into the relevant skill as a progressive disclosure doc?"

Eval sessions run after the change:

  • tests/skill-triggering/run-test.sh requesting-code-review against the working tree → skill triggered correctly on a naive prompt about reviewing a diff.
  • tests/claude-code/test-requesting-code-review.sh (the new Tier 3 test) against the working tree → 5/5 assertions pass on every run. Each run plants real bugs (SQL injection, plaintext password "hash", credential logging) into a tiny project, dispatches the code reviewer via the actual skill, and asserts the reviewer catches each planted bug at Critical/Important severity and refuses to approve. The reviewer also caught issues I hadn't planted (timing-side-channel in login, missing tests, no error handling, return-shape ambiguity), which is the calibration we want.
  • tests/claude-code/test-subagent-driven-development-integration.sh against the working tree → 11/11 assertions pass. The SDD pipeline still produces working code with the new dispatch type.

Before/after delta (measured via session JSONL inspection):

  • Before this PR: SDD's session shows 4 dispatches as general-purpose and 3 as superpowers:code-reviewer. With agents/code-reviewer.md removed, those 3 dispatches would either fail or fall back, depending on harness version.
  • After this PR: SDD's session shows 7/7 dispatches as general-purpose. Zero superpowers:code-reviewer references anywhere in the pipeline. Spec compliance still rejects extra features, code is still produced and tested.

That's the smoking-gun evidence that the refactor reaches the running code path — not just the skill prose — and that it doesn't break the SDD review loop in the process.

Rigor

  • If this is a skills change: I used superpowers:writing-skills and completed adversarial pressure testing (paste results below)
  • This change was tested adversarially, not just on the happy path
  • I did not modify carefully-tuned content (Red Flags table, rationalizations, "human partner" language) without extensive evals showing the change is an improvement

The skill-content change here is structural (template merge + dispatch type), not behavior-shaping prose. No Red Flags tables, rationalization lists, or "human partner" wording were touched. The new template preserves the persona framing and review checklist from the old agents/code-reviewer.md, plus the structured-output and severity-categorization rules from the old code-reviewer.md template, deduplicated and tightened.

The adversarial test isn't a writing-skills eval — it's a behavioral test: plant deliberate Critical bugs in a diff, dispatch the reviewer via the actual skill, assert the reviewer flags them. That's the right test for this class of change because the change is about dispatch plumbing, not about how the reviewer thinks. If the lifted template produced sycophantic reviewers, the planted-bug test would fail.

Human review

  • A human has reviewed the COMPLETE proposed diff before submission

Jesse and I worked through the design and the diff together over the course of the session — including pushing back on my initial take that the existing test suite already had Tier 3 coverage (it did, but it was silently broken; that's what motivated the test infrastructure fix that landed on dev separately as e795530).

@obra obra requested a review from arittr April 28, 2026 20:08
…skill

The plugin had a single named agent (`agents/code-reviewer.md`) used by
two skills, while every other reviewer/implementer subagent in the repo
is dispatched as `general-purpose` with the prompt template living
alongside its skill. That asymmetry had no upside and several costs:

- Two sources of truth for the code review checklist (the agent file
  and `requesting-code-review/code-reviewer.md`), both drifting
  independently.
- `Codex` users could not use the named agent directly; the codex-tools
  reference doc had a workaround section explaining how to flatten the
  named agent into a `worker` dispatch.
- No third-party reliance on `superpowers:code-reviewer` inside this
  repo.

Changes:
- Merge `agents/code-reviewer.md` (persona + checklist) and
  `skills/requesting-code-review/code-reviewer.md` (placeholder
  template) into a single self-contained Task-dispatch template,
  matching the shape of `implementer-prompt.md`,
  `spec-reviewer-prompt.md`, etc.
- Update `skills/requesting-code-review/SKILL.md` and
  `skills/subagent-driven-development/code-quality-reviewer-prompt.md`
  to dispatch `Task (general-purpose)` instead of the named agent.
- Drop the now-obsolete "Named agent dispatch" workaround sections from
  `codex-tools.md` and `copilot-tools.md` — superpowers no longer ships
  any named agents, so those instructions documented nothing.
- Delete `agents/code-reviewer.md` and the empty `agents/` directory.

Tier 3 coverage for the change: a new behavioral test
`tests/claude-code/test-requesting-code-review.sh` plants real bugs
(SQL injection, plaintext password handling, credential logging) into
a tiny project, runs the actual `requesting-code-review` skill against
the working tree, and asserts the dispatched reviewer flags every
planted issue at Critical/Important severity and refuses to approve
the diff.

Verified end-to-end on this branch:
- The new test passes (5/5 assertions; reviewer caught all planted
  bugs and several others).
- The existing SDD integration test still passes (7/7 subagents
  dispatched, all as `general-purpose`; spec compliance still
  rejects extra features; produced code is correct).
- Session JSONLs confirm zero remaining `superpowers:code-reviewer`
  dispatches anywhere in the SDD pipeline.
@obra obra force-pushed the lift-code-reviewer-agent branch from 45975ec to 8d9d82b Compare April 30, 2026 21:22

@arittr arittr left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants