Lift superpowers:code-reviewer agent into the requesting-code-review skill #1299
Merged
…skill

The plugin had a single named agent (`agents/code-reviewer.md`) used by two skills, while every other reviewer/implementer subagent in the repo is dispatched as `general-purpose` with the prompt template living alongside its skill. That asymmetry had no upside and several costs:

- Two sources of truth for the code review checklist (the agent file and `requesting-code-review/code-reviewer.md`), both drifting independently.
- `Codex` users could not use the named agent directly; the codex-tools reference doc had a workaround section explaining how to flatten the named agent into a `worker` dispatch.
- No third-party reliance on `superpowers:code-reviewer` inside this repo.

Changes:

- Merge `agents/code-reviewer.md` (persona + checklist) and `skills/requesting-code-review/code-reviewer.md` (placeholder template) into a single self-contained Task-dispatch template, matching the shape of `implementer-prompt.md`, `spec-reviewer-prompt.md`, etc.
- Update `skills/requesting-code-review/SKILL.md` and `skills/subagent-driven-development/code-quality-reviewer-prompt.md` to dispatch `Task (general-purpose)` instead of the named agent.
- Drop the now-obsolete "Named agent dispatch" workaround sections from `codex-tools.md` and `copilot-tools.md`; superpowers no longer ships any named agents, so those instructions documented nothing.
- Delete `agents/code-reviewer.md` and the empty `agents/` directory.

Tier 3 coverage for the change: a new behavioral test `tests/claude-code/test-requesting-code-review.sh` plants real bugs (SQL injection, plaintext password handling, credential logging) into a tiny project, runs the actual `requesting-code-review` skill against the working tree, and asserts the dispatched reviewer flags every planted issue at Critical/Important severity and refuses to approve the diff.

Verified end-to-end on this branch:

- The new test passes (5/5 assertions; reviewer caught all planted bugs and several others).
- The existing SDD integration test still passes (7/7 subagents dispatched, all as `general-purpose`; spec compliance still rejects extra features; produced code is correct).
- Session JSONLs confirm zero remaining `superpowers:code-reviewer` dispatches anywhere in the SDD pipeline.
What problem are you trying to solve?
The plugin had exactly one named agent (`agents/code-reviewer.md`, dispatched as `superpowers:code-reviewer`) used by two skills, while every other reviewer/implementer subagent in the repo is dispatched as `general-purpose` with the prompt template living alongside the skill that uses it.

That asymmetry caused real friction:

- Two sources of truth, drifting independently. Both `agents/code-reviewer.md` (persona + checklist) and `skills/requesting-code-review/code-reviewer.md` (placeholder template) defined how a code review should look. Whenever someone updated one, the other rotted. PR #915 ("fix(code-review): fix placeholder mismatch in reviewer template") is currently open fixing a placeholder-name mismatch (`{PLAN_REFERENCE}` vs `{PLAN_OR_REQUIREMENTS}`) that exists only because the template-shape definition is split across two files. My refactor folds the two into one self-contained template; the bug #915 is fixing disappears as a side effect.
- Codex users couldn't dispatch the named agent. `skills/using-superpowers/references/codex-tools.md` had a "Named agent dispatch" workaround section explaining how Codex users had to flatten the named agent into a `worker` dispatch, a section that documented behavior superpowers itself was the only consumer of.
- PR #1078 ("fix: distinguish Skills from Agent types in subagent-driven-development Integration section") is open trying to disambiguate Skills from Agent types in the SDD skill docs because Claude was getting confused about whether `superpowers:code-reviewer` was a skill or an agent. The named agent's existence is what created that confusion.

The thing that prompted the audit: I was re-reading the SDD skill prompt templates and noticed `subagent-driven-development/code-quality-reviewer-prompt.md` says `Task tool (superpowers:code-reviewer)` while every sibling file says `Task tool (general-purpose)`. I asked "is there a reason for this asymmetry?" and there wasn't one: the named agent was added in v3.2.1 (2025-10-20) so users wouldn't need personal agent config, but the same goal is achieved by inlining the prompt the way every other reviewer in this repo already does.

What does this PR change?
Merges the system-prompt content of `agents/code-reviewer.md` into `skills/requesting-code-review/code-reviewer.md` to produce a single, self-contained Task-dispatch template, switches both call sites to dispatch `general-purpose`, removes the now-vestigial named-agent dispatch instructions from the platform tool-mapping references, and deletes the empty `agents/` directory. Adds a Tier 3 behavioral test that verifies the dispatched reviewer actually catches real planted bugs.

Is this change appropriate for the core library?
Yes. It's a structural cleanup of core skill infrastructure: every user who runs `requesting-code-review` or `subagent-driven-development` is affected, and the change reduces complexity and surface area. No third-party dependencies introduced, nothing project-specific or domain-specific.

What alternatives did you consider?
- Promote the other reviewers to named agents. Make spec-reviewer, plan-reviewer, etc. all into `agents/*.md`. Rejected: more moving parts, more YAML metadata to maintain, more cross-platform compatibility headaches (Codex can't dispatch named agents at all without the manual workaround).
- Leave a stub `agents/code-reviewer.md` that re-exports the prompt template, for backward-compat with anyone wired against `superpowers:code-reviewer` outside the repo. Rejected: `git grep` shows zero external uses inside this repo, and named-agent registries are local to each user's harness install, so there's no public API to break here.
- Just fix the placeholder bug from #915 in place. Rejected: that fixes the symptom while leaving the root cause (split template definition) intact. Drift will reintroduce a similar bug in six months.
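
The placeholder drift behind #915 is the kind of mismatch a small check can surface mechanically. Below is a minimal sketch in Python, assuming placeholders are written as `{UPPER_SNAKE_CASE}`. The helper names, the sample template text, and the `BASE_SHA` placeholder are hypothetical illustrations, not code or content from this repo.

```python
import re
from typing import Dict, Set

# Assumption: placeholders look like {PLAN_OR_REQUIREMENTS}.
PLACEHOLDER = re.compile(r"\{([A-Z][A-Z0-9_]*)\}")


def placeholders(text: str) -> Set[str]:
    """Return the set of placeholder names that appear in a template."""
    return set(PLACEHOLDER.findall(text))


def placeholder_drift(template: str, documented: Set[str]) -> Dict[str, Set[str]]:
    """Compare the names a template uses with the names its caller documents."""
    used = placeholders(template)
    return {
        "unfilled": used - documented,  # in the template, never supplied
        "unused": documented - used,    # supplied, never referenced
    }


# Illustrative template text only; not the real code-reviewer.md.
template = "Review the diff against {PLAN_OR_REQUIREMENTS}.\nBase: {BASE_SHA}\n"
drift = placeholder_drift(template, {"PLAN_REFERENCE", "BASE_SHA"})
print(drift)
# {'unfilled': {'PLAN_OR_REQUIREMENTS'}, 'unused': {'PLAN_REFERENCE'}}
```

With a single template file there is only one set of names to compare, which is why merging the two files removes this class of mismatch rather than patching one instance of it.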
Does this PR contain multiple unrelated changes?
No. Everything in the diff is in service of a single change: collapse the named code-reviewer agent into the skill that dispatches it, then prove with a Tier 3 test that the lifted template still produces effective reviewers. The SDD prompt-template update, the platform-doc cleanup, and the new behavioral test are all consequences of the lift.
A separate PR (#TBD-this-test-fix-already-merged-to-dev: e795530) fixed three pre-existing bugs in `test-subagent-driven-development-integration.sh` that prevented the SDD integration test from running its assertions at all. That PR landed on dev before this one because validating this refactor required a working integration test on the SDD pipeline.

Existing PRs
- …`{PLAN_REFERENCE}`/`{PLAN_OR_REQUIREMENTS}` drift bug. This PR supersedes it: the merged template only uses `{PLAN_OR_REQUIREMENTS}`, so the drift class of bug becomes structurally impossible. #915 ("fix(code-review): fix placeholder mismatch in reviewer template") can be closed once this lands.
- …`superpowers:code-reviewer` from skills. With the named agent gone, the underlying confusion goes away, so the doc patches in #1078 ("fix: distinguish Skills from Agent types in subagent-driven-development Integration section") become redundant rather than necessary.

Environment tested
…`claude -p`)

Evaluation
Initial prompt that started the session: "we currently use a ton of subagents as part of superpowers, but only one of those agent types is an actual agent in 'agents' — i don't think there's a good reason for that? can you study the code base and let me know? would it make sense to lift that agent prompt into the relevant skill as a progressive disclosure doc?"
Eval sessions run after the change:
- `tests/skill-triggering/run-test.sh requesting-code-review` against the working tree → skill triggered correctly on a naive prompt about reviewing a diff.
- `tests/claude-code/test-requesting-code-review.sh` (the new Tier 3 test) against the working tree → 5/5 assertions pass on every run. Each run plants real bugs (SQL injection, plaintext password "hash", credential logging) into a tiny project, dispatches the code reviewer via the actual skill, and asserts the reviewer catches each planted bug at Critical/Important severity and refuses to approve. The reviewer also caught issues I hadn't planted (timing side channel in `login`, missing tests, no error handling, return-shape ambiguity), which is the calibration we want.
- `tests/claude-code/test-subagent-driven-development-integration.sh` against the working tree → 11/11 assertions pass. The SDD pipeline still produces working code with the new dispatch type.

Before/after delta (measured via session JSONL inspection):
- …`general-purpose` and 3 as `superpowers:code-reviewer`. With `agents/code-reviewer.md` removed, those 3 dispatches would either fail or fall back, depending on harness version.
- …`general-purpose`. Zero `superpowers:code-reviewer` references anywhere in the pipeline. Spec compliance still rejects extra features, code is still produced and tested.

That's the smoking-gun evidence that the refactor reaches the running code path (not just the skill prose) and that it doesn't break the SDD review loop in the process.
Rigor
…`superpowers:writing-skills` and completed adversarial pressure testing (paste results below)

The skill-content change here is structural (template merge + dispatch type), not behavior-shaping prose. No Red Flags tables, rationalization lists, or "human partner" wording were touched. The new template preserves the persona framing and review checklist from the old `agents/code-reviewer.md`, plus the structured-output and severity-categorization rules from the old `code-reviewer.md` template, deduplicated and tightened.

The adversarial test isn't a writing-skills eval; it's a behavioral test: plant deliberate Critical bugs in a diff, dispatch the reviewer via the actual skill, assert the reviewer flags them. That's the right test for this class of change because the change is about dispatch plumbing, not about how the reviewer thinks. If the lifted template produced sycophantic reviewers, the planted-bug test would fail.
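
The planted-bug assertion described above can be sketched as follows. This is not the logic of `test-requesting-code-review.sh`; it is a Python illustration that assumes the reviewer's report uses markdown headings whose first word is the severity (`Critical`, `Important`, `Minor`) and ends with a "Ready to merge?" verdict. The report text, the regex patterns, and the helper names are all hypothetical.

```python
import re
from typing import Dict, List

SEVERE = ("critical", "important")


def severe_text(report: str) -> str:
    """Collect the lowercased lines that sit under Critical/Important headings."""
    keep, chunks = False, []
    for line in report.splitlines():
        heading = re.match(r"^#+\s*(\w+)", line)
        if heading:
            keep = heading.group(1).lower() in SEVERE
            continue
        if keep:
            chunks.append(line.lower())
    return "\n".join(chunks)


def missed_bugs(report: str, planted: Dict[str, str]) -> List[str]:
    """Return names of planted bugs with no pattern match under a severe heading."""
    text = severe_text(report)
    return [name for name, pattern in planted.items() if not re.search(pattern, text)]


def approves(report: str) -> bool:
    """True when the verdict after 'Ready to merge' is 'yes'."""
    verdict = re.search(r"ready to merge\??\W*(\w+)", report, re.IGNORECASE)
    return bool(verdict) and verdict.group(1).lower() == "yes"


# Hypothetical patterns for the three planted bugs named in this PR.
planted = {
    "sql injection": r"sql injection",
    "plaintext password": r"plaintext|plain-text",
    "credential logging": r"(credential|password).*log",
}

# Synthetic report, written for this example only.
report = """
### Issues
#### Critical (Must Fix)
- SQL injection: query built by string concatenation
- Passwords stored in plaintext
#### Important (Should Fix)
- Credentials written to the log during login
#### Minor (Nice to Have)
- Naming nit
### Assessment
**Ready to merge?** No
"""

print(missed_bugs(report, planted), approves(report))  # [] False
```

A sycophantic report, one that files the planted bugs under Minor or answers "Yes", would make `missed_bugs` non-empty or `approves` true, which is the failure the behavioral test is meant to expose.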
Human review
Jesse and I worked through the design and the diff together over the course of the session, including pushing back on my initial take that the existing test suite already had Tier 3 coverage. It did, but it was silently broken; that's what motivated the test infrastructure fix that landed on dev separately as e795530.