Improve using-superpowers skill description and conciseness #459
fernandezbaptiste wants to merge 1 commit into obra:main
- Expand frontmatter description with capability statement and trigger clauses
- Condense EXTREMELY-IMPORTANT block (rule already stated in The Rule section)
- Trim redundant Red Flags entries that repeat the same point
- Add Output section specifying what the skill produces
Can you tell me a little bit about the evals you did? "61% to 100%" is an interesting number, and some of the changes you made run counter to everything I have seen testing skills.
Absolutely - the review eval is generated with Opus 4.6 and judges the skill against Anthropic's guidelines for skill design (e.g. frontmatter specificity, trigger clarity, output sections). We also run scenario-based task evals to measure real agent performance across tasks, comparing results with and without your skill to quantify the true performance delta. To do that, you need to claim your skill on tessl and run the eval. If you're curious to poke at the review evals for, say, the "executing-plans" skill, paste this into Claude (takes ~1 min to run):
Hi - just following up on the above.
Hey @obra, thanks for publishing superpowers. Appreciate you sharing your workflows, and kudos on soon hitting 50k stars! I just starred it. Side note, I shared one of your blogs on the newsletter I take care of (perhaps you heard of the ai native dev newsletter?) - kudos for the fantastic content.

I was running your skills through some evals and noticed a few things on using-superpowers that were pretty quick to improve (moving from ~61% to ~100% agent performance):

- The frontmatter description conflicted with other skills, since the trigger covers all conversations. Expanded it with specific actions.
- The Red Flags list repeated "check skills first" in different words. Trimmed it to the 5 most distinct patterns.
- Added an output section so the agent knows exactly what to produce after skill resolution.

These were easy changes to bring the skill in line with what performs well against Anthropic's best practices. Honest disclosure: I work at a company where we build tooling around this. Not a pitch, just fixes that were straightforward to make.

You've got 41 skills here. If you want to do it yourself, evals are free and open to run: link. Otherwise, happy to make the improvements for you.