Persistent Pusher agent #4047
fbmcipher started this conversation in AI & Coding Agents
After testing and refining with #3973 and #4045, I now feel confident enough to submit our first custom agent, "Persistent Pusher" – so named because it does not stop working until all CI passes.
You can read the contents of the agent's Markdown file here.
Background
By default, the Copilot coding agent doesn't know how to check CI status. This is the root issue of both #3973 and #4045.
When explicitly asked to submit passing tests, it hallucinates the test status. Its full chain of reasoning in "View Session" shows that it never even considers checking.
Using the GitHub MCP server, we can retrieve the status of CI workflow runs and request full logs. Persistent Pusher is explicitly told to use the MCP server to poll CI status, retrieve logs, and keep fixing the failing tests until they pass.
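Polling CI status boils down to deciding, from a list of workflow runs, whether CI is still running, green, or red. The sketch below is illustrative only and is not taken from PP's agent file. It assumes each run carries the name, status, and conclusion fields of the GitHub REST API's workflow run object; the exact shape returned by the MCP server's tools may differ. The classifyCi name and the choice to treat "skipped" and "neutral" conclusions as non-failing are assumptions.

```typescript
// Minimal subset of a GitHub Actions workflow run, mirroring the REST API object.
// Assumption: the MCP server's workflow-run tools return runs with these fields.
interface WorkflowRun {
  name: string;
  status: string; // e.g. "queued", "in_progress", "completed"
  conclusion: string | null; // e.g. "success", "failure"; null until the run completes
}

type CiState = "pending" | "passing" | "failing";

// Conclusions treated as non-failing (an assumption; adjust to the repo's policy).
const OK_CONCLUSIONS = new Set(["success", "skipped", "neutral"]);

function classifyCi(runs: WorkflowRun[]): { state: CiState; failed: string[] } {
  // No runs yet means CI hasn't started, so keep polling instead of reporting success.
  if (runs.length === 0) return { state: "pending", failed: [] };

  // Report completed failures even while other runs are still going,
  // so the agent can start reading logs early.
  const failed = runs
    .filter((r) => r.status === "completed" && !OK_CONCLUSIONS.has(r.conclusion ?? ""))
    .map((r) => r.name);
  if (failed.length > 0) return { state: "failing", failed };

  // Anything still queued or in progress means the result isn't known yet.
  if (runs.some((r) => r.status !== "completed")) return { state: "pending", failed: [] };
  return { state: "passing", failed: [] };
}

// Example: one finished run and one still running.
const example = classifyCi([
  { name: "Lint", status: "completed", conclusion: "success" },
  { name: "Puppeteer", status: "in_progress", conclusion: null },
]);
console.log(example.state); // "pending"
```

Treating "no runs yet" as pending rather than passing matters here: the failure mode described above is precisely an agent that reports success without ever looking.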
Details
PP is given an agent persona: "You are a confident, reliable, diligent engineer..."
If there are failing tests, PP reads the logs and tries to fix the cause of the broken tests.
In the case of regression tests (e.g. Add failing test: gestures broken after dragging duplicate thought to Home #4045, where the agent is explicitly instructed to write a failing test), PP ensures that only that test fails.
If a snapshot test needs updating due to an intentional UI change, PP can use a custom /puppeteer-update-snapshots skill to do so.
If PP isn't sure what to do, it is instructed to escalate to the user.
There is a maximum of 5 loop iterations. If PP cannot make the tests pass within 5 rounds, it is instructed to summarize its status and escalate to the user.
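The control flow described in this section can be modelled as a small loop. This is a hypothetical sketch, not PP's implementation: PP itself is a Markdown agent file, not code. The hooks waitForCi, attemptFix, and escalate are made-up names standing in for the MCP calls and code edits the agent performs, and tests are identified by name. "Success" means the set of failing tests exactly matches an expected set: empty for an ordinary PR, or only the intended test for a regression-test PR.

```typescript
// Hypothetical hooks standing in for what PP does through the GitHub MCP server
// and its own code edits. None of these names come from PP's agent file.
interface AgentHooks {
  waitForCi: () => Promise<string[]>; // poll until CI finishes; resolve to failing test names
  attemptFix: (failingTests: string[]) => Promise<void>; // read logs, edit code, push
  escalate: (summary: string) => Promise<void>; // summarize status and hand back to the user
}

const MAX_ROUNDS = 5;

// True when exactly the expected tests fail: none for a normal PR,
// or only the intentionally failing test for a regression-test PR.
function matchesExpected(failing: string[], expectedFailing: string[]): boolean {
  const actual = new Set(failing);
  const expected = new Set(expectedFailing);
  return actual.size === expected.size && expectedFailing.every((t) => actual.has(t));
}

async function runUntilGreen(
  hooks: AgentHooks,
  expectedFailing: string[] = [],
): Promise<"passed" | "escalated"> {
  let failing = await hooks.waitForCi();

  // Up to MAX_ROUNDS fix attempts, each followed by a fresh CI result.
  for (let round = 0; round < MAX_ROUNDS && !matchesExpected(failing, expectedFailing); round++) {
    await hooks.attemptFix(failing);
    failing = await hooks.waitForCi();
  }

  if (matchesExpected(failing, expectedFailing)) return "passed";

  await hooks.escalate(
    `Unexpected CI result after ${MAX_ROUNDS} rounds; failing: ${failing.join(", ") || "(none)"}`,
  );
  return "escalated";
}
```

Checking CI once before the first fix lets an already-green PR exit without any edits. Comparing against an expected set, rather than requiring zero failures, is what lets the same loop cover regression-test PRs.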
Results
Tests were performed on a fork of em to avoid polluting cybersemics/em with test PRs and issues. Note that the failing sidebar snapshot tests are intentional; they are left over from initial experimentation while developing this agent.
On Add failing test: gestures broken after dragging duplicate thought to Home #4045:
On Test: Caret moves to incorrect thought from note #3973:
Key takeaways