Persistent Pusher agent #4047

fbmcipher · 2026-04-07T02:26:46Z

fbmcipher
Apr 7, 2026
Maintainer

After testing and refining with #3973 and #4045, I now feel confident enough to submit our first custom agent, "Persistent Pusher" – so named because it does not stop working until all CI passes.

You can read the contents of the agent's Markdown file here.

Background

By default, Copilot coding agent doesn't know how to look at CI status. This is the root issue of both #3973 and #4045.

When explicitly asked to submit passing tests, it hallucinates about test status. Looking at its full chain of reasoning in "View Session", it does not even think about checking.

Using the GitHub MCP server, we can retrieve the status of CI workflow runs and request full logs. Persistent Pusher is explicitly told to use the MCP server to poll CI status, retrieve logs, and keep fixing failing tests until they pass.
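To make the "poll CI status" step concrete, here is a minimal sketch of how a list of workflow runs could be collapsed into a single verdict. It assumes each run is a dictionary carrying the `status` and `conclusion` fields used by GitHub's REST API for workflow runs; the function name `summarize_ci` and the exact shape of the MCP server's response are assumptions, not taken from the agent file.

```python
from typing import Iterable, Mapping

# Conclusions treated as "not a failure" in this sketch (an assumption).
OK_CONCLUSIONS = {"success", "skipped", "neutral"}


def summarize_ci(runs: Iterable[Mapping[str, object]]) -> str:
    """Collapse workflow runs into 'pending', 'failed', or 'passed'."""
    runs = list(runs)
    if not runs:
        # Nothing reported yet: treat as unfinished rather than green.
        return "pending"
    if any(run.get("status") != "completed" for run in runs):
        # At least one run is still queued or in progress.
        return "pending"
    if all(run.get("conclusion") in OK_CONCLUSIONS for run in runs):
        return "passed"
    return "failed"


if __name__ == "__main__":
    example = [
        {"status": "completed", "conclusion": "success"},
        {"status": "in_progress", "conclusion": None},
    ]
    print(summarize_ci(example))
```

The sketch deliberately waits for every run to finish before reporting a failure, so the agent sees the complete set of broken checks in one pass rather than reacting to the first red result.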

MCP, or Model Context Protocol, is a way to give agents access to functions ("tools") that they can call to retrieve information or perform actions. It is essentially an RPC mechanism: MCP messages are JSON-RPC 2.0.
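As an illustration of the RPC framing, the sketch below builds the JSON-RPC 2.0 message an MCP client sends to invoke a tool. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification; the tool name `list_workflow_runs` and its arguments are hypothetical placeholders and may not match what the GitHub MCP server actually exposes.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tool invocation as a JSON-RPC 2.0 request."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Hypothetical tool name and arguments, for illustration only.
    print(make_tool_call(1, "list_workflow_runs", {"owner": "OWNER", "repo": "REPO"}))
```

In practice the agent runtime builds and sends these messages itself; the agent's instructions only need to name the tools it should call.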

Details

PP is given an agent persona: "You are a confident, reliable, diligent engineer..."
- Though the idea of giving an agent a persona and personality traits might initially feel hokey, it actually noticeably influences performance.
- This paper from Anthropic released last week showed that emotion-related representations like "confident", "sad", "scared" all activate internal vectors which shift decision-making, preferences and outputs.
- In theory, describing the agent as "diligent" and "confident" would reduce a model's tendency to "reward hack" due to "desperation" vectors – where, under pressure from repeated failures, the agent starts gaming CI checks instead of genuinely fixing underlying issues.
- Worth reading the paper if you're interested!
If there are failing tests, PP reads the logs and tries to fix the cause of the failures.
In the case of regression tests (e.g. Add failing test: gestures broken after dragging duplicate thought to Home #4045, where you explicitly instructed the agent to write a failing test), PP will ensure that only that test fails.
If a snapshot test needs updating due to an intentional UI change, PP can use a custom /puppeteer-update-snapshots skill to do so.
If PP isn't sure what to do, it is instructed to escalate to the user.
There is a max loop iteration count of 5. If PP cannot make tests pass in 5 rounds, it is instructed to summarize its status and escalate to the user.
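The rules above amount to a bounded fix loop. The following is a rough sketch of that control flow, not the agent's actual implementation (the agent is driven by Markdown instructions, not code). The names `run_until_green`, `check`, `fix`, and `Outcome` are hypothetical; `check` stands in for "poll CI and list failing tests" and `fix` for "read the logs and push a change". Only the limit of 5 rounds and the "only the intended test fails" rule come from the post.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set

MAX_ROUNDS = 5  # from the post: escalate after 5 unsuccessful rounds


@dataclass
class Outcome:
    done: bool      # True if CI reached the expected state
    rounds: int     # number of fix attempts made
    message: str    # summary to report, or to escalate to the user


def run_until_green(
    check: Callable[[], Iterable[str]],
    fix: Callable[[Set[str]], None],
    expected_failures: Iterable[str] = (),
    max_rounds: int = MAX_ROUNDS,
) -> Outcome:
    """Alternate between checking CI and attempting fixes, up to max_rounds."""
    expected = set(expected_failures)
    for rounds_used in range(max_rounds + 1):
        failing = set(check())
        if failing == expected:
            # For regression tests, "expected" holds the one intended failure.
            return Outcome(True, rounds_used, "CI matches the expected state")
        if rounds_used == max_rounds:
            break
        # Work on tests that fail unexpectedly or pass when they should fail.
        fix(failing ^ expected)
    unresolved = sorted(failing ^ expected)
    return Outcome(False, max_rounds, f"Escalating after {max_rounds} rounds: {unresolved}")


if __name__ == "__main__":
    broken = {"test_a", "test_b"}
    result = run_until_green(
        check=lambda: set(broken),
        fix=lambda problems: broken.discard(sorted(problems)[0]),
    )
    print(result)
```

Passing `expected_failures` models the regression-test case: the loop stops only when exactly the intended test fails, and a test that passes when it should fail counts as a problem to investigate.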

Results

Tests were performed on a fork of em to avoid polluting cybersemics/em with test PRs and issues. Note that the failing sidebar snapshot tests are intentional and were due to some initial experimentation while developing this agent.

On Add failing test: gestures broken after dragging duplicate thought to Home #4045:
- Before, Copilot's default agent worked for 17 minutes and submitted a passing test when explicitly told to submit a failing one.
- With PP, the agent ran for 42 minutes and one-shotted a failing test.
  - At first, it also wrote a test that passed. But before marking the work as done, PP saw that we explicitly asked it to write a failing one, and then spent the majority of its time investigating why the test passed.
- See agent's PR on my fork: Add failing test: "moved to home context" alert missing after dragging duplicate subthought to root fbmcipher/em#11
On Test: Caret moves to incorrect thought from note #3973:
- Before, Copilot's default agent stated that tests passed on a commit, but they were not passing – even after being explicitly asked to check again.
- After, with PP, the agent's work passes tests as instructed.
- See agent's PR on my fork: Test: Caret moves to incorrect Thought from Note fbmcipher/em#13

Key takeaways

Copilot follows agent behaviours more consistently than custom instructions.
Agent personas significantly affect performance.
MCP tools give the agent access to additional information like CI status, which was the key solution to this problem.
Custom agents can turn a 17-minute wrong answer into a 42-minute correct one.
