Deep review agent #4040
Initial experiences

Users can assign Copilot to a PR to have it review the submission. It posts review comments on specific lines of code, just like a human reviewer. In general, I felt the comments it left were all useful. In the first PR I tried this with, #4003, its comments were either important and actionable, or they opened useful discussion and helped me feel more confident about some of the implicit decisions and assumptions within the code.

You can directly ask Copilot to make changes to the code by @'ing it in a comment thread, which is a useful timesaver. However, this only works for branches in cybersemics/em. In other words, because my pull request was for a branch in my fork (fbmcipher/em), I couldn't ask Copilot to make changes.
This is a major shame, as in my experience a lot of time could be saved if comments like these could be resolved inline without having to check out the latest code or open my IDE. All contributors currently push branches to their own forks and then open PRs, so to continue my investigation I had to push a branch to cybersemics/em directly.

It's interesting to note that though the contents of #4003 and #4039 are identical, the comments Copilot left on each were completely different, with very little crossover. Both sets of comments were useful. I guess this makes sense given the non-deterministic nature of AI, but it still surprised me and felt worth mentioning.

-
Our investigatory code style review agent is trained on 4,000+ PR comments collected over the years. It's designed to automate away repetitive code style and architectural comments that enforce good coding standards – potentially saving @raineorshine time on the "first-pass" busywork of writing comments that have already been written before.
In contrast, a deep review agent would be much less guided. Rather than being explicitly trained to pick up on repeated comments and patterns, it would act independently as an "AI senior developer", making nuanced suggestions that humans may not necessarily even pick up on. In this way, we rely more on the "intuition" of the model.
We've had discussions about consistency – an agent that is correct 70% of the time is wrong 30% of the time – and how that can have a net-negative impact on productivity, create noise, and slow down the team.
To avoid that issue, we will need to test and scrutinize these review agents to understand how they behave, and ensure that where they do make mistakes, they can be guided away from them in future reviews using custom instructions.
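As a sketch of what that guidance might look like – assuming we use GitHub's repository-level custom instructions, which Copilot reads from a `.github/copilot-instructions.md` file – a correction for a recurring false positive could be written as plain prose rules (the specific rules below are hypothetical examples, not actual em conventions):

```markdown
<!-- .github/copilot-instructions.md (hypothetical example) -->
# Code review guidelines

- Do not flag formatting issues (semicolons, indentation); Prettier
  enforces formatting in CI.
- Prefer suggesting reuse of existing helpers over introducing new
  utility functions.
- Do not suggest adding comments that restate what the code already
  makes obvious.
```

Each time the agent makes a class of mistake, we could add a rule like these so it is steered away from that mistake in subsequent reviews.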
From now on, I will assign my PRs to Copilot for a first review. I think there's a great opportunity in this – by the time a PR hits Raine's table for review, it could feasibly have already been through a pass or two of review, pending only a final review by Raine.
In this thread, we can share experiences using AI code review agents.