How to motivate people to fix flaky tests #21
Description
(continued from nodejs/build#248)
The current documented policy for flaky tests (https://github.com/nodejs/node/wiki/Flaky-tests#what-to-do-when-you-encounter-a-new-flaky-test) calls for opening an issue to track them when you mark the test as flaky, and assigning the issue to the next release milestone.
One part that I think could use some improvement is clarifying who takes responsibility for fixing a flaky test, and how to motivate people to do it. The person who marks a test as flaky is usually the collaborator who determines that the test is not failing because of the current pull request's changes. They are not motivated to fix the test, and they are not necessarily the most qualified to work on the particular test that is failing.
On a dev team working for a single company, you could probably just assign the issue to the test's author or owner. I am not sure that would work in an open source project.
So how do we motivate collaborators to investigate and fix these failures? Here are some options we could consider:
- We stick to the current policy of assigning them to the next release milestone. The main problem with this is that we might see the list too late in the release cycle, and decide to punt. To counter that, we could try to increase awareness of issues that are flagged for a milestone, throughout the milestone (for example by sending periodic reports). I think this might be useful even beyond flaky tests.
- We set a time limit on how long a test can stay marked as flaky, or a limit on the number of tests marked as flaky, or both. When a limit is exceeded, we block merging all pull requests until the situation is brought back under control. This makes the whole team accountable for fixing flaky tests. I recall @domenic saying that this is what the v8 team does.
- We try to motivate collaborators with some sort of reward mechanism, like keeping a scoreboard of how many issues each collaborator has resolved in a given period (with extra points for flaky tests).
- We come up with an algorithm for assigning the flaky test issue to a specific collaborator, in addition to assigning a milestone. For example, the assignee could be the last active collaborator to have modified the test logic. If the search doesn't yield a suitable candidate, we could fall back to round-robin selection from a pool of collaborators, ideally choosing the pool based on the area of the test.
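As a rough illustration of that last option, here is a minimal sketch of the selection logic. All names here are hypothetical; the modification log would presumably come from something like `git log --name-only`, and "active collaborator" would be defined by whatever activity criteria we settle on:

```python
from itertools import cycle

def pick_assignee(test_path, modification_log, active_collaborators, fallback_pool):
    """Pick an assignee for a flaky-test issue (hypothetical helper).

    modification_log: newest-first list of (author, path) tuples.
    active_collaborators: set of collaborators considered active.
    fallback_pool: an endless iterator (e.g. itertools.cycle) over a
    round-robin pool, ideally scoped to the test's area.
    """
    # Prefer the most recent active collaborator who touched this test.
    for author, path in modification_log:
        if path == test_path and author in active_collaborators:
            return author
    # No suitable candidate: fall back to round-robin selection.
    return next(fallback_pool)

log = [
    ("alice", "test/parallel/test-net-foo.js"),
    ("bob", "test/parallel/test-net-foo.js"),
    ("carol", "lib/net.js"),
]
pool = cycle(["dan", "erin"])

# "alice" touched the test most recently but is inactive, so "bob" is picked.
pick_assignee("test/parallel/test-net-foo.js", log, {"bob", "carol"}, pool)
# Nobody in the log touched this test, so the round-robin pool is used.
pick_assignee("test/parallel/test-http-bar.js", log, {"bob"}, pool)
```

The real policy question is less the lookup itself than the definitions feeding it: what counts as "active", and how the fallback pools map to test areas.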