Synchronise integration tests on IPC run-finished signal instead of sleeps#5988
Synchronise integration tests on IPC run-finished signal instead of sleeps#5988mostafaNazari702 wants to merge 9 commits into
Conversation
4b58f06 to
bdb63e0
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5988 +/- ##
==========================================
+ Coverage 80.89% 81.02% +0.12%
==========================================
Files 64 64
Lines 4602 4607 +5
Branches 976 997 +21
==========================================
+ Hits 3723 3733 +10
+ Misses 879 874 -5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
JoshuaKGoldberg
left a comment
There was a problem hiding this comment.
This PR is meant to work on the watch tests, but looking at the two failures in CI, both are:
1) --watch
when enabled
reruns test when file and directory paths under --watch-files are added:
Error: runMochaWatchAsync: timed out after 6000ms waiting for watch run to finish
at Timeout._onTimeout (test/integration/helpers.js:490:13)
at listOnTimeout (node:internal/timers:588:17)
at process.processTimers (node:internal/timers:523:7)
The strategies of waiting for explicit signals (rather than hardcoded timing) sounds good to me. But it looks like this PR doesn't fully fix the issues.
bdb63e0 to
76376b6
Compare
76376b6 to
4ba8e3c
Compare
|
The in-PR tests failed even after the commit "wait for chokidar to start watching before first run" which was supposed to fix the test timing issues, and then when i pushed tracing instruments, suddenly the tests work, twice....This is very mind-boggling. We are dealing with a bug that does not want to be caught. |
|
We ( as in me ) are now re-running tests to confirm whether i hopefully fixed it to remove the tracing insturments. Run 1 ( after committing ): 79 successful checks, all have passed and are green. (Josh commented during this specific stage). Updates: Re-run 1: 80 checks, all green and passed. Re-run 2: 80 checks, all green and passed. Re-run 3: The issue has appeared again: [Tests / lint / lint]: Failing after 24s
[Tests / Test integration in all environments / test-node:integration with node.js 20.19.4 on ubuntu-latest]: Failing after 3mLast edit to this message: Evaluating whether i should give up or not, 6 hours of troubleshooting totally useless and returned partial positive results. |
|
Swell. Whenever you think it's ready, feel free to re-request my review & mark this as ready / not draft. Exciting! |
I recommend to read this comment first before continuing to read. After instrumenting both sides and comparing passing vs failing CI runs, the issue turned out not to be in our code or inotify watches are per-inode and non-recursive so when a new subdirectory is created This matches prior chokidar work ( that made me give up after finding and going through them):
i also checked alternatives (@parcel/watcher, nsfw, Watchman) and they all have equivalent limitations or unresolved races, so this doesn't appear solvable at the userspace watcher level. So im stopping further fix attempts here as i have truely put a lot of effort that i should not have ( i don't mean that it does not deserve my time, but rather that i should have been smarter and actually tried to google my issue and find the impossibleness of this issue ). the IPC handshake changes are still a real improvement and stabilize the rest of the watch suite, this one test just hits the kernel race window. |
On main, these tests currently fail only on Windows (#5361), have we at least solved that? If we've solved that and introduced an issue unique to Linux (where these tests have passed consistently for the past 6 months), then I may have some ideas:
Can we, um, move outside of this window? That is, add a 500-millisecond sleep (or whatever works) back into the test to workaround this? I know the PR title is currently "Synchronise integration tests on IPC run-finished signal instead of sleeps" but if we add "when possible" to the end I'm still a happy camper with passing tests. Of course, would this open us up to no longer catching failing scenarios? If so we should detail those and see what we can do. (I'm almost back to 100% capacity from my disability, haven't looked at this code yet, but I definitely want to fix this bug!) |
Windows is fixed, on this PR, the watch integration suite is green on windows-latest across node 20/22/24 in every CI re-run (and locally), where main is the flaky one (#5361). the watch suite is also faster locally (44 seconds in my branch vs 2 mins in main), which was the original goal of #5714 I have decided to drop my Linux-kernel-related issues and focus solely on #5714 and its goal. That issue will need a new and separate issue that addresses it. Only remaining failure is intermittent on ubuntu Node 20, the "…file and directory paths under --watch-files are added"-test. |
0f3d2a2 to
d5cdbe9
Compare
|
Fails again. I can not do anything more in here unfortuantely. |
|
Yes, please do feel free to step away if you're ever frustrated with a PR :) |
I wish you the best of luck with this PR, my friend. The best that can be done is basically extending the timeout time. For now, signal-based is not 100% feasible but you are the expert! |

PR Checklist
status: accepting prsOverview
runMochaWatchAsync used fixed 2 seconds delays between each "change" which was slow in CI.
Replace with an IPC
mocha:watch:runFinishedevent to let tests to wait for completion instead of sleeping. Helper now always forks and exposes waitForRunFinished.