@matheus23 matheus23 commented Jul 23, 2025

Description

The endpoint_relay_connect_loop test was flaky; this PR should fix that. It was last marked as flaky (on Windows only) in #3354.

What seems to have happened:

  • The test spawns a relay server
  • The test spawns a server iroh Endpoint
  • The server Endpoint does a net report with QAD probes
  • For some reason the spawned relay server is very slow in responding to QAD requests (>3s)
  • The QAD probes time out, the server Endpoint ends up without a home relay
  • The test spawns a client iroh Endpoint
  • The client Endpoint tries to connect for 30s
  • The server Endpoint doesn't do another net report for 30s, though, so it never becomes reachable
  • The client Endpoint times out.

To work around this, I'm starting the server endpoint and waiting for it to have a relay address.
IMO this is reasonable to do in tests.
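
As a rough illustration of the "wait until the server has a relay address" idea, here is a minimal, synchronous sketch using only the standard library. It is not the code from this PR: wait_for, its parameters, and the probe closure are hypothetical names, and the closure merely stands in for "does the server endpoint report a home relay yet?". The real test is async and would presumably await the endpoint's own state rather than poll.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: polls `probe` until it yields a value or `timeout` elapses.
/// In a test, `probe` would stand in for checking whether the server endpoint
/// already has a relay address.
fn wait_for<T>(
    timeout: Duration,
    poll_interval: Duration,
    mut probe: impl FnMut() -> Option<T>,
) -> Result<T, String> {
    let deadline = Instant::now() + timeout;
    loop {
        // Check first, so an already-satisfied condition returns immediately.
        if let Some(value) = probe() {
            return Ok(value);
        }
        if Instant::now() >= deadline {
            return Err(format!("condition not met within {timeout:?}"));
        }
        thread::sleep(poll_interval);
    }
}
```

The point of the design is ordering: the client is only spawned once this wait has succeeded, so the client's 30s connect window no longer races against the server's first net report.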

I've also made the tests use Connection::close and Connection::closed properly and removed SendStream::stopped and RecvStream::read_to_end(0) calls.

Notes

There are some other drive-by changes. Sorry about that, but IMO they're kinda too small for their own PRs:

  • net_report assumed that Watchable::set returns Err when there are no more watchers listening, but that's incorrect: set returns Err when the new value is the same as the currently stored value.
  • I've also made some small cosmetic changes to net_report
  • I've removed a 1600-byte allocation from the hot path of receiving relay items in ActiveRelayActor.
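
To make the Watchable::set point concrete, here is a toy model of the described behavior. ToyWatchable is a hypothetical stand-in, not the real Watchable type; in particular, what the Ok and Err variants carry (the previous value and the rejected value) is an assumption made for this sketch. Only the rule "Err means the value was unchanged" comes from the description above.

```rust
/// Toy stand-in for a watchable value. NOT the real `Watchable` type.
struct ToyWatchable<T> {
    value: T,
}

impl<T: PartialEq> ToyWatchable<T> {
    fn new(value: T) -> Self {
        Self { value }
    }

    /// Stores `new` and returns the previous value, or hands `new` back
    /// as `Err` when it equals the value already stored.
    /// Note that `Err` says nothing about whether anyone is still watching.
    fn set(&mut self, new: T) -> Result<T, T> {
        if self.value == new {
            Err(new)
        } else {
            Ok(std::mem::replace(&mut self.value, new))
        }
    }
}
```

Under this model, a caller that treats Err as "all watchers are gone" would wrongly shut down on a redundant update; treating Err as a no-op is the safer reading.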

Change checklist

  • Self-review.

@matheus23 matheus23 self-assigned this Jul 23, 2025
github-actions bot commented Jul 23, 2025

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3402/docs/iroh/

Last updated: 2025-07-25T10:15:13Z

github-actions bot commented Jul 23, 2025

Netsim report & logs for this PR have been generated and are available at: LOGS
This report will remain available for 3 days.

Last updated for commit: 3c0d6e7

@n0bot n0bot bot added this to iroh Jul 23, 2025
@github-project-automation github-project-automation bot moved this to 🏗 In progress in iroh Jul 23, 2025
Just things I found while going through code relevant to the timing of things.
@matheus23 matheus23 enabled auto-merge July 25, 2025 10:13
@matheus23 matheus23 added this pull request to the merge queue Jul 25, 2025
Merged via the queue into main with commit 8426241 Jul 25, 2025
29 checks passed
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in iroh Jul 25, 2025
@matheus23 matheus23 deleted the matheus23/less-flakey-ep-relay-connect-loop-test branch July 25, 2025 11:06