
Conversation

@smoogipoo (Contributor) commented Nov 8, 2025

RFC

Resolves #35580
Resolves ppy/osu-server-spectator#193
Resolves #35586
Resolves ppy/osu-server-spectator#362

Outline

This enables stateful reconnect for spectator-server endpoints, allowing ConnectionIds to be preserved for a short period and messages to be replayed on reconnect.

In practice, this means short disconnects (<30s) should no longer:

  • Drop replays
  • Kick you out of multiplayer rooms
  • Trigger "user has come online" re-alerts

The following video demonstrates two of the above:

2025-11-08.19-30-17.mp4

Stateful reconnect appears to kick in only as long as the socket doesn't get fully disconnected; it doesn't apply to subsequent re-connections. We have the timeout period left at SignalR's default of 30 seconds.
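
For reference (not shown in this diff), the server side also has to opt each hub endpoint into stateful reconnect when it's mapped. A minimal sketch of that opt-in, assuming ASP.NET Core 8's AllowStatefulReconnects option (the route below is a placeholder, not the actual osu-server-spectator mapping):

// Hypothetical endpoint mapping; route is a placeholder.
app.UseEndpoints(endpoints =>
{
    endpoints.MapHub<SpectatorHub>("/spectator", options =>
    {
        // Opt this endpoint into stateful reconnect (ASP.NET Core 8+).
        options.AllowStatefulReconnects = true;
    });
});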

I've been using the following to simulate a total link loss:

#!/bin/bash

# Simulates a total link loss against the locally-hosted server:
# takes the loopback interface down, force-closes existing sockets,
# waits for the requested delay, then brings the interface back up.

DELAY=${1:-1}

echo "Conditioning for $DELAY seconds..."

sudo ip link set lo down
# Kill any established connections to the local test server address.
sudo ss -K dst 127.0.0.2 > /dev/null
sleep "$DELAY"
sudo ip link set lo up

@smoogipoo smoogipoo added the area:online functionality label Nov 8, 2025
@smoogipoo smoogipoo closed this Nov 8, 2025
@smoogipoo smoogipoo reopened this Nov 8, 2025
@smoogipoo smoogipoo self-assigned this Nov 8, 2025
@smoogipoo smoogipoo requested a review from a team November 8, 2025 11:40
@smoogipoo (Contributor, Author) commented Nov 12, 2025

On Discord I was asked why the connection dies after 30 seconds.

Initially I thought this was just the socket keepalive period, but that appears to be set to 15 seconds. It turns out there's a second timeout, which is the 30s window we're concerned about, and it can be adjusted with the following:

diff --git a/osu.Server.Spectator/Startup.cs b/osu.Server.Spectator/Startup.cs
index 3e326cc..028afd2 100644
--- a/osu.Server.Spectator/Startup.cs
+++ b/osu.Server.Spectator/Startup.cs
@@ -29,6 +29,7 @@ namespace osu.Server.Spectator
                     {
                         options.AddFilter<LoggingHubFilter>();
                         options.AddFilter<ConcurrentConnectionLimiter>();
+                        options.ClientTimeoutInterval = TimeSpan.FromMinutes(5);
                     })
                     .AddMessagePackProtocol(options =>
                     {
diff --git a/osu.Game/Online/HubClientConnector.cs b/osu.Game/Online/HubClientConnector.cs
index ff9a4261fd..c87ba0812c 100644
--- a/osu.Game/Online/HubClientConnector.cs
+++ b/osu.Game/Online/HubClientConnector.cs
@@ -72,6 +72,7 @@ protected override Task<PersistentEndpointClient> BuildConnectionAsync(Cancellat
                     options.Headers.Add(CLIENT_SESSION_ID_HEADER, API.SessionIdentifier.ToString());
                 });
 
+            builder.WithServerTimeout(TimeSpan.FromMinutes(5));
             builder.WithStatefulReconnect();
 
             builder.AddMessagePackProtocol(options =>

I haven't re-tested, but I've been able to go up to 1 minute before. I don't know the implications.

@bdach (Collaborator) commented Nov 12, 2025

Some empirical observations from local testing of what this "stateful reconnect" feature actually does, which I gathered myself because the docs are terrible.

Conditions of test:

  • Separate SignalR project with 2 disparate hubs, with 2 operations in each hub
  • Client and server run on different PCs on a local network
  • Link trouble & link loss simulated on the client PC via Network Link Conditioner
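
(For illustration, a minimal reconstruction of such a harness is sketched below; all hub, method, and URL names are hypothetical, not taken from the actual test project.)

// --- Server (Hubs.cs): two disparate hubs with a couple of trivial operations each.
// --- Assumed to be mapped at /first and /second in the server's routing (not shown).
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

public class FirstHub : Hub
{
    public Task<int> Echo(int value) => Task.FromResult(value);

    public Task Notify(string message) => Clients.Others.SendAsync("Notified", message);
}

public class SecondHub : Hub
{
    public Task<long> Now() => Task.FromResult(DateTimeOffset.UtcNow.ToUnixTimeMilliseconds());

    public Task Ping() => Clients.Caller.SendAsync("Pong");
}

// --- Client (Program.cs): stateful reconnect opted in, invocation latency observed
// --- while the link is degraded with Network Link Conditioner.
using System;
using System.Diagnostics;
using Microsoft.AspNetCore.SignalR.Client;

var connection = new HubConnectionBuilder()
                 .WithUrl("http://server-pc:5000/first") // placeholder URL
                 .WithStatefulReconnect()
                 .Build();

await connection.StartAsync();

var stopwatch = Stopwatch.StartNew();

// During a short link loss this call simply blocks, then completes once the
// buffered invocation is replayed over the stateful reconnect.
await connection.InvokeAsync<int>("Echo", 1);

Console.WriteLine($"Echo acknowledged after {stopwatch.ElapsedMilliseconds} ms");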

Conclusions:

  • Async invocations of hub methods appear to block until acknowledgement is received from the server. Once reconnection occurs and the invocations succeed server-side, the hub methods unblock and program execution continues.

    • Tested via full loss of link.
  • Order of invocations appears to be guaranteed at hub level. Invocations from a single client instance that span hub boundaries are not guaranteed to occur in the same order.

    • Tested via induced 50% packet loss. Order of messages was not preserved between client and server in general, but messages within a single hub were kept in the same order.
  • The reconnection works and preserves connection IDs as long as the original websocket doesn't go dead due to one of the relevant keepalive periods.

    • Without stateful reconnection, once a "The remote party closed the WebSocket connection without completing the close handshake" error occurs (about 15s in), all subsequent SignalR operations fail instantly and can essentially be considered dropped.
    • Without stateful reconnection but with automatic reconnection, some time after the above, SignalR operations go from instant-fail to fully blocking, block for a couple of seconds, and then go back to instant-fail. I'm not sure what causes this, but my hypothesis is that it's linked to the retry policy determining that it's time to retry connecting again.
    • With both stateful and automatic reconnection, some messages are still dropped. Client operations fully block during the connection failure, but some then fail after the failure is resolved, due to the 30s server timeout.
    • Regardless of stateful reconnection, if some keepalive period expires (either the socket keepalive or the client timeout), then once connectivity resumes the client re-establishes the connection, but with a new set of connection IDs.
  • Both client and server use message buffers for this stateful reconnection. The size of this buffer is configurable on both sides, but it seemingly isn't easily instrumentable to check how heavily it is being utilised at any given time.

    I haven't been able to empirically exercise the effects of overrunning this buffer very well, except for noticing that when I set it to an obscenely low amount like a byte, the client stops doing anything. That would check out with the implementation of this buffer that I found in the ASP.NET source, which says that "primitive backpressure" (i.e. fully blocking the relevant message until enough space is reclaimed or the connection is dropped) is utilised. The relevant knobs are sketched below.
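
For completeness, the buffer-size knobs in question live in the following places, assuming ASP.NET Core 8's stateful reconnect options; the values shown are purely illustrative (they are neither the framework defaults nor anything this PR ships):

// Server side (inside ConfigureServices): global maximum buffer size, in bytes.
services.AddSignalR(options =>
{
    options.StatefulReconnectBufferSize = 200_000;
});

// Client side: opt in, then size the buffer via the underlying HTTP connection options.
// Requires Microsoft.AspNetCore.SignalR.Client and Microsoft.AspNetCore.Http.Connections.Client.
var builder = new HubConnectionBuilder()
              .WithUrl("http://hub-url") // placeholder
              .WithStatefulReconnect();

builder.Services.Configure<HttpConnectionOptions>(options => options.StatefulReconnectBufferSize = 200_000);

var connection = builder.Build();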

Long and short, I'm not sure spending more time on investigation here is useful at this stage; I'll reconsider that tomorrow. My immediate vibes on this are as follows:

  • This is probably better than the nothing that we have, but probably won't be as good as we'd hope it to be either
  • There are knobs we can tweak here, to some effect, that may improve how this works
  • There's also a giant danger sign on how primitive the buffer blocking logic looks, which may cause very bad blockages especially server-side (if there's anything else I'd want to investigate further it's this).


Labels

area:online functionality, size/XS
