
Conversation

@bdach (Collaborator) commented Oct 28, 2025

RFC. Probably maybe closes #52.

This is the laziest possible version of a change that could possibly maybe come close to the parameters of "checking client versions" outlined in the above issue.

Why is it lazy? Well, there's a bunch of problems here and I don't know how to solve any of them, so this is supposed to be the start of a conversation. The problems are listed below.

  1. This change prevents execution of any method of any hub if the client version does not match. In particular, this includes the spectator hub, which is now responsible for recording replays. Therefore, if this is deployed as is, old clients will potentially be able to submit scores that don't have replays.

     This part is possibly irrelevant if it is ensured that all builds have consistent values of `allow_ranking` and `allow_bancho`, i.e. both should be consistently `true` or `false` and not mixed.

  2. Retrieving the client hash depends on connecting to the metadata hub. If the client can somehow connect to all hubs except metadata, they will have online functionality blocked.

     This can maybe be resolved by having a redundant copy of the hash information *inside* the filter implemented here. Implemented in dcfd65f.

  3. This can only really throw on attempting to execute any hub operation. Throwing on connect is not possible because the filter executes independently for each hub that we maintain. Therefore:

     - In the spectator and multiplayer hubs, because of point (2) (relying on the metadata hub to populate user state), it is not guaranteed that we can *read* the user state to get the user's client hash.

     - In the metadata hub, the user client hash *can* be read reliably if it is checked *after* the hub's `OnConnectedAsync()` has run, *but* throwing inside `OnConnectedAsync()` causes the client to disconnect from the metadata hub due to the error, and then get stuck in a loop of trying to re-connect every 3 seconds, which seems... let's call it 'suboptimal'?

  4. Because of how simple this is (throw on every operation), this could get pretty spammy client-side. In testing, the client handles this spam *okay* by limiting the count of notifications emitted... as long as it actually handles the errors. More on this later; see the client-side PR (Ensure all invocations of spectator server hub methods have their errors observed, osu#35488).

  5. The user is not forcibly disconnected from the API, and is instead left in a weird half-alive state where they can use API-dependent functions but not the realtime stuff. Adding a forcible logout would require client changes, but clients that are *right now* considered old won't abide by those changes, for obvious reasons (we can't ship extra code to already-deployed builds).

I think that's all of the caveats but I might be forgetting some at this point.

Test coverage can be added, but (a) I'm not sure how much of this is going to end up in the trash, and (b) the code is so dead simple that you may as well go and test full stack (and that's arguably the only *useful* sort of testing here), so I'm not bothering until I'm sure it's worth the admission price.
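For the sake of discussion, here is a minimal sketch of the general shape such a filter could take using SignalR's `IHubFilter`. This is simplified and illustrative, not the literal diff; in particular `IClientVersionChecker` is an invented stand-in for whatever resolves the caller's build and checks `allow_bancho`:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

// Illustrative sketch only: a hub filter that rejects every hub method
// invocation from a client whose build is not allowed online.
public class ClientVersionHubFilter : IHubFilter
{
    private readonly IClientVersionChecker checker;

    public ClientVersionHubFilter(IClientVersionChecker checker)
    {
        this.checker = checker;
    }

    public async ValueTask<object?> InvokeMethodAsync(
        HubInvocationContext invocationContext,
        Func<HubInvocationContext, ValueTask<object?>> next)
    {
        // The filter runs independently for every method of every hub (point 3):
        // there is no single "on connect" choke point shared by all hubs.
        if (!await checker.IsClientVersionAllowedAsync(invocationContext.Context))
            throw new HubException("This client version is no longer allowed online. Please update.");

        return await next(invocationContext);
    }
}

// Hypothetical abstraction: resolves the caller's build (e.g. by client hash)
// and checks its allow_bancho flag.
public interface IClientVersionChecker
{
    Task<bool> IsClientVersionAllowedAsync(HubCallerContext context);
}
```

A filter like this would be registered globally (e.g. `services.AddSignalR(options => options.AddFilter<ClientVersionHubFilter>())`), which is exactly why it fires per invocation rather than once per connection.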

@bdach requested a review from peppy October 28, 2025 12:44
@bdach self-assigned this Oct 28, 2025
@bdach moved this from Next up to Pending Review in osu! untitled project Oct 28, 2025
bdach added a commit to bdach/osu that referenced this pull request Oct 28, 2025
Ensure all invocations of spectator server hub methods have their errors observed

Fell out when attempting
ppy/osu-server-spectator#346.

Functionally, if a true non-`HubException` is produced via an invocation
of a spectator server hub method, this doesn't really do much - the
error will still log as 'unobserved' due to the default handler, it will
still show up on sentry, etc. The only difference is that it'll get
handled via the continuation installed in `FireAndForget()` rather than
the `TaskScheduler.UnobservedTaskException` event.

The only real case where this is relevant is when the server throws
`HubException`s, which will now instead bubble up in a more
human-readable form. That is relevant to the aforementioned PR because
that one makes any hub method potentially throw a `HubException` if the
client version is too old.

Obviously this does nothing for the existing old clients.
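For reference, the pattern the commit message describes boils down to something like the following. This is a sketch only; the exact signature of the client's `FireAndForget()` helper is an assumption here:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR; // for HubException

public static class TaskExtensions
{
    // Sketch of a FireAndForget() helper: attaching a continuation that runs
    // only on fault *observes* the task's exception, so it is handled here
    // instead of surfacing via the TaskScheduler.UnobservedTaskException event.
    public static void FireAndForget(this Task task, Action<Exception>? onError = null)
    {
        task.ContinueWith(t =>
        {
            Exception e = t.Exception!.GetBaseException();

            // HubExceptions carry a server-provided, human-readable message
            // (such as "your client is too old") worth surfacing to the user;
            // anything else keeps flowing to logging/reporting as before.
            onError?.Invoke(e);
        }, TaskContinuationOptions.OnlyOnFaulted);
    }
}
```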
smoogipoo pushed a commit to ppy/osu that referenced this pull request Oct 29, 2025
Ensure all invocations of spectator server hub methods have their errors observed (#35488)

Comment on lines 52 to 58
```csharp
var build = await memoryCache.GetOrCreateAsync(hash, async _ =>
{
    using (var db = databaseFactory.GetInstance())
        return await db.GetBuildByHashAsync(hash);
});

return build?.allow_bancho == true;
```
Contributor:
What's the lifetime on these objects? Do we care about toggling `allow_bancho` without a new spectator startup?

If so then you probably want an absolute expiry window here.

Member:
I think we probably do, yeah. I'd say a 10-30 minute refresh is fine.

Collaborator (author):
Set to 30 minutes in 8c96ae4
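Presumably the fix looks something along these lines, setting an absolute expiry on the cache entry. This is a sketch only; the actual code in 8c96ae4 may differ:

```csharp
var build = await memoryCache.GetOrCreateAsync(hash, async entry =>
{
    // Expire cached build info after 30 minutes, so that allow_bancho can be
    // toggled without requiring a new spectator server startup.
    entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(30);

    using (var db = databaseFactory.GetInstance())
        return await db.GetBuildByHashAsync(hash);
});

return build?.allow_bancho == true;
```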

Comment on lines 45 to 47
```csharp
string? hash;
using (var item = await metadataStore.GetForUse(callerContext.GetUserId()))
    hash = item.Item?.VersionHash;
```
Contributor:
I think point (2) in the OP is a big one, and one that can easily occur. Agree with duplicating it in here.

Collaborator (author):
Addressed in dcfd65f
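For illustration, the fallback might take roughly this shape. This is a sketch only; dcfd65f's actual implementation may differ, and `connectionHashes` is an invented name for the filter's redundant copy:

```csharp
// Prefer the hash from the metadata hub's user state, but fall back to a copy
// tracked by the filter itself, covering clients that reached other hubs
// without ever connecting to the metadata hub (point 2 in the OP).
string? hash;

using (var item = await metadataStore.GetForUse(callerContext.GetUserId()))
    hash = item.Item?.VersionHash;

hash ??= connectionHashes.GetValueOrDefault(callerContext.ConnectionId);
```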

@bdach marked this pull request as ready for review October 29, 2025 08:32
@peppy (Member) commented Dec 11, 2025

Revisiting this, one caveat is that we devs will no longer be able to connect to the live environment from locally built releases. Bancho gets around this by adding admin overrides for client hash checks.

@bdach thoughts on whether we want to do that here? Or just be like, "we shouldn't be doing that in the first place and should be using staging instead"?

@bdach (Collaborator, author) commented Dec 11, 2025

I'd be fine with adding some allowlist type facility for our own use in times of need if you are.

@peppy (Member) commented Dec 11, 2025

Let's go in that direction then. Either an env var list of groups to include, or just one group should be enough (11 for developers on production).
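A minimal sketch of that direction, following the `databaseFactory` pattern from the snippets above; the env var name and the `IsUserInGroupAsync` query helper are both invented for illustration:

```csharp
// Hypothetical single-group bypass: members of the configured group
// (e.g. 11, developers on production) skip the client version check.
private static readonly string? bypass_group_id =
    Environment.GetEnvironmentVariable("CLIENT_CHECK_BYPASS_GROUP_ID");

private async Task<bool> canBypassVersionCheckAsync(int userId)
{
    if (!int.TryParse(bypass_group_id, out int groupId))
        return false;

    using (var db = databaseFactory.GetInstance())
        return await db.IsUserInGroupAsync(userId, groupId);
}
```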

@peppy merged commit 82900d2 into ppy:master Dec 12, 2025
2 checks passed
github-project-automation bot moved this from Pending Review to Done in osu! untitled project Dec 12, 2025
@bdach deleted the client-version-check-2 branch December 12, 2025 08:21