Check client versions when performing any operations #346
Conversation
RFC. Probably maybe closes ppy#52.

This is the laziest possible version of a change that could possibly maybe come close to the parameters of "checking client versions" outlined in the above issue.

Why is it lazy? Well, there's a bunch of problems here and I don't know how to solve any of them, so this is supposed to be the start of a conversation. The list of problems is presented below.

1. This change prevents execution of any method of any hub if the client version does not match. In particular, this includes the spectator hub, which is now responsible for recording replays. Therefore, if this is deployed as is, old clients will potentially be able to submit scores that don't have replays. This part is possibly irrelevant if it is ensured that all builds have consistent values of `allow_ranking` and `allow_bancho`, as in both should be `true` or `false` consistently and not mixed.
2. Retrieving the client hash is dependent on connecting to the metadata hub. If the client can somehow connect to all hubs except metadata, they will have online functionality blocked. This can maybe be resolved by having a redundant copy of the hash information *inside* the filter implemented here. Implemented in dcfd65f.
3. This can only really throw on attempting to execute any hub operation. Throwing on connect is not possible, for reasons that vary, because the filter executes independently for each hub that we maintain. Therefore:
   - In the spectator and multiplayer hubs, because of point (2) (relying on the metadata hub to populate user state), it is not guaranteed that we can *read* the user state to get the user's client hash.
   - In the metadata hub, the user client hash *can* be read reliably if it is checked *after* the hub's `OnConnectedAsync()` has run, *but* throwing inside `OnConnectedAsync()` causes the client to disconnect from the metadata hub due to the error, and then get stuck in a loop of trying to re-connect every 3 seconds, which seems... let's call it 'suboptimal'?
4. Because of how simple this is (throw on every operation), this could get pretty spammy client-side. In testing, the client handles this spam *okay* by limiting the count of notifications emitted... as long as it actually handles the errors. More on this later, see the client-side PR (Ensure all invocations of spectator server hub methods have their errors observed, osu#35488).
5. The user is not forcibly disconnected from the API, and instead is in a weird half-alive state where they can use API-dependent functions but not the realtime stuff. Adding a forcible logout would require client changes, but clients that are *right now* considered old won't abide by those changes, for obvious reasons (we can't ship extra code to deployed builds).

I think that's all of the caveats but I might be forgetting some at this point.

Test coverage can be added, but (a) I'm not sure how much of this is going to end up in the trash, and (b) the code is so dead simple that you may as well go and test full stack (and that's arguably the only *useful* sort of testing here), so I'm not bothering until I'm sure it's worth the admission price.
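For readers unfamiliar with the mechanism under discussion: the "filter" above refers to a SignalR hub filter, which wraps every hub method invocation. The sketch below is a minimal illustration of that shape only, not the PR's actual code; `IClientVersionChecker` and its method are placeholder names.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

// Minimal illustration: a hub filter that rejects every hub method invocation
// from a client whose build is not allowed. Placeholder names throughout.
public class ClientVersionHubFilter : IHubFilter
{
    private readonly IClientVersionChecker versionChecker;

    public ClientVersionHubFilter(IClientVersionChecker versionChecker)
    {
        this.versionChecker = versionChecker;
    }

    public async ValueTask<object?> InvokeMethodAsync(
        HubInvocationContext invocationContext,
        Func<HubInvocationContext, ValueTask<object?>> next)
    {
        // Runs once per hub method invocation, on every hub the filter is registered for,
        // which is why an outdated client fails every operation rather than failing once.
        if (!await versionChecker.IsClientVersionAllowedAsync(invocationContext.Context))
            throw new HubException("This client version is too old to use online functionality.");

        return await next(invocationContext);
    }
}

// Placeholder abstraction for "resolve the caller's client hash and check allow_bancho".
public interface IClientVersionChecker
{
    Task<bool> IsClientVersionAllowedAsync(HubCallerContext context);
}
```

Registering such a filter globally (e.g. `services.AddSignalR(options => options.AddFilter<ClientVersionHubFilter>())`) applies it to all hubs, which is what the caveats above about the spectator hub and about throwing on every operation are describing.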
Ensure all invocations of spectator server hub methods have their errors observed (#35488)

Fell out when attempting ppy/osu-server-spectator#346. Functionally, if a true non-`HubException` is produced via an invocation of a spectator server hub method, this doesn't really do much - the error will still log as 'unobserved' due to the default handler, it will still show up on sentry, etc. The only difference is that it'll get handled via the continuation installed in `FireAndForget()` rather than the `TaskScheduler.UnobservedTaskException` event. The only real case where this is relevant is when the server throws `HubException`s, which will now instead bubble up in a more human-readable form. Which is relevant to the aforementioned PR, because that one makes any hub method potentially throw a `HubException` if the client version is too old. Obviously this does nothing for the existing old clients.
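For illustration only (this is a generic sketch of the pattern described above, not osu!'s actual `FireAndForget()` implementation): attaching a faulted-only continuation observes the exception directly, so it never reaches the `TaskScheduler.UnobservedTaskException` handler, and a `HubException` can be surfaced in a friendlier way.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

public static class TaskFireAndForgetSketch
{
    // Generic sketch of the fire-and-forget pattern described above.
    public static void FireAndForget(this Task task, Action<HubException>? onHubError = null)
    {
        task.ContinueWith(t =>
        {
            // Accessing t.Exception marks the fault as observed, so it will not
            // surface later via TaskScheduler.UnobservedTaskException.
            Exception error = t.Exception!.GetBaseException();

            if (error is HubException hubException)
            {
                // Server-side HubExceptions carry human-readable messages
                // (e.g. "client version too old") suitable for a user-facing notification.
                onHubError?.Invoke(hubException);
            }
            else
            {
                // Anything else is still an unexpected failure; log it as before.
                Console.Error.WriteLine(error);
            }
        }, TaskContinuationOptions.OnlyOnFaulted);
    }
}
```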
```csharp
var build = await memoryCache.GetOrCreateAsync(hash, async _ =>
{
    using (var db = databaseFactory.GetInstance())
        return await db.GetBuildByHashAsync(hash);
});

return build?.allow_bancho == true;
```
What's the lifetime on these objects? Do we care about toggling `allow_bancho` without a new spectator startup?
If so then you probably want an absolute expiry window here.
I think we probably do yeah. I'd say 10-30 minute refresh is fine.
Set to 30 minutes in 8c96ae4
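For reference, if the cache above is a `Microsoft.Extensions.Caching.Memory.IMemoryCache`, an absolute expiry can be set on the entry inside the `GetOrCreateAsync` factory. A sketch of roughly what a 30-minute window looks like (this may not match 8c96ae4 exactly):

```csharp
var build = await memoryCache.GetOrCreateAsync(hash, async entry =>
{
    // Without an expiry the cached build lives for the lifetime of the process,
    // so toggling allow_bancho in the database would require a spectator server restart.
    entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(30);

    using (var db = databaseFactory.GetInstance())
        return await db.GetBuildByHashAsync(hash);
});

return build?.allow_bancho == true;
```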
```csharp
string? hash;

using (var item = await metadataStore.GetForUse(callerContext.GetUserId()))
    hash = item.Item?.VersionHash;
```
I think point (2) in the OP is a big one, and I think it could easily occur in practice. Agree with duplicating it in here.
Addressed in dcfd65f
…hecker

Instead of relying on metadata hub. Leads to some duplicated storage but safer, probably.
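To illustrate what "duplicated storage" could mean in practice (purely a sketch; the actual change is whatever landed in dcfd65f and may differ): the version-checking component keeps its own per-user copy of the client hash, so it no longer needs to read it back out of the metadata hub's user state.

```csharp
using System.Collections.Concurrent;

// Illustrative only: a per-user copy of the client version hash owned by the
// version-checking component itself, populated when the client first identifies
// itself and cleared when it disconnects.
public class ClientVersionHashStore
{
    private readonly ConcurrentDictionary<int, string> hashByUserId = new();

    public void SetHash(int userId, string versionHash) => hashByUserId[userId] = versionHash;

    public string? GetHash(int userId) => hashByUserId.TryGetValue(userId, out var hash) ? hash : null;

    public void ClearHash(int userId) => hashByUserId.TryRemove(userId, out _);
}
```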
Revisiting this, one caveat is that us devs will no longer be able to connect to the live environment from locally built releases. Bancho gets around this by adding admin overrides for client hash checks. @bdach thoughts on whether we want to do that here? or just be like, "we shouldn't be doing that in the first place and should be using staging instead"?

I'd be fine with adding some allowlist type facility for our own use in times of need if you are.
Let's go in that direction then. Either an envvar list of groups to include, or just one should be enough ( |
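A minimal sketch of what an override along those lines might look like; the environment variable name here is invented, and a real version would likely check group membership rather than raw user IDs as discussed above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch only: exempt a small, operator-controlled allowlist of user IDs from the
// client version check. The environment variable name is hypothetical.
public class ClientVersionCheckOverrides
{
    private readonly HashSet<int> exemptUserIds;

    public ClientVersionCheckOverrides()
    {
        string raw = Environment.GetEnvironmentVariable("CLIENT_VERSION_CHECK_EXEMPT_USER_IDS") ?? string.Empty;

        exemptUserIds = raw.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
                           .Select(int.Parse)
                           .ToHashSet();
    }

    public bool IsExempt(int userId) => exemptUserIds.Contains(userId);
}
```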