Use file storage as cache backing store #12
base: main
Conversation
Sorry for leaving this unreviewed for as long as it was. That said I'm not sure you're going to be happy with my review either...
I have concerns with leaning on redis again for the expiry tracking. Especially given this:
The eviction works by setting a dummy key on Redis with an expiry, and then we capture the expiry using Redis' keyspace events in the background. In order for these events to be captured, you will need to run the following command on the indicated Redis instance to capture expire(d) and eviction events:
config set notify-keyspace-events K$xe
as I worry about the performance implications of doing this if it is run on a Redis instance that is not completely dedicated to this service.
@peppy probably have a read through this and see if you agree with my points, since most of them are subjective architectural "feels" rather than concrete criticisms.
Required to ensure cache folders are unique per test-run
As requested, this no longer uses Redis to keep track of the cache. As a result there are no mentions of Redis at all in this project now, although that will change once I PR a solution for #2. This works by creating subdirectories for each date (at time of upload), with a worker that constantly checks each subdirectory and deletes it once it exceeds the maximum number of cached days (as per app settings).
The bug fix mentioned in ppy#12 (comment) was supposed to be covered by the tests added in 18309f2, but those test changes *don't actually cover the failure*: they exercised the *cached* replay, which wasn't affected by the lack of seek, instead of the *stored* replay, which *was*.
bdach
left a comment
Looks good.

Are we good to merge this? I'm gonna need this in to work on #2.

Checking on this today.
/// In the case of the legacy cache folder, replays must be split by ruleset, because stable scores have separate ID schemes per ruleset,
/// so there is another hierarchy level inside with a folder per ruleset.
/// At the lowest level, the cache is divided up by folders of each date.
/// When a replay is added to the cache, it will be put into a folder named by the date it was added in <c>ddMMyy</c> format.
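To illustrate the hierarchy that doc comment describes, a minimal path-building sketch (the real code is C#; the path separators, folder names, and function signature here are assumptions):

```python
from datetime import datetime, timezone

def cache_path(root: str, score_id: int, legacy: bool, ruleset_id: int = None) -> str:
    # Legacy (stable) scores have separate ID schemes per ruleset, so the
    # legacy side needs an extra per-ruleset folder level; solo IDs are global.
    day = datetime.now(timezone.utc).strftime("%d%m%y")  # ddMMyy, as in the doc comment
    if legacy:
        return f"{root}/legacy/{ruleset_id}/{day}/{score_id}"
    return f"{root}/solo/{day}/{score_id}"
```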
Is there a valid reason we're splitting out legacy vs non-legacy?
Mostly because that's how it works currently. Does web-10 have the guarantee that it will know the non-legacy ID at the point of uploading the replay? What about API v1, or legacy scores via API v2 that request the replay using the legacy score ID?
Also currently the upload flow relies on separating legacy and non-legacy uploads since non-legacy uploads have replay headers attached and non-legacy ones don't. I'm happy to move things to a more unified model if we have the guarantees questioned in my other comment, but it will be an undertaking in itself.
maybe the initial upload (if we're actually planning to do that from web-10 and not make it in new code) may have issues. the other uses you mention won't (we maintain a mapping).
also don't expect legacy score tables to be a thing for too much longer. we have short-term plans of nuking it.
Okay cool. I'll think of a way to unify the replay process for both clients then. It's likely just going to be checking for legacy_score_id on the scores table, unless you have a preferred method of doing this (I still need to know if a score was set in stable so I can handle replay headers).
I've done this the best I can in 2ed5846. Please read the commit description for a better understanding of what legacy parts remain.
As far as I'm aware, keeping the storage split between legacy and solo is a non-negotiable given all legacy replays are currently stored in the legacy replay buckets; unifying this so that even legacy replays upload to the new solo bucket would require logic to check all buckets when trying to fetch a replay for a legacy score (since there's the chance it could be in any of them). If we feel strongly about that, I can implement the logic to check each bucket and remove that one remaining legacy distinction, but I'm not immediately convinced it's worth it.
This still feels a lot better though, since there is now one set of endpoints for all replays.
This attempts to unify legacy and solo replays as much as possible. All endpoints now take solo score IDs only. Long-term replay storage (usually S3) still splits between legacy and solo to support existing systems, but all caching is unified. All references to legacy score tables are nuked.
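The resulting split (unified endpoints and cache, long-term storage still divided) amounts to a bucket-selection step at fetch/upload time. A hedged Python sketch (the bucket names and the score shape here are assumptions, not the project's actual values):

```python
def replay_bucket(score: dict,
                  legacy_bucket: str = "legacy-replays",
                  solo_bucket: str = "solo-replays") -> str:
    # Long-term storage stays split: a score carrying a legacy_score_id lives
    # in the per-ruleset legacy bucket it was originally uploaded to, while
    # everything else goes to the unified solo bucket.
    if score.get("legacy_score_id") is not None:
        return f"{legacy_bucket}-{score['ruleset_id']}"
    return solo_bucket
```

Keying off `legacy_score_id` means callers can always pass a solo score ID and the storage layer alone decides where the bytes actually live.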
Check warning
Code scanning / InspectCode
Redundant empty argument list on object creation expression (flagged at several object initializers in the tests and the replay-writing helper), for example:

    legacy_total_score = 13160096,
    ScoreData = new SoloScoreData()
    {
        Statistics = new Dictionary<HitResult, int>()
        {
            [HitResult.Meh] = 0,
            [HitResult.Miss] = 0,
        },
        MaximumStatistics = new Dictionary<HitResult, int>()
        ...
    }

    var date = DateTimeOffset.UtcNow.Date;
    var scoreData = new SoloScoreData()
    ...

    private const int default_replay_version = 20151228;

    public static Stream WriteReplayWithHeader(byte[] frameData, ushort rulesetId, int? scoreVersion, HighScore score, User user, OsuBeatmap beatmap)

    private static readonly Dictionary<ushort, Ruleset> rulesets = new Dictionary<ushort, Ruleset>()

In each case the empty `()` before an object initializer is redundant and can be dropped, e.g. `new Dictionary<HitResult, int> { ... }`.
Resolves #1.

This creates a new `IReplayCache` abstraction in case the backing store ever changes, but right now there is only `FileReplayCache`. There are new environment variables to set the storage path for the cache. This is intentionally different from the local storage path, as the two should always be separated; in production the local storage paths are not used anyway. This should allow block volumes to be used without issue.

This cache works by creating folders for each day to contain cached replays, with a background worker that deletes these folders based on age once they exceed the maximum specified by the new env var.