
Conversation

@tsunyoku (Member) commented Sep 8, 2025

Resolves #1.

This creates a new IReplayCache abstraction in case the backing store ever changes; for now the only implementation is FileReplayCache.

There are new environment variables to set the storage path for the cache. This is intentionally separate from the local storage path, as the two should always be kept apart; in production the local storage paths are not used anyway. This should allow block volumes to be used without issue.

The cache works by creating a folder per day to hold cached replays, with a background worker that deletes these folders once their age exceeds the maximum specified by the new environment variable for it.
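For a rough picture, the moving parts look something like this (a minimal sketch; the actual member names, env var handling, and lookup strategy in the PR may differ):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Illustrative sketch of the abstraction described above; not the PR's exact API.
public interface IReplayCache
{
    Task<Stream?> GetAsync(long scoreId);
    Task SetAsync(long scoreId, byte[] replayData);
}

public class FileReplayCache : IReplayCache
{
    private readonly string basePath; // from the new cache storage path environment variable

    public FileReplayCache(string basePath) => this.basePath = basePath;

    public async Task SetAsync(long scoreId, byte[] replayData)
    {
        // Replays are grouped into one folder per day (ddMMyy, per the doc comment
        // later in the diff) so expiry can delete a whole day in one operation.
        string dir = Path.Combine(basePath, DateTimeOffset.UtcNow.ToString("ddMMyy"));
        Directory.CreateDirectory(dir);
        await File.WriteAllBytesAsync(Path.Combine(dir, scoreId.ToString()), replayData);
    }

    public Task<Stream?> GetAsync(long scoreId)
    {
        // Probe each day folder; a miss across all of them returns null.
        foreach (string dir in Directory.EnumerateDirectories(basePath))
        {
            string path = Path.Combine(dir, scoreId.ToString());
            if (File.Exists(path))
                return Task.FromResult<Stream?>(File.OpenRead(path));
        }

        return Task.FromResult<Stream?>(null);
    }
}
```

Keying the layout on upload date is what keeps expiry cheap: the background worker only ever deletes whole day folders.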

@tsunyoku tsunyoku self-assigned this Sep 8, 2025
@bdach (Contributor) left a comment

Sorry for leaving this unreviewed for as long as it was. That said I'm not sure you're going to be happy with my review either...

I have concerns about leaning on Redis again for the expiry tracking, especially given this:

> The eviction works by setting a dummy key on Redis with an expiry, and then we capture the expiry using Redis' keyspace events in the background. In order for these events to be captured, you will need to run the following command on the indicated Redis instance to capture expire(d) and eviction events:
>
> ```
> config set notify-keyspace-events K$xe
> ```

as I worry about the performance implications if this is run on a Redis instance that is not completely dedicated to this service.
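For reference, consuming these notifications from .NET looks roughly like this (a sketch using StackExchange.Redis; the database number and the dummy-key handling are assumptions):

```csharp
using System;
using StackExchange.Redis;

// Sketch only. With `notify-keyspace-events K$xe`, Redis publishes on
// `__keyspace@<db>__:<key>` with the event name ("expired", "evicted") as the
// message, and it does so for every matching key on the instance, which is
// where the shared-instance performance concern comes from.
ConnectionMultiplexer redis = await ConnectionMultiplexer.ConnectAsync("localhost");
ISubscriber sub = redis.GetSubscriber();

await sub.SubscribeAsync(RedisChannel.Pattern("__keyspace@0__:*"), (channel, message) =>
{
    if (message == "expired" || message == "evicted")
    {
        // channel is "__keyspace@0__:<key>"; recover the dummy key and evict
        // the corresponding cached replay.
        string key = channel.ToString().Split(':', 2)[1];
        Console.WriteLine($"evict {key}");
    }
});
```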

@peppy probably have a read through this and see if you agree with my points, since most of them are subjective architectural "feels" rather than concrete criticisms.

@tsunyoku (Member, Author) commented Oct 17, 2025

As requested, this no longer uses Redis to keep track of the cache. As a result there are no mentions of Redis at all in this project now, although that will change once I PR a solution for #2.

This works by creating a subdirectory for each date (at time of upload), with a worker that periodically checks each subdirectory and deletes it once it exceeds the maximum number of cached days (as per app settings).
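As a sketch, one cleanup pass could look like this (setting and member names illustrative):

```csharp
using System;
using System.Globalization;
using System.IO;

public static class ReplayCacheCleanup
{
    // Day folders are named ddMMyy; anything older than the configured maximum
    // number of cached days is deleted wholesale.
    public static void PurgeExpiredFolders(string basePath, int maxCachedDays)
    {
        DateTime cutoff = DateTime.UtcNow.Date.AddDays(-maxCachedDays);

        foreach (string dir in Directory.EnumerateDirectories(basePath))
        {
            if (DateTime.TryParseExact(Path.GetFileName(dir), "ddMMyy",
                    CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime folderDate)
                && folderDate < cutoff)
            {
                Directory.Delete(dir, recursive: true);
            }
        }
    }
}
```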

tsunyoku and others added 7 commits October 30, 2025 18:46
The bug fix mentioned in ppy#12 (comment) was meant to be covered by the tests added in 18309f2, but those test changes *don't actually cover the failure*: they exercised the *cached* replay, which wasn't affected by the lack of seek, instead of the *stored* replay, which *was* affected by the lack of seek.
@bdach (Contributor) left a comment

Probably fine.

@tsunyoku please check my latest commits, b021ec7 in particular.

@peppy not sure if you want to have a second pair of eyes on this.

@tsunyoku (Member, Author)
Looks good.

@tsunyoku (Member, Author)
Are we good to merge this? I'm gonna need this in to work on #2.

@peppy (Member) commented Dec 10, 2025

Checking on this today.

@peppy peppy self-requested a review December 10, 2025 06:21
```csharp
/// In the case of the legacy cache folder, replays must be split by ruleset, because stable scores have separate ID schemes per ruleset,
/// so there is another hierarchy level inside with a folder per ruleset.
/// At the lowest level, the cache is divided up by folders of each date.
/// When a replay is added to the cache, it will be put into a folder named by the date it was added in <c>ddMMyy</c> format.
```
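For illustration, that layout works out to roughly the following (folder and date names hypothetical):

```
<cache_root>/
├── legacy/
│   ├── 0/              ← one folder per ruleset (stable ID schemes overlap)
│   │   ├── 101225/     ← date the replay was added, ddMMyy
│   │   │   └── <score_id>
│   │   └── 111225/
│   └── 3/
└── <non-legacy>/
    └── 101225/
        └── <score_id>
```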
@peppy (Member)

Is there a valid reason we're splitting out legacy vs non-legacy?

@tsunyoku (Member, Author)

Mostly because that's how it works currently. Does web-10 have the guarantee that it will know the non-legacy ID at the point of uploading the replay? What about API v1, or legacy scores via API v2, that request the replay using the legacy score ID?

@tsunyoku (Member, Author)

Also, the upload flow currently relies on separating legacy and non-legacy uploads, since non-legacy uploads have replay headers attached and legacy ones don't. I'm happy to move things to a more unified model if we have the guarantees questioned in my other comment, but it will be an undertaking in itself.

@peppy (Member) commented Dec 10, 2025

the initial upload (if we're actually planning to do that from web-10 and not make it in new code) may have issues. the other uses you mention won't (we maintain a mapping).

also don't expect legacy score tables to be a thing for too much longer. we have short-term plans of nuking them.

@tsunyoku (Member, Author)

Okay cool. I'll think of a way to unify the replay process for both clients then. It's likely just going to be checking for legacy_score_id on the scores table, unless you have a preferred method of doing this (I still need to know if a score was set on stable so I can handle replay headers).

@tsunyoku (Member, Author)

I've done this the best I can in 2ed5846. Please read the commit description for a better understanding of what legacy parts remain.

As far as I'm aware, keeping the storage split between legacy and solo is a non-negotiable: all legacy replays are currently stored in the legacy replay buckets, and unifying this so that even legacy replays upload to the new solo bucket would require logic to check every bucket when fetching a replay for a legacy score (since it could be in any of them). If we feel strongly about that, I can implement the logic to check each bucket and remove that one remaining legacy distinction, but I'm not immediately convinced it's worth it.
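That fallback would look roughly like this (a sketch; IReplayStorage and GetReplayStreamAsync are hypothetical stand-ins for the actual storage abstraction):

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical: only needed if legacy replays could live in any bucket.
public interface IReplayStorage
{
    Task<Stream?> GetReplayStreamAsync(long scoreId);
}

public static class ReplayLookup
{
    public static async Task<Stream?> FindReplayAsync(long scoreId, params IReplayStorage[] buckets)
    {
        // Probe each bucket in turn; the first hit wins.
        foreach (IReplayStorage bucket in buckets)
        {
            Stream? replay = await bucket.GetReplayStreamAsync(scoreId);
            if (replay != null)
                return replay;
        }

        return null;
    }
}
```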

This still feels a lot better though, since there is now one set of endpoints for all replays.

This attempts to unify legacy and solo replays as much as possible. All endpoints now take solo score IDs only. Long-term replay storage (usually S3) still splits between legacy and solo to support existing systems, but all caching is unified. All references to legacy score tables are nuked.
Code scanning / InspectCode raised the same warning, "Redundant empty argument list on object creation expression" ("Empty argument list is redundant"), at several points in the diff; some occurrences were flagged twice across two similar tests. In the tests:

```csharp
legacy_total_score = 13160096,
ScoreData = new SoloScoreData
{
    Statistics = new Dictionary<HitResult, int>()
```

```csharp
var date = DateTimeOffset.UtcNow.Date;

var scoreData = new SoloScoreData()
{
    Statistics = new Dictionary<HitResult, int>()
```

```csharp
    [HitResult.Meh] = 0,
    [HitResult.Miss] = 0,
},
MaximumStatistics = new Dictionary<HitResult, int>()
```

and once outside the tests:

```csharp
private const int default_replay_version = 20151228;

public static Stream WriteReplayWithHeader(byte[] frameData, ushort rulesetId, int? scoreVersion, HighScore score, User user, OsuBeatmap beatmap)

private static readonly Dictionary<ushort, Ruleset> rulesets = new Dictionary<ushort, Ruleset>()
```
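The fix in each case is mechanical: drop the redundant parentheses when an object initializer follows.

```csharp
using System.Collections.Generic;
using osu.Game.Rulesets.Scoring;

// Before (flagged by InspectCode):
var flagged = new Dictionary<HitResult, int>()
{
    [HitResult.Meh] = 0,
    [HitResult.Miss] = 0,
};

// After (the empty argument list is redundant when an initializer follows):
var fixedUp = new Dictionary<HitResult, int>
{
    [HitResult.Meh] = 0,
    [HitResult.Miss] = 0,
};
```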
Development

Successfully merging this pull request may close these issues: On caching strategy & backing store.