
Conversation

@tsunyoku (Member) commented Sep 8, 2025

Resolves #1.

This creates a new IReplayCache abstraction in case the backing store ever changes; for now the only implementation is FileReplayCache.

There are new environment variables to set the storage path for the cache. This is intentionally separate from the local storage path, as the two should always be kept apart; in production the local storage paths are not used anyway. This should allow block volumes to be used without issue.

The cache works by creating a folder per day to hold cached replays, with a background worker that deletes these folders once their age exceeds the maximum specified by the new environment variable for it.
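For a rough picture, the moving parts look something like this (a minimal sketch; the actual member names, env var handling, and lookup strategy in the PR may differ):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Illustrative sketch of the abstraction described above; not the PR's exact API.
public interface IReplayCache
{
    Task<Stream?> GetAsync(long scoreId);
    Task SetAsync(long scoreId, byte[] replayData);
}

public class FileReplayCache : IReplayCache
{
    private readonly string basePath; // from the new cache storage path environment variable

    public FileReplayCache(string basePath) => this.basePath = basePath;

    public async Task SetAsync(long scoreId, byte[] replayData)
    {
        // Replays are grouped into one folder per day (ddMMyy, per the doc comment
        // later in the diff) so expiry can delete a whole day in one operation.
        string dir = Path.Combine(basePath, DateTimeOffset.UtcNow.ToString("ddMMyy"));
        Directory.CreateDirectory(dir);
        await File.WriteAllBytesAsync(Path.Combine(dir, scoreId.ToString()), replayData);
    }

    public Task<Stream?> GetAsync(long scoreId)
    {
        // Probe each day folder; a miss across all of them returns null.
        foreach (string dir in Directory.EnumerateDirectories(basePath))
        {
            string path = Path.Combine(dir, scoreId.ToString());
            if (File.Exists(path))
                return Task.FromResult<Stream?>(File.OpenRead(path));
        }

        return Task.FromResult<Stream?>(null);
    }
}
```

Keying the layout on upload date is what keeps expiry cheap: the background worker only ever deletes whole day folders.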

@tsunyoku tsunyoku self-assigned this Sep 8, 2025
@bdach (Contributor) left a comment

Sorry for leaving this unreviewed for as long as it was. That said I'm not sure you're going to be happy with my review either...

I have concerns about leaning on Redis again for the expiry tracking, especially given this:

> The eviction works by setting a dummy key on Redis with an expiry, and then we capture the expiry using Redis' keyspace events in the background. In order for these events to be captured, you will need to run the following command on the indicated Redis instance to capture expire(d) and eviction events:
>
> ```
> config set notify-keyspace-events K$xe
> ```

as I worry about the performance implications if this is run on a Redis instance that is not completely dedicated to this service.
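For reference, consuming these notifications from .NET looks roughly like this (a sketch using StackExchange.Redis; the database number and the dummy-key handling are assumptions):

```csharp
using System;
using StackExchange.Redis;

// Sketch only. With `notify-keyspace-events K$xe`, Redis publishes on
// `__keyspace@<db>__:<key>` with the event name ("expired", "evicted") as the
// message, and it does so for every matching key on the instance, which is
// where the shared-instance performance concern comes from.
ConnectionMultiplexer redis = await ConnectionMultiplexer.ConnectAsync("localhost");
ISubscriber sub = redis.GetSubscriber();

await sub.SubscribeAsync(RedisChannel.Pattern("__keyspace@0__:*"), (channel, message) =>
{
    if (message == "expired" || message == "evicted")
    {
        // channel is "__keyspace@0__:<key>"; recover the dummy key and evict
        // the corresponding cached replay.
        string key = channel.ToString().Split(':', 2)[1];
        Console.WriteLine($"evict {key}");
    }
});
```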

@peppy probably have a read through this and see if you agree with my points, since most of them are subjective architectural "feels" rather than concrete criticisms.

@tsunyoku (Member, Author) commented Oct 17, 2025

As requested, this no longer uses Redis to keep track of the cache. As a result there are no mentions of Redis at all in this project now, although that will change once I PR a solution for #2.

This works by creating a subdirectory for each date (at time of upload), with a worker that periodically checks each subdirectory and deletes it once it exceeds the maximum number of cached days (as per app settings).
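As a sketch, one cleanup pass could look like this (setting and member names illustrative):

```csharp
using System;
using System.Globalization;
using System.IO;

public static class ReplayCacheCleanup
{
    // Day folders are named ddMMyy; anything older than the configured maximum
    // number of cached days is deleted wholesale.
    public static void PurgeExpiredFolders(string basePath, int maxCachedDays)
    {
        DateTime cutoff = DateTime.UtcNow.Date.AddDays(-maxCachedDays);

        foreach (string dir in Directory.EnumerateDirectories(basePath))
        {
            if (DateTime.TryParseExact(Path.GetFileName(dir), "ddMMyy",
                    CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime folderDate)
                && folderDate < cutoff)
            {
                Directory.Delete(dir, recursive: true);
            }
        }
    }
}
```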

tsunyoku and others added 7 commits October 30, 2025 18:46
The bug fix mentioned in ppy#12 (comment) was meant to be covered by the tests added in 18309f2, but those test changes *don't actually cover the failure*: they exercised the *cached* replay, which wasn't affected by the lack of seek, instead of the *stored* replay, which *was* affected by the lack of seek.
@bdach (Contributor) left a comment

Probably fine.

@tsunyoku please check my latest commits, b021ec7 in particular.

@peppy not sure if you want to have a second pair of eyes on this.

@tsunyoku (Member, Author)
Looks good.

@tsunyoku (Member, Author)
Are we good to merge this? I'm gonna need this in to work on #2.

@peppy (Member) commented Dec 10, 2025

Checking on this today.

@peppy peppy self-requested a review December 10, 2025 06:21
```csharp
/// In the case of the legacy cache folder, replays must be split by ruleset, because stable scores have separate ID schemes per ruleset,
/// so there is another hierarchy level inside with a folder per ruleset.
/// At the lowest level, the cache is divided up by folders of each date.
/// When a replay is added to the cache, it will be put into a folder named by the date it was added in <c>ddMMyy</c> format.
```
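For illustration, that layout works out to roughly the following (folder and date names hypothetical):

```
<cache_root>/
├── legacy/
│   ├── 0/              ← one folder per ruleset (stable ID schemes overlap)
│   │   ├── 101225/     ← date the replay was added, ddMMyy
│   │   │   └── <score_id>
│   │   └── 111225/
│   └── 3/
└── <non-legacy>/
    └── 101225/
        └── <score_id>
```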
@peppy (Member)

Is there a valid reason we're splitting out legacy vs non-legacy?

@tsunyoku (Member, Author)

Mostly because that's how it works currently. Does web-10 have the guarantee that it will know the non-legacy ID at the point of uploading the replay? What about API v1, or legacy scores via API v2, that request the replay using the legacy score ID?

@tsunyoku (Member, Author)

Also, the upload flow currently relies on separating legacy and non-legacy uploads, since non-legacy uploads have replay headers attached and legacy ones don't. I'm happy to move things to a more unified model if we have the guarantees questioned in my other comment, but it will be an undertaking in itself.

@peppy (Member) commented Dec 10, 2025

the initial upload (if we're actually planning to do that from web-10 and not make it in new code) may have issues. the other uses you mention won't (we maintain a mapping).

also don't expect legacy score tables to be a thing for too much longer. we have short-term plans of nuking them.

@tsunyoku (Member, Author)

Okay cool. I'll think of a way to unify the replay process for both clients then. It's likely just going to be checking for legacy_score_id on the scores table, unless you have a preferred method of doing this (I still need to know if a score was set on stable so I can handle replay headers).

@tsunyoku (Member, Author)

I've done this the best I can in 2ed5846. Please read the commit description for a better understanding of what legacy parts remain.

As far as I'm aware, keeping the storage split between legacy and solo is a non-negotiable: all legacy replays are currently stored in the legacy replay buckets, and unifying this so that even legacy replays upload to the new solo bucket would require logic to check every bucket when fetching a replay for a legacy score (since it could be in any of them). If we feel strongly about that, I can implement the logic to check each bucket and remove that one remaining legacy distinction, but I'm not immediately convinced it's worth it.
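That fallback would look roughly like this (a sketch; IReplayStorage and GetReplayStreamAsync are hypothetical stand-ins for the actual storage abstraction):

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical: only needed if legacy replays could live in any bucket.
public interface IReplayStorage
{
    Task<Stream?> GetReplayStreamAsync(long scoreId);
}

public static class ReplayLookup
{
    public static async Task<Stream?> FindReplayAsync(long scoreId, params IReplayStorage[] buckets)
    {
        // Probe each bucket in turn; the first hit wins.
        foreach (IReplayStorage bucket in buckets)
        {
            Stream? replay = await bucket.GetReplayStreamAsync(scoreId);
            if (replay != null)
                return replay;
        }

        return null;
    }
}
```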

This still feels a lot better though, since there is now one set of endpoints for all replays.

This attempts to unify legacy and solo replays as much as possible. All endpoints now take solo score IDs only. Long-term replay storage (usually S3) still splits between legacy and solo to support existing systems, but all caching is unified. All references to legacy score tables are nuked.
Code scanning / InspectCode raised the same warning, "Redundant empty argument list on object creation expression" ("Empty argument list is redundant"), at several points in the diff; some occurrences were flagged twice across two similar tests. In the tests:

```csharp
legacy_total_score = 13160096,
ScoreData = new SoloScoreData
{
    Statistics = new Dictionary<HitResult, int>()
```

```csharp
var date = DateTimeOffset.UtcNow.Date;

var scoreData = new SoloScoreData()
{
    Statistics = new Dictionary<HitResult, int>()
```

```csharp
    [HitResult.Meh] = 0,
    [HitResult.Miss] = 0,
},
MaximumStatistics = new Dictionary<HitResult, int>()
```

and once outside the tests:

```csharp
private const int default_replay_version = 20151228;

public static Stream WriteReplayWithHeader(byte[] frameData, ushort rulesetId, int? scoreVersion, HighScore score, User user, OsuBeatmap beatmap)

private static readonly Dictionary<ushort, Ruleset> rulesets = new Dictionary<ushort, Ruleset>()
```
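The fix in each case is mechanical: drop the redundant parentheses when an object initializer follows.

```csharp
using System.Collections.Generic;
using osu.Game.Rulesets.Scoring;

// Before (flagged by InspectCode):
var flagged = new Dictionary<HitResult, int>()
{
    [HitResult.Meh] = 0,
    [HitResult.Miss] = 0,
};

// After (the empty argument list is redundant when an initializer follows):
var fixedUp = new Dictionary<HitResult, int>
{
    [HitResult.Meh] = 0,
    [HitResult.Miss] = 0,
};
```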
Development

Successfully merging this pull request may close these issues: On caching strategy & backing store.