Replace Kemal::StaticFileHandler
with direct subclass of stdlib HTTP::StaticFileHandler
on Crystal >= 1.17.0
#5338
end
end

CACHE_LIMIT = 5_000_000 # 5MB
The `CACHE_LIMIT` is actually way too lenient imo. It allows you to fit everything from the assets folder, since the entire uncompressed folder with the videojs dependencies is 4.9MB. And if you minify the videojs scripts, it's only 2.7MB. Adding gzip compression to that can then get it down to around 840kB.
Adding a simple LRU cache and/or compressing the cached files might be something to consider.
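For illustration, the LRU idea could be sketched roughly as follows. This is not code from the PR: the `LRUFileCache` class and its method names are hypothetical, and it relies only on the fact that Crystal's `Hash` preserves insertion order.

```crystal
# Hypothetical size-bounded LRU cache for static file contents.
# Not part of the PR; names are made up for this sketch.
class LRUFileCache
  # Total number of bytes currently held in the cache.
  getter bytesize : Int32 = 0

  def initialize(@limit : Int32)
    @entries = {} of String => Bytes
  end

  # Returns the cached bytes, marking the entry as most recently used.
  def fetch?(path : String) : Bytes?
    data = @entries.delete(path)
    return nil unless data
    @entries[path] = data # re-insert at the end (most recent position)
    data
  end

  def store(path : String, data : Bytes) : Nil
    return if data.size > @limit # a file larger than the whole budget is never cached

    if old = @entries.delete(path)
      @bytesize -= old.size
    end

    # Hash keeps insertion order, so the first entry is the least recently used.
    while @bytesize + data.size > @limit
      _, evicted = @entries.shift
      @bytesize -= evicted.size
    end

    @entries[path] = data
    @bytesize += data.size
  end
end
```

With a limit well below the 4.9MB assets folder, rarely requested files would be evicted first instead of everything fitting in the cache at once.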
Yeah, I would probably only save the gzipped + deflated versions to cache. Even storing the three versions would be less than 15MB, which is perfectly reasonable given the rest of the RAM usage of invidious.
The compressed responses are created by the `FilteredCompressHandler`, so I'm not sure how the compressed data can be stored without duplicating a bunch of code and skipping the `FilteredCompressHandler` entirely.
Design-wise though, I was thinking that maybe we could store only the gzipped data and decompress and recompress as needed to serve the uncompressed, deflate, and range requests. For most requests it'll just be sending the compressed gzip data, and for the rest I don't think the extra (re)compression steps would cause too much of an issue.
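A minimal sketch of the "store only gzip" idea, using the stdlib `Compress::Gzip` reader and writer. The helper names (`gzip_bytes`, `gunzip_bytes`, `body_for`) are hypothetical, the `Accept-Encoding` check is deliberately naive, and how this would interact with `FilteredCompressHandler` is left open, as in the comment above.

```crystal
require "compress/gzip"

# Compresses the raw file contents once, when the file enters the cache.
def gzip_bytes(raw : Bytes) : Bytes
  io = IO::Memory.new
  Compress::Gzip::Writer.open(io) do |gzip|
    gzip.write(raw)
  end
  io.to_slice
end

# Recovers the original bytes for clients that cannot take gzip
# (or for range requests, which need offsets into the uncompressed data).
def gunzip_bytes(compressed : Bytes) : Bytes
  Compress::Gzip::Reader.open(IO::Memory.new(compressed)) do |gzip|
    gzip.getb_to_end
  end
end

# Hypothetical helper: returns the body to send and whether it is gzip-encoded.
# The substring match is a simplification of real Accept-Encoding parsing.
def body_for(cached_gzip : Bytes, accept_encoding : String?) : Tuple(Bytes, Bool)
  if accept_encoding.try &.includes?("gzip")
    {cached_gzip, true}
  else
    {gunzip_bytes(cached_gzip), false}
  end
end
```

The common case returns the cached slice untouched; only the fallback path pays for decompression.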
# being set to `IO` rather than `File`
#
# Can be removed once https://github.com/crystal-lang/crystal/issues/15817 is fixed.
private def serve_file_range(context : HTTP::Server::Context, file : IO, range_header : String, file_info)
Given the size of the files we are serving, we could probably let the stdlib code load the file from disk if we ever receive a ranged request.
private def file_info(expanded_path : Path)
  file_path = @public_dir.join(expanded_path.to_kind(Path::Kind.native))
  {@@cached_files[file_path]? || File.info?(file_path), file_path}
end
I have two questions here:
- Do you think it could be interesting to keep track of the time a file was last read from disk, so that we could reload it if its modification time changed? I'm thinking about users that complained about not being able to make CSS changes without restarting.
- Less important: is there a reason to use the full (system) path instead of the shorter `expanded_path` as the key of the `cached_files` Hash?
Well, keeping track of when the file was last read, and updating the cache only after N time has passed and the last modified time has changed, still means that things like CSS changes won't show up until that time to live has expired.
Since we already track the last modified time of the cached files, we could just compare it with the actual last modified time on disk for each request. But that's going to cause a lot of disk reads on instances with high request loads. I don't know if that's going to be a non-issue on modern systems, but Invidious already has a reputation for killing drives and I'm concerned that this could strain disks further.
Alternatively, we could also just forgo the file cache system entirely and let instance maintainers add caching during deployment instead.
As for why the full path is used, the later methods down the chain don't have access to the shorter `expanded_path`, so they won't be able to use it as a key to write to the cache.
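The per-request comparison discussed above could look roughly like the sketch below. It is an illustration only: `CachedFile`, `REVALIDATED_CACHE`, and `read_with_revalidation` are hypothetical names, not the PR's code. Note that it calls `File.info?` on every lookup, which is exactly the extra disk metadata access the comment is worried about.

```crystal
# Hypothetical cache entry: file contents plus the mtime seen when it was read.
record CachedFile, data : Bytes, modification_time : Time

# Hypothetical in-memory cache keyed by the full file path.
REVALIDATED_CACHE = {} of String => CachedFile

# Returns the file contents, re-reading from disk whenever the on-disk
# modification time differs from the one stored with the cached copy.
def read_with_revalidation(path : String) : Bytes?
  info = File.info?(path) # one stat call per request
  unless info
    REVALIDATED_CACHE.delete(path) # file is gone, drop any stale entry
    return nil
  end

  cached = REVALIDATED_CACHE[path]?
  if cached && cached.modification_time == info.modification_time
    return cached.data
  end

  data = File.open(path, &.getb_to_end)
  REVALIDATED_CACHE[path] = CachedFile.new(data, info.modification_time)
  data
end
```

With this approach a CSS edit becomes visible on the next request, at the cost of one stat call per request even on cache hits.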
Hi! I'm wondering if there's anything blocking these commits? I'm hoping to be able to use the new Crystal version soon.
The PR is still going through the code review process, but after that it can be merged.
This is now needed to build the latest invidious commits with Crystal 1.17.1 🫤 :
Kemal's subclass of the stdlib `HTTP::StaticFileHandler` is not as maintained as its parent, and so misses out on many enhancements and bug fixes from upstream, which unfortunately also includes the patches for security vulnerabilities...

Though this isn't necessarily Kemal's fault since the bulk of the stdlib handler's logic was done in a single big method, making any changes hard to maintain. This was fixed in Crystal 1.17.0 where the handler was refactored into many private methods, making it easier for an inheriting type to implement custom behaviors while still leveraging much of the pre-existing code.

Since we don't actually use any of the Kemal specific features added by `Kemal::StaticFileHandler`, there really isn't a reason to not just create a new handler based upon the stdlib implementation instead, which will address the problems mentioned above.

This PR implements a new handler which inherits from the stdlib variant and overrides the helper methods added in Crystal 1.17.0 to add the caching behavior with minimal code changes. Since this new handler depends on the code in Crystal 1.17.0, it will only be applied on versions greater than or equal to 1.17.0. On older versions we'll fall back to the current monkey-patched `Kemal::StaticFileHandler`.
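A compile-time version gate like the one described is commonly written with the `compare_versions` macro method. The sketch below only shows the gating pattern: the `StaticAssetsHandler` class name and the `HANDLER_KIND` constant are assumptions for illustration, and the actual method overrides are omitted.

```crystal
require "http/server"

{% if compare_versions(Crystal::VERSION, "1.17.0") >= 0 %}
  # Crystal >= 1.17.0: subclass the refactored stdlib handler directly.
  # (Hypothetical class name; the caching overrides would live here.)
  class StaticAssetsHandler < HTTP::StaticFileHandler
  end

  HANDLER_KIND = "stdlib subclass"
{% else %}
  # Older Crystal: the existing monkey-patched Kemal::StaticFileHandler
  # would keep being used instead (not shown in this sketch).
  HANDLER_KIND = "patched Kemal handler"
{% end %}

puts HANDLER_KIND
```

Because the branch is resolved by the macro at compile time, the code in the branch that is not taken is never type-checked, so it can reference methods that only exist in one version of the stdlib.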
Overriding `#call` or patching out `serve_file_compressed` provides only minimal benefits over the ease of maintenance granted by only overriding what we need to for the caching behavior.
Running `crystal spec` without a file argument essentially produces one big program that combines every single spec file, their imports, and the files that those imports themselves depend on. Most of the types within this combined program will get ignored by the compiler due to a lack of any calls to them from the spec files. But for some types, particularly the HTTP module ones, using them within the spec files will suddenly make the compiler enable a bunch of previously ignored code. And that code will suddenly require the presence of additional types, constants, etc. This not only makes it annoying to get the specs working but also makes it difficult to isolate behaviors for testing. The `static_assets_handler_spec.cr` causes this issue and so will be marked as an isolated spec for now. In the future all of the tests should be organized into independent groupings similar to how the Crystal compiler splits their tests into std, compiler, primitives and interpreter.
Summing the sizes of each cached file every time is very inefficient. Instead we can simply store the cache size in a constant and increase it every time a file is added into the cache.
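The difference between the two approaches can be sketched as follows. This is an assumption-laden illustration, not the PR's code: `SizedCache` and its methods are hypothetical, and the running total is kept in a class variable here because a Crystal constant cannot be reassigned.

```crystal
# Hypothetical cache that tracks its total size incrementally.
class SizedCache
  CACHE_LIMIT = 5_000_000 # 5MB, as in the PR

  @@cached_files = {} of Path => Bytes

  # Running total, updated on every insert instead of being recomputed.
  class_getter cache_size : Int32 = 0

  # The slow way: walks every cached file on each call (O(n)).
  def self.recomputed_size : Int32
    @@cached_files.values.sum(&.size)
  end

  # Adds a file if it fits; returns whether it was cached (O(1) size check).
  def self.add(path : Path, data : Bytes) : Bool
    previous = @@cached_files[path]?.try(&.size) || 0
    new_size = @@cache_size - previous + data.size
    return false if new_size > CACHE_LIMIT

    @@cached_files[path] = data
    @@cache_size = new_size
    true
  end
end
```

Subtracting the previous entry's size keeps the counter accurate when an already cached path is stored again.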
Force-pushed from a49bfa7 to 92888bd.
Oops, looks like #5337 is required for compression to work correctly... I just rebased against master, which includes that commit, and it's fixed now. A test case can't be added due to needing to import
I tested it and it works fine now ;). Thanks
Fixes the CI for Crystal nightly