Skip to content

Conversation

syeopite
Copy link
Member

@syeopite syeopite commented Jun 4, 2025

Fixes the CI for Crystal nightly


Kemal's subclass of the stdlib HTTP::StaticFileHandler is not as
maintained as its parent, and so misses out on many enhancements and bug fixes from upstream, which unfortunately also includes the patches for security vulnerabilities...

Though this isn't necessarily Kemal's fault since the bulk of the stdlib handler's logic was done in a single big method, making any changes hard to maintain. This was fixed in Crystal 1.17.0 where the handler was refactored into many private methods, making it easier for an inheriting type to implement custom behaviors while still leveraging
much of the pre-existing code.

Since we don't actually use any of the Kemal specific features added by Kemal::StaticFileHandler, there really isn't a reason to not just create a new handler based upon the stdlib implementation instead which
will address the problems mentioned above.

This PR implements a new handler which inherits from the stdlib version and overrides the helper methods added in Crystal 1.17.0 to add the caching behavior with minimal code changes. Since this new handler depends on the code in Crystal 1.17.0, it will only be applied on versions greater than or equal to 1.17.0. On older versions we'll fallback to the current monkey patched Kemal::StaticFileHandler

end
end

CACHE_LIMIT = 5_000_000 # 5MB
Copy link
Member Author

@syeopite syeopite Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CACHE_LIMIT is actually way too lenient imo. It actually allows you to fit everything from the assets folder since the entire non-compressed folder with the videojs dependencies is 4.9MB. And if you minified the videojs scripts, its only 2.7MB. Adding gzip compression to that can then get it down to around 840kB.

Adding some simple LRU cache and/or compressing the cached files might be something to consider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would probably only save the gzipped + deflated versions to cache. Even storing the three versions would be less than 15M, which is perfectly reasonnable given the rest of the RAM usage of invidious.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compressed responses are created by the FilteredCompressHandler so I'm not sure how the compressed data can be stored without duplicating a bunch of code and skipping the FilteredCompressHandler entirely.


Design wise though I was thinking that maybe we could just store only the gzipped data and just decompress and recompress as needed to serve the uncompressed, deflate, and range requests. For most requests it'll just be sending the compressed gzip data and for the rest I don't think the extra (re)compression steps would cause too much of an issue.

# being set to `IO` rather than `File`
#
# Can be removed once https://github.com/crystal-lang/crystal/issues/15817 is fixed.
private def serve_file_range(context : HTTP::Server::Context, file : IO, range_header : String, file_info)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the size of files we are serving, we probably could let the stdlib code load the file from disk if we ever receive a ranged requests.

end
end

CACHE_LIMIT = 5_000_000 # 5MB
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would probably only save the gzipped + deflated versions to cache. Even storing the three versions would be less than 15M, which is perfectly reasonnable given the rest of the RAM usage of invidious.

Comment on lines +34 to +37
private def file_info(expanded_path : Path)
file_path = @public_dir.join(expanded_path.to_kind(Path::Kind.native))
{@@cached_files[file_path]? || File.info?(file_path), file_path}
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two questions here:

  1. Do you think it could be interesting to keep track of the time a file was last read from disk, so that we could reload it if its modification time changed? I'm thinking about users that complained about not being able to make CSS changes without restarting.
  2. Less important: is there a reason to use the full (system) path instead of the shorter expanded_path as the key of the cached_files Hash?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well keeping track of when the file was last read to update the cache after N time has passed and the last modified time has changed will still mean that things like CSS changes won't immediately show up till after that initial time to live.

Since we do already track the last modified time of the cached files already, we could just compare it with the actual last modified time on disk for each request. But that's going to cause a lot of disk reads on instances with high request loads. I don't know if that's going to be a nonissue on modern systems but Invidious already has a reputation of killing drives and I'm concerned that this could potentially strain disks further.

Alternatively we could also just forgo the file cache system entirely and let instance maintainers add caching during deployment instead.


As for why the full path is used, the later methods down the chain doesn't have access to the shorter expanded_path so they won't be able to use it as a key to write to the cache.

@NovaAndrom3da
Copy link
Contributor

Hi! I'm wondering if there's anything blocking these commits? I'm hoping to be able to use the new crystallang version hopefully soon.

@syeopite syeopite changed the title Replace Kemal::StaticFileHandler with direct subclass of stdlib HTTP::StaticFileHandler on Crystal < 1.17.0 Replace Kemal::StaticFileHandler with direct subclass of stdlib HTTP::StaticFileHandler on Crystal >= 1.17.0 Aug 20, 2025
@syeopite
Copy link
Member Author

Hi! I'm wondering if there's anything blocking these commits? I'm hoping to be able to use the new crystallang version hopefully soon.

The PR is still going through the code review process but after that it can be merged

@Fijxu
Copy link
Member

Fijxu commented Sep 8, 2025

This is now needed to build the latest invidious commits with Crystal 1.17.1 🫤 :

In src/ext/kemal_static_file_handler.cr:167:49

 167 | directory_listing(context.response, request_path, file_path)
                                           ^-----------
Error: expected argument #2 to 'Kemal::StaticFileHandler#directory_listing' to be Path, not String

Overloads are:
 - HTTP::StaticFileHandler#directory_listing(io : IO, request_path : Path, path : Path)

@Fijxu
Copy link
Member

Fijxu commented Sep 9, 2025

I just tested this PR and I can't get Invidious interface to load completely, css files are empty and js files do not load:

image

Kemal's subclass of the stdlib `HTTP::StaticFileHandler` is not as
maintained as its parent, and so misses out on many enhancements and bug
fixes from upstream, which unfortunately also includes the patches for
security vulnerabilities...

Though this isn't necessarily Kemal's fault since the bulk of the stdlib
handler's logic was done in a single big method, making any changes hard
to maintain. This was fixed in Crystal 1.17.0 where the handler
was refactored into many private methods, making it easier for an
inheriting type to implement custom behaviors while still leveraging
much of the pre-existing code.

Since we don't actually use any of the Kemal specific features added by
`Kemal::StaticFileHandler`, there really isn't a reason to not just
create a new handler based upon the stdlib implementation instead which
will address the problems mentioned above.

This PR implements a new handler which inherits from the stdlib variant
and overrides the helper methods added in Crystal 1.17.0 to add the
caching behavior with minimal code changes. Since this new handler
depends on the code in Crystal 1.17.0, it will only be applied on
versions greater than or equal to 1.17.0. On older versions we'll
fallback to the current monkey patched `Kemal::StaticFileHandler`
Overriding `#call` or patching out `serve_file_compressed` provides
only minimal benefits over the ease of maintenance granted by only
overriding what we need to for the caching behavior.
Running `crystal spec` without a file argument essentially produces one
big program that combines every single spec file, their imports, and
the files that those imports themselves depend on. Most of the types
within this combined program will get ignored by the compiler due to a
lack of any calls to them from the spec files.

But for some types, partially the HTTP module ones, using them within
the spec files will suddenly make the compiler enable a bunch of
previously ignored code. And those code will suddenly require the
presence of additional types, constants, etc. This not only make it
annoying for getting the specs working but also makes it difficult to
isolate behaviors for testing.

The `static_assets_handler_spec.cr` causes this issue and so will be
marked as an isolated spec for now. In the future all of the tests
should be organized into independent groupings similar to how the
Crystal compiler splits their tests into std, compiler, primitives and
interpreter.
Summing the sizes of each cached file every time is very inefficient.
Instead we can simply store the cache size in an constant and increase
it everytime a file is added into the cache.
@syeopite syeopite force-pushed the fix-static-file-handler-on-1.17.0 branch from a49bfa7 to 92888bd Compare September 11, 2025 03:32
@syeopite
Copy link
Member Author

Oops looks like #5337 is required for compression to work correctly... I just rebased against master which includes that commit and its fixed now.

A test case can't be added due to needing to import handlers.cr which depends on many other Invidious objects and constants... I guess we should separate the handlers there to make designing tests easier

@Fijxu
Copy link
Member

Fijxu commented Sep 11, 2025

I tested it and works fine now ;). Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants