Skip to content

fix: compress RNTuple column data#1413

Merged
ariostas merged 4 commits intomainfrom
ariostas/rntuple_writing_compression
Apr 9, 2025
Merged

fix: compress RNTuple column data#1413
ariostas merged 4 commits intomainfrom
ariostas/rntuple_writing_compression

Conversation

@ariostas
Copy link
Copy Markdown
Member

@ariostas ariostas commented Apr 7, 2025

RNTuple data was being stored fully uncompressed, which was pretty wasteful. This small PR makes it so that it is stored compressed with the compression setting from the file. It would be good to have the option of selecting compression settings in a more granular way (i.e. per field or column), but I'll have to think a bit more how to implement this since it would make more sense to specify it per column, but it would be a lot easier to specify it per field.

@ariostas ariostas marked this pull request as ready for review April 7, 2025 18:31
@ariostas ariostas requested a review from ianna April 7, 2025 18:33
Copy link
Copy Markdown
Member

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ariostas - looks good to me. I guess, I don't understand the RNTuple format good enough to tell if specific encoding is what one would expect. Please, merge it when you are finished. Thanks!

),
_rntuple_column_compression_settings_format.pack(
self.compression_settings
col[0][3]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it so specific?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is pretty ugly. It's one of the things I'll have to refactor since right now it's hard to know what it's doing. But in the interest of getting this PR out since it's kind of important, I'll defer this to the big writing refactoring that I'll start doing soon. Thanks, Ianna!

@ariostas ariostas merged commit f10f9ba into main Apr 9, 2025
26 checks passed
@ariostas ariostas deleted the ariostas/rntuple_writing_compression branch April 9, 2025 13:10
@ariostas ariostas mentioned this pull request Apr 21, 2025
26 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants