@helfman helfman commented Dec 4, 2025

Summary: Needed for latest rename in Velox

Differential Revision: D88317373

Summary:
Introduce stream-level deduplication.

With this change, before writing streams to a stripe, we identify streams that are byte-identical and write only one copy of them to the file.
Every other identical stream stores the same offset and size as the original stream.
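
The layout step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this diff: `StreamLocation`, `layoutStreams`, and the in-memory `out` buffer are hypothetical names, and the sketch keys on the stream bytes directly, whereas a real writer would more likely compare content hashes first.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a stream's location inside a stripe.
struct StreamLocation {
  uint64_t offset;
  uint64_t size;
};

// Appends streams to `out` back to back. When dedupEnabled is true, each
// distinct byte sequence is written once, and later identical streams reuse
// the offset and size of the first occurrence.
std::vector<StreamLocation> layoutStreams(
    const std::vector<std::string>& streams,
    bool dedupEnabled,
    std::string& out) {
  std::vector<StreamLocation> locations;
  locations.reserve(streams.size());
  // Maps stream content to the index of its first occurrence. The views
  // point into `streams`, which outlives this map.
  std::unordered_map<std::string_view, size_t> seen;
  for (const auto& stream : streams) {
    if (dedupEnabled) {
      auto it = seen.find(stream);
      if (it != seen.end()) {
        // Byte-identical stream already written: share its location.
        const StreamLocation first = locations[it->second];
        locations.push_back(first);
        continue;
      }
      seen.emplace(stream, locations.size());
    }
    locations.push_back({out.size(), stream.size()});
    out.append(stream);
  }
  return locations;
}
```

With three streams where the first and third are identical, only two payloads reach `out`, and the first and third locations are equal. Passing `dedupEnabled = false` gives the pre-change layout, which mirrors the opt-in nature of the feature.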

To opt in to this behavior, a new option was introduced: streamDeduplicationEnabled.

The C++ reader was modified to detect duplicated streams and load each of them only once.
The selective reader appears to already handle this read deduplication.
The Java reader was not modified. It loads the same stream multiple times, which is not a regression from current behavior.
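
One way a reader could avoid loading a duplicated stream twice is to cache buffers by their (offset, size) pair. The sketch below assumes that approach. `StreamExtent`, `loadStreams`, and the `physicalReads` counter are hypothetical, and the file is modeled as an in-memory string rather than the actual Nimble reader interfaces.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of where a stream lives in the file.
struct StreamExtent {
  uint64_t offset;
  uint64_t size;
};

// Returns one buffer per requested stream. Streams that share the same
// (offset, size) are read once and share a single buffer. `physicalReads`
// reports how many distinct extents were actually read.
std::vector<std::shared_ptr<const std::string>> loadStreams(
    const std::string& file,
    const std::vector<StreamExtent>& extents,
    size_t& physicalReads) {
  std::map<std::pair<uint64_t, uint64_t>, std::shared_ptr<const std::string>>
      cache;
  std::vector<std::shared_ptr<const std::string>> result;
  result.reserve(extents.size());
  physicalReads = 0;
  for (const auto& extent : extents) {
    const auto key = std::make_pair(extent.offset, extent.size);
    auto it = cache.find(key);
    if (it == cache.end()) {
      // First time this extent is seen: perform the (simulated) read.
      std::shared_ptr<const std::string> buffer =
          std::make_shared<std::string>(file.substr(extent.offset, extent.size));
      ++physicalReads;
      it = cache.emplace(key, std::move(buffer)).first;
    }
    result.push_back(it->second);
  }
  return result;
}
```

A reader without such a cache, like the Java reader described above, would simply perform one read per stream; the results are the same and only the I/O volume differs.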

Differential Revision: D88087515

Summary:
Hook up stream dedup to the dwio file writer.

Also added a kill-switch JustKnob: dwio/nimble:enable_stream_deduplication

Differential Revision: D88087517
@meta-cla meta-cla bot added the CLA Signed label Dec 4, 2025
meta-codesync bot commented Dec 4, 2025

@helfman has exported this pull request. If you are a Meta employee, you can view the originating Diff in D88317373.

Labels: CLA Signed, fb-exported, meta-exported