Conversation

@Yuhta (Contributor) commented Sep 22, 2023

Differential Revision: D49501856

Currently, when we skip bytes in `PagedInputStream`, we decompress the data unconditionally, which is expensive. This PR adds the following optimizations (a rough sketch follows the list):

1. Skip decompression of a whole block (a frame, in the case of ZSTD) if
   1. we can determine the precise decompressed size, and
   2. the decompressed size is no larger than the number of bytes to skip.
2. Accumulate contiguous skip calls into a larger skip region (delayed skipping).
3. Fix `ByteRleDecoder::skipBytes` so that it does not read data and break up contiguous skips.
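
The sketch below illustrates the intent of items 1 and 2, assuming each compressed page is a single ZSTD frame that records its content size in the frame header. The class and member names (`SkippingPagedStream`, `pendingSkip_`, and so on) are illustrative only and do not mirror the actual Velox `PagedInputStream`.

```cpp
#include <zstd.h>

#include <cstdint>
#include <vector>

// Illustrative sketch only; not the Velox PagedInputStream API.
class SkippingPagedStream {
 public:
  explicit SkippingPagedStream(std::vector<std::vector<char>> frames)
      : frames_(std::move(frames)) {}

  // Delayed skipping: just record how many bytes the caller wants to skip.
  // Contiguous skip calls accumulate into one larger skip region, and the
  // actual work happens lazily on the next read().
  void skip(uint64_t bytes) {
    pendingSkip_ += bytes;
  }

  size_t read(char* out, size_t size) {
    applyPendingSkip();
    // Decompress the current frame and copy into 'out' (omitted for brevity).
    (void)out;
    return size;
  }

 private:
  void applyPendingSkip() {
    while (pendingSkip_ > 0 && frameIndex_ < frames_.size()) {
      const auto& frame = frames_[frameIndex_];
      // ZSTD frames can carry their decompressed size in the frame header.
      const unsigned long long contentSize =
          ZSTD_getFrameContentSize(frame.data(), frame.size());
      if (contentSize != ZSTD_CONTENTSIZE_UNKNOWN &&
          contentSize != ZSTD_CONTENTSIZE_ERROR &&
          contentSize <= pendingSkip_) {
        // The precise decompressed size is known and the whole frame lies
        // inside the skip region, so drop it without decompressing.
        pendingSkip_ -= contentSize;
        ++frameIndex_;
        continue;
      }
      // Otherwise part of this frame is still needed: decompress it and
      // discard the remaining pendingSkip_ bytes from its front (omitted).
      break;
    }
  }

  std::vector<std::vector<char>> frames_;
  size_t frameIndex_ = 0;
  uint64_t pendingSkip_ = 0;
};
```

Item 3 matters because a `skipBytes` that eagerly reads the underlying stream would both force the decompression that the skip path is trying to avoid and break the accumulation of contiguous skips.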

@netlify netlify bot commented Sep 22, 2023

Deploy Preview for meta-velox canceled.

Latest commit: f0ea3ac
Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/6512021885007900083f5fd7

@facebook-github-bot added the CLA Signed label Sep 22, 2023
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D49501856

Yuhta added a commit to Yuhta/velox that referenced this pull request Sep 25, 2023

@facebook-github-bot (Contributor)

This pull request has been merged in f6e9b76.

@facebook-github-bot (Contributor)

This pull request has been reverted by d08ab02.

ericyuliu pushed a commit to ericyuliu/velox that referenced this pull request Oct 12, 2023
Summary:
Pull Request resolved: facebookincubator#6699

Currently, when we skip bytes in `PagedInputStream`, we decompress the data unconditionally, which is expensive. Some optimizations are added to address this:
1. Skip decompression of a whole block (a frame, in the case of ZSTD) if
   1. we can determine the precise decompressed size, and
   2. the decompressed size is no larger than the number of bytes to skip.
2. Accumulate contiguous skip calls into a larger skip region (delayed skipping).

Reviewed By: oerling

Differential Revision: D49501856

fbshipit-source-id: 07241aaf71e83f0f491050a9be6075dd5500dd52
kletkavrubashku added a commit to kletkavrubashku/velox that referenced this pull request Dec 23, 2025
Summary:
## History
Some time ago, a similar PR was landed (facebookincubator#6699, D49501856) and caused SEV S369242 at Meta. At the time, the `ZSTD_DCtx` context was created and reused at the decompressor level. The optimization was reverted due to OOMs.

## Getting back to it again
The optimization still makes sense. For example:
- In Presto Adhoc we spend 0.07% of CPU cycles in `ZSTD_createDCtx_internal`: https://fburl.com/strobelight/2lt1wn4v
- In Presto batch we spend 0.1% of CPU cycles in `ZSTD_createDCtx_internal`: https://fburl.com/strobelight/js4kn8za

## The fix
Instead of creating a `ZSTD_DCtx` per decompressor, we should create one per thread. That lets us reuse the allocation and avoid consuming so much memory in FlatMaps (see the sketch after this summary).

Differential Revision: D89716393
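
A minimal sketch of the per-thread context reuse described above, written against plain libzstd; the helper names are hypothetical and this is not the actual change in D89716393.

```cpp
#include <zstd.h>

#include <memory>
#include <stdexcept>

namespace {

// One ZSTD_DCtx per thread, created lazily and freed when the thread exits,
// so repeated decompressions on the same thread reuse its internal buffers.
ZSTD_DCtx* threadLocalDCtx() {
  thread_local std::unique_ptr<ZSTD_DCtx, decltype(&ZSTD_freeDCtx)> dctx(
      ZSTD_createDCtx(), &ZSTD_freeDCtx);
  return dctx.get();
}

} // namespace

// Decompresses one ZSTD frame with the shared per-thread context instead of
// allocating a fresh ZSTD_DCtx for every decompressor instance.
size_t zstdDecompressWithThreadLocalCtx(
    void* dst,
    size_t dstCapacity,
    const void* src,
    size_t srcSize) {
  const size_t result =
      ZSTD_decompressDCtx(threadLocalDCtx(), dst, dstCapacity, src, srcSize);
  if (ZSTD_isError(result)) {
    throw std::runtime_error(ZSTD_getErrorName(result));
  }
  return result;
}
```

Compared with a context per decompressor, this bounds the number of live `ZSTD_DCtx` allocations by the thread count rather than by the number of open streams, which is the property the fix relies on to avoid the FlatMap memory blow-up.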