Optimize PagedInputStream::Skip #6699
This pull request was exported from Phabricator. Differential Revision: D49501856
Summary: Currently, when we skip bytes in `PagedInputStream`, we decompress unconditionally, which is expensive. This change adds the following optimizations:

1. Skip decompression of the whole block (frame, in the case of ZSTD) if
   1. we can get the precise decompressed size, and
   2. the decompressed size is no larger than the number of bytes to skip.
2. Accumulate contiguous skip calls to create a larger skip region (delayed skipping).
3. Fix `ByteRleDecoder::skipBytes` to avoid reading data and breaking contiguous skips.

Differential Revision: D49501856
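The first two optimizations can be illustrated with a small, self-contained sketch. This is a toy model under stated assumptions, not the Velox implementation: `Block`, `SketchStream`, and all of their members are hypothetical names, and "decompression" is modeled as a counter so the saved work is visible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for one compressed block (a frame, for ZSTD).
struct Block {
  // Precise decompressed size, if the block header records it.
  std::optional<uint64_t> decompressedSize;
  // Size this toy model produces when the block is actually decompressed.
  uint64_t actualSize;
};

// Toy model of a paged stream; not the real PagedInputStream.
class SketchStream {
 public:
  explicit SketchStream(std::vector<Block> blocks)
      : blocks_(std::move(blocks)) {}

  // Delayed skipping: only record the request, so contiguous calls merge
  // into one larger skip region.
  void skip(uint64_t bytes) {
    pendingSkip_ += bytes;
  }

  // Reading forces the accumulated skip to be applied first.
  // Returns the number of bytes made available (at most 'bytes').
  uint64_t read(uint64_t bytes) {
    flushPendingSkip();
    uint64_t done = 0;
    while (done < bytes && ensureBlock()) {
      const uint64_t n = std::min(bytes - done, remainingInBlock_);
      remainingInBlock_ -= n;
      done += n;
    }
    return done;
  }

  int decompressCalls() const {
    return decompressCalls_;
  }

 private:
  void flushPendingSkip() {
    while (pendingSkip_ > 0) {
      if (remainingInBlock_ > 0) {
        // Consume what is left of the already decompressed block.
        const uint64_t n = std::min(pendingSkip_, remainingInBlock_);
        remainingInBlock_ -= n;
        pendingSkip_ -= n;
        continue;
      }
      if (next_ == blocks_.size()) {
        // Skipping past the end; this toy model simply drops the remainder.
        pendingSkip_ = 0;
        break;
      }
      const Block& block = blocks_[next_];
      if (block.decompressedSize && *block.decompressedSize <= pendingSkip_) {
        // Size is known and fits inside the skip: bypass decompression.
        pendingSkip_ -= *block.decompressedSize;
        ++next_;
        continue;
      }
      // Size unknown, or the skip ends inside this block: decompress it.
      decompressNext();
    }
  }

  bool ensureBlock() {
    while (remainingInBlock_ == 0) {
      if (next_ == blocks_.size()) {
        return false;
      }
      decompressNext();
    }
    return true;
  }

  void decompressNext() {
    assert(next_ < blocks_.size());
    remainingInBlock_ = blocks_[next_].actualSize;
    ++next_;
    ++decompressCalls_;
  }

  std::vector<Block> blocks_;
  std::size_t next_{0};
  uint64_t remainingInBlock_{0};
  uint64_t pendingSkip_{0};
  int decompressCalls_{0};
};
```

In this model, three `skip(60)` calls over 100-byte blocks with known sizes merge into one 180-byte skip, so the first block is bypassed entirely and only the block where the skip ends is decompressed. Applying each skip eagerly would have decompressed the first block as well.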
092254c to d183dbc
d183dbc to dfe16d4
dfe16d4 to 3f67f5b
3f67f5b to 408cab1
Summary: Currently, when we skip bytes in `PagedInputStream`, we decompress unconditionally, which is expensive. This change adds the following optimizations:

1. Skip decompression of the whole block (frame, in the case of ZSTD) if
   1. we can get the precise decompressed size, and
   2. the decompressed size is no larger than the number of bytes to skip.
2. Accumulate contiguous skip calls to create a larger skip region (delayed skipping).

Differential Revision: D49501856
408cab1 to 834c5c7
834c5c7 to f0ea3ac
This pull request has been merged in f6e9b76.
This pull request has been reverted by d08ab02.
Summary: Pull Request resolved: facebookincubator#6699

Currently, when we skip bytes in `PagedInputStream`, we decompress unconditionally, which is expensive. This change adds the following optimizations:

1. Skip decompression of the whole block (frame, in the case of ZSTD) if
   1. we can get the precise decompressed size, and
   2. the decompressed size is no larger than the number of bytes to skip.
2. Accumulate contiguous skip calls to create a larger skip region (delayed skipping).

Reviewed By: oerling

Differential Revision: D49501856

fbshipit-source-id: 07241aaf71e83f0f491050a9be6075dd5500dd52
Summary:

## History

Some time ago a similar PR was landed (facebookincubator#6699, D49501856) and caused SEV S369242 at Meta. At that time, the `ZSTD_DCtx` context was created and reused at the decompressor level. The optimization was reverted due to OOMs.

## Getting back to it again

The optimization still makes sense. For example:

- In Presto Adhoc we spend 0.07% of CPU cycles in `ZSTD_createDCtx_internal`: https://fburl.com/strobelight/2lt1wn4v
- In Presto batch we spend 0.1% of CPU cycles in `ZSTD_createDCtx_internal`: https://fburl.com/strobelight/js4kn8za

## The fix

Instead of creating a `ZSTD_DCtx` per decompressor, we should create one per thread. Then we can reuse the allocation without consuming so much memory in FlatMaps.

Differential Revision: D89716393
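The per-thread idea can be sketched as follows. This is an illustration under stated assumptions rather than the code from the follow-up change: `SketchContext` is a hypothetical stand-in for an expensive context such as `ZSTD_DCtx`, `SketchDecompressor` and `threadLocalContext` are made-up names, and "decompression" is an identity copy through the shared workspace.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an expensive decompression context.
struct SketchContext {
  // Hypothetical initial workspace size, chosen only for illustration.
  static constexpr std::size_t kWorkspaceBytes = 128 * 1024;

  SketchContext() : workspace(kWorkspaceBytes) {
    ++created();
  }

  // Counts how many contexts have been constructed across all threads.
  static std::atomic<int>& created() {
    static std::atomic<int> count{0};
    return count;
  }

  std::vector<char> workspace;
};

// One context per thread: created lazily on first use and destroyed when
// the thread exits.
inline SketchContext& threadLocalContext() {
  thread_local SketchContext context;
  return context;
}

// Hypothetical decompressor that borrows the calling thread's context
// instead of owning one, so many decompressors share one allocation.
class SketchDecompressor {
 public:
  std::string decompress(const std::string& input) {
    SketchContext& context = threadLocalContext();
    if (context.workspace.size() < input.size()) {
      context.workspace.resize(input.size());
    }
    assert(context.workspace.size() >= input.size());
    // Identity copy through the shared workspace stands in for real work.
    std::copy(input.begin(), input.end(), context.workspace.begin());
    return std::string(context.workspace.data(), input.size());
  }
};
```

With this shape, the number of live contexts is bounded by the number of threads rather than by the number of decompressors, which is the property the fix relies on when many streams exist at once.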