perf(dwio): Reuse context in ZSTD_decompress #15854
Summary:
History
Some time ago a similar PR landed (#6699, D49501856) and caused SEV S369242 at Meta. At that time the ZSTD_DCtx context was created and reused at the decompressor level. The optimization was reverted due to OOMs.

Getting back to it again
The optimization still makes sense. For example:

ZSTD_createDCtx_internal: https://fburl.com/strobelight/2lt1wn4v
ZSTD_createDCtx_internal: https://fburl.com/strobelight/js4kn8za

The fix
Instead of creating a ZSTD_DCtx per decompressor, we create one per thread. That way the allocation is reused and we don't consume as much memory in FlatMaps.

Test plan
I ran multiple experiments. The biggest one ran 100k shadow queries on the same cluster with 2 packages: https://fburl.com/scuba/presto_queries/ddi8pxmd
Prod and test builds are on the same upstream revision, with only this diff on top.
Here is what I got:
https://pxl.cl/8Dsk1
Execution time is 9% better. The data is probably not fully accurate, but it at least shows an improvement. I'd appreciate suggestions here.
Memory usage didn't grow: https://pxl.cl/8Dsjn
Differential Revision: D89716393