Skip to content

Conversation

@macvincent
Copy link
Contributor

Summary: When implementing the stream chunker, we anticipated that stream buffers after chunking will end up growing to the size that previously triggered chunking. As a tradeoff between minimizing reallocations (for performance) and actually releasing memory (to relieve memory pressure), we heuristically determine the new buffer capacity for each stream. The issue with this optimization is that it conflicts with the rest of our memory tracking logic since we now have retained memory in the memory pool that is not accounted for. We now through local testing that disabling this optimization leads to better memory pressure relief. In this diff, we introduce a JK, dwio/nimble_chunking:disable_memory_reallocation_optimization, that will be enabled just for DISCO experiments. This will help us understand the full impact of this optimization and whether it should be retained.

Differential Revision: D87494427

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 20, 2025
@meta-codesync
Copy link

meta-codesync bot commented Nov 20, 2025

@macvincent has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87494427.

macvincent added a commit to macvincent/nimble that referenced this pull request Nov 20, 2025
…ubator#325)

Summary:

When implementing the stream chunker, we anticipated that stream buffers after chunking will end up growing to the size that previously triggered chunking. As a tradeoff between minimizing reallocations (for performance) and actually releasing memory (to relieve memory pressure), we heuristically determine the new buffer capacity for each stream to be larger that required. The issue with this optimization is that it conflicts with the rest of our memory tracking logic since we now have retained memory in the memory pool that is not accounted for. We now know through local testing that disabling this optimization leads to better memory pressure relief. In this diff, we introduce a JK, [dwio/nimble_chunking:disable_memory_reallocation_optimization](https://www.internalfb.com/intern/justknobs/?name=dwio%2Fnimble_chunking#disable_memory_reallocation_optimization), that will be enabled just for DISCO experiments. This will help us understand the full impact of this optimization and whether it should be retained.

Differential Revision: D87494427
…ubator#325)

Summary:

When implementing the stream chunker, we anticipated that stream buffers after chunking will end up growing to the size that previously triggered chunking. As a tradeoff between minimizing reallocations (for performance) and actually releasing memory (to relieve memory pressure), we heuristically determine the new buffer capacity for each stream to be larger that required. The issue with this optimization is that it conflicts with the rest of our memory tracking logic since we now have retained memory in the memory pool that is not accounted for. We now know through local testing that disabling this optimization leads to better memory pressure relief. In this diff, we introduce a JK, [dwio/nimble_chunking:disable_memory_reallocation_optimization](https://www.internalfb.com/intern/justknobs/?name=dwio%2Fnimble_chunking#disable_memory_reallocation_optimization), that will be enabled just for DISCO experiments. This will help us understand the full impact of this optimization and whether it should be retained.

Differential Revision: D87494427
macvincent added a commit to macvincent/nimble that referenced this pull request Nov 20, 2025
…ubator#325)

Summary:

When implementing the stream chunker, we anticipated that stream buffers after chunking will end up growing to the size that previously triggered chunking. As a tradeoff between minimizing reallocations (for performance) and actually releasing memory (to relieve memory pressure), we heuristically determine the new buffer capacity for each stream to be larger that required. The issue with this optimization is that it conflicts with the rest of our memory tracking logic since we now have retained memory in the memory pool that is not accounted for. We now know through local testing that disabling this optimization leads to better memory pressure relief. 

We performed local DISCO tests with and without this optimization for two tables and included the comparison in https://docs.google.com/spreadsheets/d/1kZvBwhVHZRyB7tg-qT2Et4V_7mDWyNr-VrtDzjG_FKY/view?gid=1209014630#gid=1209014630. It shows that for these two tables we save an average of **15%** improvement in average write memory. We also see **4%** write CPU improvement on average. Though we saw **-6%** regression in CPU for a single 256MB stripe experiment. These results indicate that more testing is required in order to understand its impact on a larger sample size.


In this diff, we introduce a JK, [dwio/nimble_chunking:disable_memory_reallocation_optimization](https://www.internalfb.com/intern/justknobs/?name=dwio%2Fnimble_chunking#disable_memory_reallocation_optimization), that will be enabled just for DISCO experiments. This will help us understand the full impact of this optimization and whether it should be retained.

Differential Revision: D87494427
macvincent added a commit to macvincent/nimble that referenced this pull request Nov 20, 2025
…ubator#325)

Summary:

When implementing the stream chunker, we anticipated that stream buffers after chunking will end up growing to the size that previously triggered chunking. As a tradeoff between minimizing reallocations (for performance) and actually releasing memory (to relieve memory pressure), we heuristically determine the new buffer capacity for each stream to be larger that required. The issue with this optimization is that it conflicts with the rest of our memory tracking logic since we now have retained memory in the memory pool that is not accounted for. We now know through local testing that disabling this optimization leads to better memory pressure relief. 

We performed local DISCO tests with and without this optimization for two tables and included the comparison in https://docs.google.com/spreadsheets/d/1kZvBwhVHZRyB7tg-qT2Et4V_7mDWyNr-VrtDzjG_FKY/view?gid=1209014630#gid=1209014630. It shows that for these two tables we save an average of **15%** improvement in average write memory. We also see **4%** write CPU improvement on average. Though we saw **-6%** regression in CPU for a single 256MB stripe experiment. These results indicate that more testing is required in order to understand its impact on a larger sample size.


In this diff, we introduce a JK, [dwio/nimble_chunking:disable_memory_reallocation_optimization](https://www.internalfb.com/intern/justknobs/?name=dwio%2Fnimble_chunking#disable_memory_reallocation_optimization), that will be enabled just for DISCO experiments. This will help us understand the full impact of this optimization and whether it should be retained.

Differential Revision: D87494427
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant