fix: Add memory management for approx_most_frequent addSingleGroupRawInput #15852

duxiao1212 · 2025-12-23T21:51:55Z

Summary:
Add memory management to approx_most_frequent aggregate function to prevent OOM during global aggregation, where spilling is not effective.

Changes:

Modified ApproxMostFrequentStreamSummary::insert() to return eviction info (evicted value and count)
Added memory tracking (liveBytes_, deadBytes_) for StringView accumulator
Added rebuild() to compact string storage when dead bytes exceed threshold
Added per-key size limit to prevent individual oversized keys

Motivation:
For queries like:
SELECT APPROX_MOST_FREQUENT(20, CAST(feature AS JSON), 1000), ...
FROM table LIMIT 10000000
With 10M input rows and ~82KB per row, without memory management:
20251215_232949_18211_b5kzi

The HashStringAllocator accumulates string storage for all inserted keys
When keys are evicted from the Space-Saving summary, their string storage becomes "dead" but is never freed
Memory grows unbounded until OOM
This fix addresses:

Single large keys - Per-key size limit (kMaxTotalStringBytes / capacity) prevents any individual key from consuming excessive memory
Dead memory accumulation - rebuild() triggers when deadBytes > kMaxTotalStringBytes * 0.25, compacting storage by copying only live keys to new allocation and freeing the old storage

Differential Revision: D89728171

netlify · 2025-12-23T21:52:00Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`3a0bb48`
🔍 Latest deploy log	https://app.netlify.com/projects/meta-velox/deploys/694b11861311c0000821b498

meta-codesync · 2025-12-23T21:52:02Z

@duxiao1212 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D89728171.

…Input (facebookincubator#15852) Summary: Add memory management to approx_most_frequent aggregate function to prevent OOM during global aggregation, where spilling is not effective. Changes: 1. Modified ApproxMostFrequentStreamSummary::insert() to return eviction info (evicted value and count) 2. Added memory tracking (liveBytes_, deadBytes_) for StringView accumulator 3. Added rebuild() to compact string storage when dead bytes exceed threshold 4. Added per-key size limit to prevent individual oversized keys Motivation: For queries like: SELECT APPROX_MOST_FREQUENT(20, CAST(feature AS JSON), 1000), ... FROM table LIMIT 10000000 With 10M input rows and ~82KB per row, without memory management: **20251215_232949_18211_b5kzi** The HashStringAllocator accumulates string storage for all inserted keys When keys are evicted from the Space-Saving summary, their string storage becomes "dead" but is never freed Memory grows unbounded until OOM This fix addresses: Single large keys - Per-key size limit (kMaxTotalStringBytes / capacity) prevents any individual key from consuming excessive memory Dead memory accumulation - rebuild() triggers when deadBytes > kMaxTotalStringBytes * 0.25, compacting storage by copying only live keys to new allocation and freeing the old storage Differential Revision: D89728171

meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 23, 2025

meta-codesync bot added fb-exported meta-exported labels Dec 23, 2025

duxiao1212 requested review from Yuhta and xiaoxmeng December 23, 2025 21:52

duxiao1212 marked this pull request as draft December 23, 2025 21:53

duxiao1212 marked this pull request as ready for review December 23, 2025 22:00

duxiao1212 force-pushed the export-D89728171 branch from 807ffad to 3a0bb48 Compare December 23, 2025 22:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: Add memory management for approx_most_frequent addSingleGroupRawInput #15852

fix: Add memory management for approx_most_frequent addSingleGroupRawInput #15852

duxiao1212 commented Dec 23, 2025

Uh oh!

netlify bot commented Dec 23, 2025 •

edited

Loading

Uh oh!

meta-codesync bot commented Dec 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fix: Add memory management for approx_most_frequent addSingleGroupRawInput #15852

Are you sure you want to change the base?

fix: Add memory management for approx_most_frequent addSingleGroupRawInput #15852

Conversation

duxiao1212 commented Dec 23, 2025

Uh oh!

netlify bot commented Dec 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for meta-velox canceled.

Uh oh!

meta-codesync bot commented Dec 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

netlify bot commented Dec 23, 2025 •

edited

Loading