-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Enable skiplist based collection using collectRange #20268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Ankit Jain <[email protected]>
|
❌ Gradle check result for 3edda3b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
WalkthroughIntroduces a bucket-aware range-collection API: Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Areas requiring extra attention:
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregator.java (1)
157-179: Fix critical logic errors in bucket-aware range collection.Two critical issues in the
collectRangeimplementation:
- Line 162: Loop condition uses
maximum(the stats variable tracking max value) instead ofmax(the doc ID range parameter). This causes incorrect iteration bounds.- Line 165: Count increment hardcoded to bucket
0instead of using thebucketparameter, breaking per-bucket accounting.Apply this diff to fix both issues:
public void collectRange(int min, int max, long bucket) throws IOException { growStats(bucket); double minimum = mins.get(bucket); double maximum = maxes.get(bucket); - for (int doc = min; doc < maximum; doc++) { + for (int doc = min; doc < max; doc++) { if (values.advanceExact(doc)) { final int valuesCount = values.docValueCount(); - counts.increment(0, valuesCount); + counts.increment(bucket, valuesCount); for (int i = 0; i < valuesCount; i++) { double value = values.nextValue(); kahanSummation.add(value); minimum = Math.min(minimum, value); maximum = Math.max(maximum, value); } } } sums.set(bucket, kahanSummation.value()); compensations.set(bucket, kahanSummation.delta()); mins.set(bucket, minimum); maxes.set(bucket, maximum); }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
modules/aggs-matrix-stats/src/main/java/org/opensearch/search/aggregations/matrix/stats/MatrixStatsAggregator.java(1 hunks)server/src/main/java/org/opensearch/search/aggregations/LeafBucketCollector.java(1 hunks)server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java(1 hunks)server/src/main/java/org/opensearch/search/aggregations/bucket/composite/CompositeAggregator.java(3 hunks)server/src/main/java/org/opensearch/search/aggregations/bucket/filter/FilterAggregator.java(1 hunks)server/src/main/java/org/opensearch/search/aggregations/bucket/filter/FiltersAggregator.java(1 hunks)server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregator.java(2 hunks)server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java(2 hunks)server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregator.java(1 hunks)server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.java(1 hunks)server/src/main/java/org/opensearch/search/aggregations/bucket/terms/GlobalOrdinalsStringTermsAggregator.java(7 hunks)server/src/main/java/org/opensearch/search/aggregations/bucket/terms/LongRareTermsAggregator.java(1 hunks)server/src/main/java/org/opensearch/search/aggregations/bucket/terms/NumericTermsAggregator.java(2 hunks)server/src/main/java/org/opensearch/search/aggregations/metrics/AvgAggregator.java(2 hunks)server/src/main/java/org/opensearch/search/aggregations/metrics/ExtendedStatsAggregator.java(1 hunks)server/src/main/java/org/opensearch/search/aggregations/metrics/MaxAggregator.java(1 hunks)server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java(1 hunks)server/src/main/java/org/opensearch/search/aggregations/metrics/StatsAggregator.java(2 hunks)server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregator.java(1 hunks)
🔇 Additional comments (24)
server/src/main/java/org/opensearch/search/aggregations/metrics/MaxAggregator.java (1)
184-193: LGTM - Per-bucket range collection correctly implemented.The
collectRangemethod now properly accepts abucketparameter and uses it consistently forgrowMaxes(), reading the initial max, and storing the final result. This aligns with the bucket-aware collection pattern introduced across the codebase.server/src/main/java/org/opensearch/search/aggregations/metrics/MinAggregator.java (1)
184-193: LGTM - Per-bucket range collection correctly implemented.The implementation mirrors the MaxAggregator pattern, correctly using the
bucketparameter forgrowMins(), reading the initial min, and storing the final result.server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregator.java (2)
457-459: LGTM - Correct delegation pattern for bucket-aware range collection.The
FromSingleinner class properly overridescollectRangewith the bucket parameter and delegates tosuper.collectRange(min, max, bucket), enabling monomorphic call optimization.
751-753: LGTM - Consistent with FromSingle implementation.The
FromManyinner class follows the same delegation pattern, maintaining consistency across both implementations.server/src/main/java/org/opensearch/search/aggregations/LeafBucketCollector.java (2)
134-137: LGTM - Good backward-compatible convenience overload.The zero-argument bucket overload delegates to
collectRange(min, max, 0), maintaining backward compatibility for callers that don't need bucket-specific collection.
151-158: LGTM - Well-designed experimental API.The new
collectRange(int min, int max, long bucket)method is appropriately marked with@ExperimentalApi. The default implementation correctly passes thebucketparameter tocollect(docId, bucket), and the Javadoc clearly documents the expected behavior. The comment about overriding for monomorphic calls is helpful for implementers.server/src/main/java/org/opensearch/search/aggregations/bucket/composite/CompositeAggregator.java (3)
647-649: LGTM - Proper bucket propagation in getLeafCollector.The anonymous
LeafBucketCollectorcorrectly delegates tosuper.collectRange(min, max, bucket).
686-688: LGTM - Proper bucket propagation in getFirstPassCollector.Consistent with the other collectors in this file.
755-757: LGTM - Proper bucket propagation in getSecondPassCollector.All three anonymous
LeafBucketCollectorimplementations in this file now consistently overridecollectRangewith the bucket parameter and delegate appropriately.server/src/main/java/org/opensearch/search/aggregations/bucket/terms/LongRareTermsAggregator.java (1)
126-129: LGTM!The
collectRangemethod signature is correctly updated to include thelong bucketparameter and properly delegates tosuper.collectRange(min, max, bucket), aligning with the bucket-aware range collection API.server/src/main/java/org/opensearch/search/aggregations/bucket/terms/NumericTermsAggregator.java (2)
167-170: LGTM!The
collectRangemethod is correctly updated to accept and forward thelong bucketparameter to the superclass implementation.
735-738: LGTM!The
collectRangein the wrapped collector correctly propagates the bucket parameter, maintaining consistency with the bucket-aware range collection pattern.server/src/main/java/org/opensearch/search/aggregations/metrics/ExtendedStatsAggregator.java (1)
172-175: LGTM!The
collectRangecorrectly accepts and forwards thelong bucketparameter. Since the superclass implementation will iterate over the range and callcollect(doc, bucket), the per-bucket statistics (counts, sums, mins, maxes, etc.) will be correctly updated for the specified bucket.server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/NumericHistogramAggregator.java (1)
147-150: LGTM!The
collectRangemethod correctly accepts thelong bucketparameter and delegates to the superclass implementation, consistent with the bucket-aware range collection pattern across histogram aggregators.server/src/main/java/org/opensearch/search/aggregations/bucket/HistogramSkiplistLeafCollector.java (1)
216-246: Well-implemented bucket-aware range collection with skiplist optimization.The implementation correctly:
- Advances the skipper if needed before processing the range
- Caps
upToExclusiveatmaxto avoid over-collecting- Uses the optimized path when all docs in the range map to the same bucket (
upToSameBucket), incrementing doc count in bulk and propagating range collection to sub-collectors- Falls back to per-doc collection otherwise
- Correctly uses
upToBucketIndexfor sub-collector calls in the same-bucket path (the actual histogram bucket ordinal) while using the originalbucketparameter for the fallback pathserver/src/main/java/org/opensearch/search/aggregations/bucket/filter/FilterAggregator.java (1)
92-95: LGTM!The
collectRangeoverride correctly propagates thebucketparameter tosuper.collectRange(min, max, bucket), aligning with the bucket-aware range collection pattern introduced across the codebase.server/src/main/java/org/opensearch/search/aggregations/bucket/filter/FiltersAggregator.java (1)
201-204: LGTM!The
collectRangeoverride correctly propagates thebucketparameter tosuper.collectRange(min, max, bucket), consistent with the bucket-aware range collection pattern.server/src/main/java/org/opensearch/search/aggregations/bucket/range/RangeAggregator.java (1)
367-370: LGTM!The
collectRangeoverride correctly propagates thebucketparameter to the superclass, enabling bucket-aware range collection for the range aggregator.server/src/main/java/org/opensearch/search/aggregations/bucket/histogram/DateHistogramAggregator.java (2)
257-260: LGTM!The
collectRangeoverride in the single-valued collector correctly propagates thebucketparameter.
290-293: LGTM!The
collectRangeoverride in the multi-valued collector correctly propagates thebucketparameter, consistent with the single-valued path.modules/aggs-matrix-stats/src/main/java/org/opensearch/search/aggregations/matrix/stats/MatrixStatsAggregator.java (1)
123-126: LGTM!The
collectRangeoverride correctly propagates thebucketparameter, enabling bucket-aware range collection for the matrix stats aggregator.server/src/main/java/org/opensearch/search/aggregations/metrics/AvgAggregator.java (1)
168-184: LGTM!The
collectRangeimplementation correctly uses thebucketparameter for all per-bucket state operations:
setKahanSummation(bucket)initializes arrays and Kahan summation state for the specific bucketcounts.increment(bucket, count)updates the correct bucket's countsums.set(bucket, ...)andcompensations.set(bucket, ...)store results in the correct bucketThis is consistent with the
collect(int doc, long bucket)andcollect(DocIdStream, long bucket)methods in the same class.server/src/main/java/org/opensearch/search/aggregations/metrics/SumAggregator.java (1)
152-163: LGTM! Bucket-aware range collection implemented correctly.The
collectRangemethod now properly accepts and uses thebucketparameter for per-bucket state management. Kahan summation is correctly initialized from the bucket's existing state, and results are stored back to the correct bucket index.server/src/main/java/org/opensearch/search/aggregations/bucket/terms/GlobalOrdinalsStringTermsAggregator.java (1)
284-286: LGTM! Bucket parameter properly propagated through collector hierarchy.All
collectRangeoverrides correctly forward thebucketparameter to their respective superclass implementations. This systematic propagation ensures the bucket context is maintained throughout the collection chain for both single-valued and multi-valued ordinal paths, as well as the LowCardinality and SignificantTermsResults variants.Also applies to: 308-310, 338-340, 365-367, 603-605, 630-632, 1246-1248
Signed-off-by: Ankit Jain <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
CHANGELOG.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md
[grammar] ~15-~15: Use a hyphen to join words.
Context: ...penSearch/pull/20241)) - Enable skiplist based collection using collectRange ([#2...
(QB_NEW_EN_HYPHEN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: Analyze (java)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: detect-breaking-change
| - Handle custom metadata files in subdirectory-store ([#20157](https://github.com/opensearch-project/OpenSearch/pull/20157)) | ||
| - Add support for missing proto fields in GRPC FunctionScore and Highlight ([#20169](https://github.com/opensearch-project/OpenSearch/pull/20169)) | ||
| - Ensure all modules are included in INTEG_TEST testcluster distribution ([#20241](https://github.com/opensearch-project/OpenSearch/pull/20241)) | ||
| - Enable skiplist based collection using collectRange ([#20268](https://github.com/opensearch-project/OpenSearch/pull/20268)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add hyphen to compound adjective.
Per the static analysis hint, compound adjectives should be hyphenated when they precede a noun: "skiplist-based collection" instead of "skiplist based collection".
Apply this diff:
-- Enable skiplist based collection using collectRange ([#20268](https://github.com/opensearch-project/OpenSearch/pull/20268))
+- Enable skiplist-based collection using collectRange ([#20268](https://github.com/opensearch-project/OpenSearch/pull/20268))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - Enable skiplist based collection using collectRange ([#20268](https://github.com/opensearch-project/OpenSearch/pull/20268)) | |
| - Enable skiplist-based collection using collectRange ([#20268](https://github.com/opensearch-project/OpenSearch/pull/20268)) |
🧰 Tools
🪛 LanguageTool
[grammar] ~15-~15: Use a hyphen to join words.
Context: ...penSearch/pull/20241)) - Enable skiplist based collection using collectRange ([#2...
(QB_NEW_EN_HYPHEN)
🤖 Prompt for AI Agents
In CHANGELOG.md around line 15, the compound adjective "skiplist based
collection" needs a hyphen; change it to "skiplist-based collection" while
preserving the rest of the line (including the PR link and parentheses) exactly
as-is.
|
{"run-benchmark-test": "id_3"} |
|
❌ Gradle check result for e9d3c28: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Flaky test failure: Retrying gradle check |
|
❕ Gradle check result for e9d3c28: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #20268 +/- ##
============================================
+ Coverage 73.20% 73.25% +0.05%
- Complexity 71745 71786 +41
============================================
Files 5795 5795
Lines 328304 328314 +10
Branches 47283 47284 +1
============================================
+ Hits 240334 240522 +188
+ Misses 68663 68505 -158
+ Partials 19307 19287 -20 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Description
This PR allows skiplist based collection using
collectRangeand allows even sub aggregation to take advantage ofcollectRangeRelated Issues
Related #20031
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
New Features
Bug Fixes
Performance
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.