Optimize APPROX_DISTINCT operations on constant conditional values #25428
Merged
For clarity, the reason I had to duplicate PR #25262 here is explained in this comment: #25262 (comment)
@kaikalur requested some changes to skip the aggregation entirely, but that's not possible for aggregations over conditionals like this. I will handle that in a follow-up change, as discussed here: #25262 (comment) cc @feilong-liu
feilong-liu
approved these changes
Jun 25, 2025
Unblocking since the changes requested by @kaikalur have been addressed
…restodb#25262)

Summary:
Pull Request resolved: prestodb#25262

`APPROX_DISTINCT` operations on a conditional constant value (e.g. `APPROX_DISTINCT(IF(expr, 'abcd'))`) are more expensive than and functionally equivalent to `ARBITRARY(IF(expr, 1, 0))`

Adding an optimizer rule to replace any `APPROX_DISTINCT` operations on constant conditional values with equivalent calls to `ARBITRARY`

This comes up in some automated queries

Differential Revision: D76161617
25cbd02 to 8e75193
Description
`APPROX_DISTINCT` operations on a constant value (e.g. `APPROX_DISTINCT(IF(..., const))`) are more expensive than, and functionally equivalent to, `ARBITRARY(IF(..., 1, 0))`.
Adding an optimizer rule to replace any `APPROX_DISTINCT` operations on constant conditionals with equivalent calls to `ARBITRARY`.
This is a copy of the PR here: #25262, which had to be recreated due to GitHub weirdness.
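To illustrate why a distinct count over a conditional constant carries so little information, here is a minimal Python sketch. It is not Presto code and does not reproduce the optimizer rule itself. It assumes an exact distinct count as a stand-in for `APPROX_DISTINCT`, models SQL `NULL` as `None`, and reads the intended result of the rewrite as "1 if any row satisfies the condition, else 0". The function names (`distinct_non_null`, `original_form`, `any_row_matches`) are hypothetical and exist only for this example.

```python
from typing import Callable, Iterable, Optional


def distinct_non_null(values: Iterable[Optional[str]]) -> int:
    # Exact stand-in (assumption) for APPROX_DISTINCT: None/NULL inputs are ignored.
    return len({v for v in values if v is not None})


def original_form(rows: Iterable[int], cond: Callable[[int], bool], const: str = "abcd") -> int:
    # Models APPROX_DISTINCT(IF(cond, const)): rows that fail cond become NULL.
    return distinct_non_null(const if cond(r) else None for r in rows)


def any_row_matches(rows: Iterable[int], cond: Callable[[int], bool]) -> int:
    # Hypothetical cheap form: 1 once any row satisfies cond, otherwise 0.
    return 1 if any(cond(r) for r in rows) else 0


rows = [3, 8, 15, 42]
some_match = lambda x: x > 10    # satisfied by 15 and 42
none_match = lambda x: x > 100   # satisfied by no row

# Each pair is (distinct-count form, cheap flag form) for the same condition.
results = [
    (original_form(rows, some_match), any_row_matches(rows, some_match)),
    (original_form(rows, none_match), any_row_matches(rows, none_match)),
]
print(results)
```

Because only one non-NULL value can ever appear, the distinct count is at most 1, so a single flag carries the same information as a full distinct-count state. That is presumably where the memory saving described in this PR comes from.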
Motivation and Context
Some autogenerated queries use this pattern, which is inefficient and causes OOM errors for complex queries
Impact
Queries which use this APPROX_DISTINCT pattern will consume less memory
Test Plan
Adding test coverage, E2E and unit tests
Also did some E2E testing manually to make sure the substitution was occurring.
All unit tests were run on the latest revision. A verifier run was also performed; the failed queries were due to load.
Contributor checklist
Release Notes