fix: respect inexact flags in row group metadata #16412
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.
Hi @alamb, this PR tries to extract the exactness flags from the row group metadata. Could you please take a look? :)
/// The value `0` appears at indices `[0, 2, 4]`. The corresponding exactness
/// values are `[true, false, false]`. Since at least one is `true`, the
/// function returns `Some(true)`.
fn has_any_exact_match(
Do we have a test for this?
Updated a unit test to cover the 4 possible scenarios. Also used a struct to make clippy happy. PTAL :)
Thank you, this is a good finding and nice fix!
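For readers following the review thread, the behaviour described in the doc comment of `has_any_exact_match` can be sketched roughly as below. The diff excerpt is truncated before the parameter list, so the function name `any_exact_match_sketch`, its parameters, the `i64` element type, and the `None` return for "value not found" are all assumptions made for illustration, not the PR's actual signature.

```rust
/// Hypothetical stand-in for the truncated `has_any_exact_match`.
///
/// `values` and `exactness` are assumed to be parallel slices: `exactness[i]`
/// says whether `values[i]` is an exact statistic.
///
/// Returns `None` if `target` never occurs in `values` (an assumption),
/// `Some(true)` if at least one occurrence is flagged exact, and
/// `Some(false)` if every occurrence is flagged inexact.
fn any_exact_match_sketch(target: i64, values: &[i64], exactness: &[bool]) -> Option<bool> {
    let mut found = false;
    for (value, is_exact) in values.iter().zip(exactness) {
        if *value == target {
            if *is_exact {
                // One exact occurrence is enough.
                return Some(true);
            }
            found = true;
        }
    }
    if found {
        Some(false)
    } else {
        None
    }
}
```

With `values = [0, 1, 0, 3, 0]` and `exactness = [true, true, false, false, false]`, the value `0` sits at indices `[0, 2, 4]` with exactness `[true, false, false]`, so `any_exact_match_sketch(0, &values, &exactness)` yields `Some(true)`, matching the example in the doc comment.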
Thank you @xudong963 and @CookiePieWw
Which issue does this PR close?
Rationale for this change
Currently, DataFusion treats all max and min values in column statistics as exact, even though some of them may be inexact.
What changes are included in this PR?
For each row group, when the max or min value is calculated, retrieve its corresponding exactness flag. The exactness flag associated with the final max or min value is used as the final exactness flag. Wrap the max and min stats with Inexact or Exact based on that final exactness flag.
Are these changes tested?
Are there any user-facing changes?
DataFusion now correctly reports the exactness of column max and min values.
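The approach described under "What changes are included in this PR?" might look roughly like the following for the max case. This is a simplified sketch, not the PR's code: the `Precision` enum here is a local two-variant stand-in for DataFusion's type of the same name, and `RowGroupMax`, `overall_max`, and the `i64` value type are hypothetical names chosen for illustration. The tie rule (a value counts as exact if any row group reporting it is exact) is inferred from the `has_any_exact_match` doc comment in the diff.

```rust
/// Simplified local stand-in for DataFusion's `Precision` type.
/// Only the two variants named in the PR description are modelled.
#[derive(Debug, PartialEq)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
}

/// Hypothetical per-row-group statistic: a max value and its exactness flag.
struct RowGroupMax {
    max: i64,
    is_exact: bool,
}

/// Combine per-row-group maxima into one max, carrying the exactness flag of
/// the value that wins. On ties, the result is exact if any tied row group
/// is exact (assumed from the doc comment in the diff).
fn overall_max(groups: &[RowGroupMax]) -> Option<Precision<i64>> {
    let mut best: Option<(i64, bool)> = None;
    for group in groups {
        best = match best {
            // First row group seen: take its value and flag as-is.
            None => Some((group.max, group.is_exact)),
            // Strictly larger value: it replaces both value and flag.
            Some((max, _)) if group.max > max => Some((group.max, group.is_exact)),
            // Tie: keep the value, exact if either side is exact.
            Some((max, exact)) if group.max == max => Some((max, exact || group.is_exact)),
            // Smaller value: nothing changes.
            other => other,
        };
    }
    best.map(|(max, exact)| {
        if exact {
            Precision::Exact(max)
        } else {
            Precision::Inexact(max)
        }
    })
}
```

The min case would be symmetric, with the comparison reversed. Before a change of this kind, the equivalent of this function would wrap every result in `Exact` regardless of the per-row-group flags.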