Optimize multiply operation by avoiding unnecessary validity vector allocation #3974
Conversation
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Greptile Summary
Confidence Score: 5/5
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant multiply
    participant dispatch_multiply
    participant multiply_impl
    participant GPU
    Caller->>multiply: "multiply(left, right, ansi_mode, try_mode)"
    multiply->>dispatch_multiply: "dispatch with type and check_overflow flag"
    dispatch_multiply->>dispatch_multiply: "Check if both inputs have no nulls"
    alt Both inputs valid AND no overflow check
        dispatch_multiply->>multiply_impl: "multiply_impl(both_inputs_valid=true)"
        multiply_impl->>multiply_impl: "Skip validity vector allocation"
        multiply_impl->>GPU: "Launch multiply_no_validity_fn kernel"
        GPU-->>multiply_impl: "Computed results"
        multiply_impl-->>dispatch_multiply: "Return column with no null mask"
    else Need validity tracking
        dispatch_multiply->>multiply_impl: "multiply_impl(both_inputs_valid=false)"
        multiply_impl->>multiply_impl: "Allocate validity vector"
        multiply_impl->>GPU: "Launch multiply_fn kernel with validity tracking"
        GPU-->>multiply_impl: "Computed results and validity"
        multiply_impl->>multiply_impl: "Convert validity to null mask"
        multiply_impl-->>dispatch_multiply: "Return column with null mask"
    end
    dispatch_multiply-->>multiply: "Return result column"
    multiply-->>Caller: "Return result"
```
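To make the two branches concrete, here is a minimal, self-contained Thrust sketch of the pattern the diagram describes. It is not the PR's code: the function names mirror the diagram, but the element type, signatures, and overflow check are illustrative assumptions.

```cpp
// Illustrative sketch only -- assumed signatures, not this repo's API.
// Build with: nvcc --extended-lambda multiply_sketch.cu
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
#include <thrust/functional.h>
#include <cstdint>

// Fast path: both inputs have null_count == 0 and no overflow check is
// requested, so no validity vector is ever allocated or written.
void multiply_no_validity(thrust::device_vector<int64_t> const& lhs,
                          thrust::device_vector<int64_t> const& rhs,
                          thrust::device_vector<int64_t>& out) {
  thrust::transform(lhs.begin(), lhs.end(), rhs.begin(), out.begin(),
                    thrust::multiplies<int64_t>());
}

// Slow path: a validity vector is allocated by the caller and filled
// alongside the values; a row is invalid when the product overflows.
void multiply_with_validity(thrust::device_vector<int64_t> const& lhs,
                            thrust::device_vector<int64_t> const& rhs,
                            thrust::device_vector<int64_t>& out,
                            thrust::device_vector<bool>& valid) {
  auto out_it = thrust::make_zip_iterator(
      thrust::make_tuple(out.begin(), valid.begin()));
  thrust::transform(lhs.begin(), lhs.end(), rhs.begin(), out_it,
                    [] __device__(int64_t a, int64_t b) {
                      // Low half of the product, computed in unsigned space
                      // so wrap-around is well defined.
                      int64_t lo = static_cast<int64_t>(
                          static_cast<uint64_t>(a) * static_cast<uint64_t>(b));
                      // No overflow iff the high half is just sign extension.
                      bool ok = __mul64hi(a, b) == (lo >> 63);
                      return thrust::make_tuple(ok ? lo : int64_t{0}, ok);
                    });
}
```

The saving the PR targets is exactly the delta between these two paths: the fast path skips the validity allocation, the extra global-memory writes, and the later validity-to-null-mask conversion.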
1 file reviewed, no comments
revans2 left a comment
This looks fine to me. My main concern is that this optimization is going to disappear for Spark 4.0+, when ANSI is enabled by default and we always have to check for overflow (except for floating-point multiply).
Could you file a follow-on issue for us to explore what to do in a case like that? Is there a fast kernel that we can run first to see if any overflow would happen, and then decide on allocating the validity buffer or not?
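One hedged sketch of that idea (purely illustrative Thrust code; `any_overflow` is a hypothetical helper, not an existing kernel): run a cheap reduction that only answers "would any row overflow?", and allocate the validity buffer only when it returns true.

```cpp
// Hypothetical pre-pass -- not code from this repo. A reduction that detects
// whether any row would overflow, without writing an output column at all.
#include <thrust/device_vector.h>
#include <thrust/inner_product.h>
#include <thrust/functional.h>
#include <cstdint>

bool any_overflow(thrust::device_vector<int64_t> const& lhs,
                  thrust::device_vector<int64_t> const& rhs) {
  return thrust::inner_product(
      lhs.begin(), lhs.end(), rhs.begin(), false,
      thrust::logical_or<bool>(),  // reduction: did any element overflow?
      [] __device__(int64_t a, int64_t b) {
        // Signed 64x64 overflow check via the high half of the product.
        int64_t lo = static_cast<int64_t>(
            static_cast<uint64_t>(a) * static_cast<uint64_t>(b));
        return __mul64hi(a, b) != (lo >> 63);
      });
}

// Usage idea: take the no-validity fast path whenever the pre-pass is clean.
//   if (!any_overflow(lhs, rhs)) { /* multiply, no validity vector */ }
//   else                         { /* fall back to validity tracking */ }
```

Whether this wins depends on the pre-pass being meaningfully cheaper than the validity bookkeeping it avoids; that trade-off is what the follow-on issue would need to measure.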
abellina left a comment
nice
Co-authored-by: Nghia Truong <[email protected]>
1 file reviewed, no comments
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
1 file reviewed, no comments
Follow-up issue filed at #3982.

build
Let's take a step back and look beyond the scope of merely the "multiply operator". I remember that we previously observed in ClickHouse that having every column in the input schema marked as nullable carried a noticeable cost. However, it seems like our code doesn't handle the schema-level non-nullable information yet.
GaryShen2008 left a comment
approve again after a small format change.
There are two levels of NOT NULL here that we need to think about/deal with. There is the Spark level, where the schema can declare a column non-nullable. On the GPU we tend to react differently: each operation/algorithm is responsible for determining whether it should allocate a validity buffer or not. Most of the time they do the right thing and allocate it properly based on an actual null_count. At times we can know up front whether we even need to calculate this, as with multiply here. But it is not perfect in all cases. It is probably worth doing an audit, possibly with AI, to validate that it is all optimized.
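As a small illustration of that distinction (the names here are hypothetical, not this repo's API): the Spark-level flag is a static promise from the schema, while the GPU-level null count is the runtime truth, and the allocation decision can key off the latter.

```cpp
#include <cstdint>

// Hypothetical column descriptor, for illustration only.
struct InputColumn {
  bool schema_nullable;  // Spark level: the schema *allows* nulls
  int64_t null_count;    // GPU level: how many nulls actually exist
};

// Allocate a validity buffer only when the output can actually contain
// nulls: some input row is null, or overflow checking may null rows out.
// Note the schema flag is irrelevant once the runtime null_count is known.
bool needs_validity(InputColumn const& lhs, InputColumn const& rhs,
                    bool check_overflow) {
  return check_overflow || lhs.null_count > 0 || rhs.null_count > 0;
}
```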
This PR fixes #3973, and the resulting nsys trace looks like:
You can see that the yellow bar portion shrinks significantly.
This improvement reduces our workload's e2e time by 10%.