-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-35351][SQL] Add code-gen for left anti sort merge join #32547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc @cloud-fan and @maropu to take a look when you have time, thanks. |
@@ -693,15 +701,12 @@ case class SortMergeJoinExec( | |||
| continue; | |||
| } | |||
|} | |||
|if (!$loaded) { | |||
| $loaded = true; | |||
| $streamedAfter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For left anti, streamedAfter
will appear twice in the code. How large is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan - good call for code size. Actually I just figured we don't need streamedAfter
here for left anti, because we can skip streamed row when there's a match here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan - updated. Also avoid unnecessary code for bufferedAfter
and LEFT SEMI join.
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #138547 has finished for PR 32547 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #138554 has finished for PR 32547 at commit
|
retest this please |
Kubernetes integration test unable to build dist. exiting with code: 1 |
Test build #138566 has finished for PR 32547 at commit
|
|while ($streamedInput.hasNext()) { | ||
| $findNextJoinRows; | ||
| $beforeLoop | ||
| boolean $hasOutputRow = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: hasMatchedRow
? Has matched row means no output row.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan - yeah it should be hasMatchedRow
, updated, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, can you fix the code conflicts?
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #138646 has finished for PR 32547 at commit
|
Thank you, @c21 ~ Merged to master. |
Thank you @maropu and @cloud-fan for review! |
// Evaluate the columns those used by condition before loop | ||
val before = | ||
s""" | ||
|boolean $loaded = false; | ||
|$streamedBefore | ||
""".stripMargin | ||
|
||
val loadStreamed = | ||
s""" | ||
|if (!$loaded) { | ||
| $loaded = true; | ||
| $streamedAfter | ||
|} | ||
""".stripMargin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: seems loaded
is not needed for LeftAnti
case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loadStreamed
is not used by LeftAnti
.
I think you are referring to boolean $loaded = false;
in before
should not be needed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, looks like for LeftAnti
, it doesn't rely on loaded
to do streamedAfter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #32681 as followup, thanks.
…NTI SMJ code-gen ### What changes were proposed in this pull request? This is a followup from #32547 (comment), where for LEFT ANTI join, we do not need to depend on `loaded` variable, as in `codegenAnti` we only load `streamedAfter` no more than once (i.e. assign column values from streamed row which are not used in join condition). ### Why are the changes needed? Avoid unnecessary processing in code-gen (though it's just `boolean $loaded = false;`, and `if (!$loaded) { $loaded = true; }`). ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing unite tests in `ExistenceJoinSuite`. Closes #32681 from c21/join-followup. Authored-by: Cheng Su <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
As title. This PR is to add code-gen support for LEFT ANTI sort merge join. The main change is to extract
loadStreamed
inSortMergeJoinExec.doProduce()
. That is to set all columns values for streamed row, when the streamed row has no output row.Example query:
Example generated code:
Why are the changes needed?
Improve the query CPU performance.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added unit test in
WholeStageCodegenSuite.scala
, and existed unit test inExistenceJoinSuite.scala
.