[SPARK-32129][SQL] Support AQE skew join with Union #28947
retest this please
Test build #124652 has finished for PR 28947 at commit
retest this please
Test build #124705 has finished for PR 28947 at commit
retest this please
ping @cloud-fan @gatorsmile
Test build #124736 has finished for PR 28947 at commit
retest this please
Test build #124740 has finished for PR 28947 at commit
    return plan
  }

  // Try to handle skew join with union case, like
Can we make it more general? It seems like we can optimize any SMJ if its 2 children are both shuffle stages. cc @JkSelf @maryannxue
For 3-table join, if they are in the same query stage, it means the shuffles are all leaf, and we will only optimize the first SMJ, as the second SMJ has only one side as shuffle stage.
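The rule discussed above (optimize any SMJ whose two children are both shuffle stages) can be illustrated with a toy plan model. This is only a sketch: `Plan`, `ShuffleStage`, `SortMergeJoin`, and `optimizable` are hypothetical stand-ins rather than Spark's physical operators, and the sort nodes that sit between a join and its shuffles in a real plan are omitted.

```scala
object GeneralSkewRuleSketch {
  // Hypothetical, simplified plan nodes (not Spark classes).
  sealed trait Plan
  case class ShuffleStage(name: String) extends Plan
  case class SortMergeJoin(left: Plan, right: Plan) extends Plan

  // Collect every SMJ whose two children are both shuffle stages.
  def optimizable(plan: Plan): List[SortMergeJoin] = plan match {
    case j @ SortMergeJoin(_: ShuffleStage, _: ShuffleStage) => List(j)
    case SortMergeJoin(l, r) => optimizable(l) ++ optimizable(r)
    case _: ShuffleStage => Nil
  }

  // 3-table join in one query stage: (t1 JOIN t2) JOIN t3, all shuffles are leaves.
  val inner: SortMergeJoin = SortMergeJoin(ShuffleStage("t1"), ShuffleStage("t2"))
  val threeWay: SortMergeJoin = SortMergeJoin(inner, ShuffleStage("t3"))
}
```

In this model, `optimizable(threeWay)` yields only `inner`: the outer join has a join on one side and a shuffle stage on the other, so it does not qualify.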
> For 3-table join, if they are in the same query stage, it means the shuffles are all leaf, and we will only optimize the first SMJ, as the second SMJ has only one side as shuffle stage.
We have a 3-table skewed join implementation in our internal code base. But we have replaced the skew join handling logic with the community's, so our optimization does not work with the current OptimizeSkewedJoin. I will try to re-implement it on top of the current OptimizeSkewedJoin and submit a PR later.
> Can we make it more general? It seems like we can optimize any SMJ if its 2 children are both shuffle stages. cc @JkSelf @maryannxue
Yes. We usually implement optimizations based on our internal usage and issues, so they may not be general. I have only seen the UNION case so far.
retest this please
Test build #124785 has finished for PR 28947 at commit
Can we make a general approach here? e.g. if we just optimize SMJ with both sides as shuffle stages, we can even optimize the first join of a 3-table join plan.
#29021 is the PR that handles more general skew patterns, including n-table joins.
@cloud-fan @LantaoJin
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
In the apply method of OptimizeSkewedJoin, we first match the UnionExec nodes, then try to optimize their children with the current logic.
Why are the changes needed?
Currently, AQE skew join handling only supports two-table joins such as
But if the plan contains a Union, the skew join handling does not work:
Does this PR introduce any user-facing change?
No
How was this patch tested?
Add a UT.
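
The approach described under "What changes were proposed" (match the Union node first, then run the existing two-table logic on each child) might look roughly like the following. This is a minimal sketch on a toy plan model, not the PR's actual code: `Plan`, `ShuffleStage`, `SortMergeJoin`, `Union`, `optimizeJoin`, and the `skewHandled` flag are all hypothetical names introduced for illustration.

```scala
object UnionSkewSketch {
  // Hypothetical, simplified plan nodes (not Spark classes).
  sealed trait Plan
  case class ShuffleStage(name: String) extends Plan
  case class SortMergeJoin(left: Plan, right: Plan, skewHandled: Boolean = false) extends Plan
  case class Union(children: List[Plan]) extends Plan

  // Stand-in for the existing two-table logic: it only fires when the
  // plan root is an SMJ sitting directly on two shuffle stages.
  def optimizeJoin(plan: Plan): Plan = plan match {
    case SortMergeJoin(l: ShuffleStage, r: ShuffleStage, _) =>
      SortMergeJoin(l, r, skewHandled = true)
    case other => other
  }

  // Match a Union first and apply the existing logic to each of its children;
  // any other plan goes through the existing logic unchanged.
  def apply(plan: Plan): Plan = plan match {
    case Union(children) => Union(children.map(optimizeJoin))
    case other => optimizeJoin(other)
  }

  val unionPlan: Plan = Union(List(
    SortMergeJoin(ShuffleStage("a"), ShuffleStage("b")),
    SortMergeJoin(ShuffleStage("c"), ShuffleStage("d"))))
}
```

In this model, `optimizeJoin(unionPlan)` returns the plan unchanged because the root is a Union, while `UnionSkewSketch(unionPlan)` marks both child joins as handled.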