Skip to content

[SPARK-12700] [SQL] embed condition into SMJ and BroadcastHashJoin #10653

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

davies
Copy link
Contributor

@davies davies commented Jan 7, 2016

Currently SortMergeJoin and BroadcastHashJoin do not support condition, the need a followed Filter for that, the result projection to generate UnsafeRow could be very expensive if they generate lots of rows and could be filtered mostly by condition.

This PR brings the support of condition for SortMergeJoin and BroadcastHashJoin, just like other outer joins do.

This could improve the performance of Q72 by 7x (from 120s to 16.5s).

@SparkQA
Copy link

SparkQA commented Jan 7, 2016

Test build #48984 has finished for PR 10653 at commit a38d623.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 8, 2016

Test build #48986 has finished for PR 10653 at commit ade6f5d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -78,8 +78,11 @@ trait HashOuterJoin {

@transient private[this] lazy val leftNullRow = new GenericInternalRow(left.output.length)
@transient private[this] lazy val rightNullRow = new GenericInternalRow(right.output.length)
@transient private[this] lazy val boundCondition =
@transient private[this] lazy val boundCondition = if (condition.isDefined) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not related to this PR, just make it to aligned with others (a little faster for empty condition).

@davies
Copy link
Contributor Author

davies commented Jan 18, 2016

@nongli Do you have more comments on this one?

@nongli
Copy link
Contributor

nongli commented Jan 18, 2016

LGTM. Let's make sure we don't need this when whole stage codegen is further along.

@davies
Copy link
Contributor Author

davies commented Jan 19, 2016

For SMJ, we may not support whole stage codegen, this will be useful. For BroadcastHashJoin, once whole stage codegen is supported, we should not reply on this.

Merging this into master, thanks.

@asfgit asfgit closed this in 323d51f Jan 19, 2016
@nongli
Copy link
Contributor

nongli commented Jan 19, 2016

@davies I think it's reasonable to codegen the merge part of SMJ and have this start a new pipeline for whole stage codegen. Not required for the first go but shouldn't be too far away.

@davies
Copy link
Contributor Author

davies commented Jan 19, 2016

@nongli That sounds good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants