-
Notifications
You must be signed in to change notification settings - Fork 1.6k
feat: Support RightMark join for NestedLoop and Hash join #16083
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc @comphead |
This is cool! I got some questions:
|
@2010YOUY01 I think i'lll try to get sql queries optimized into a right mark join after support for symmetric hash join + sort merge join. Right mark is equivalent to left mark however internally I didn't swap during planning time. I had just marked a column on the probe batch as probe batches were incoming. |
@2010YOUY01 @Dandandan Is it possible to take a look? Thanks! |
Really cool work! I'll look over it in more detail over the weekend =) |
After updating
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jonathanc-n I quickly went through the PR
It would be nice to have fuzz tests for RightMark join, I'm not sure 3 tests would cover all possible cases, WDYT?
Alrighty! Regarding the lack of support for RightMark joins in some join operators, I believe it would be best to return an error in the constructor of those operators if they do not support the JoinType. I believe the PR currently would just return a wrong result. I don't (yet) know how much work it would be to adjust physical optimizer rules so that no bad JoinType x OperatorType combination gets constructed. Algorithmically, what I found hardest to understand is the switcheroo between build and probe side - You've modified I think a less confusing alternative would be to just add a |
I think instead we should just not support swapping for now and support it after all the join operators are supported. Returning an error will cause some tests to fail. |
| JoinType::LeftMark => { | ||
| JoinType::LeftMark | ||
| JoinType::RightMark => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit that I don't understand why LeftMark/RightMark are different from their Semi/Anti counterparts. It seems like the documentation of this method is missing LeftMark / RightMark too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the doc! I'm not quite sure either, I think it has to do with the extra boolean column crossing the boundary into the other side, resulting in the mark column being missing from the requirements
let mut bitmap = BooleanBufferBuilder::new(range.len()); | ||
bitmap.append_n(range.len(), false); | ||
input_indices | ||
.iter() | ||
.flatten() | ||
.map(|v| v.as_usize()) | ||
.filter(|v| range.contains(v)) | ||
.for_each(|v| { | ||
bitmap.set_bit(v - range.start, true); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that these lines are the same for get_mark_indices
, get_semi_indices
and get_anti_indices
. We could factor them out to a new method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
map.filter.foreach
wondering can it be done in single iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.for_each(|v| {
let idx = v.as_usize();
if range.contains(&idx) {
builder.set_bit(idx - range.start, true);
}
});
Could do something like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I havI have refactored to use this
Co-authored-by: Christian <[email protected]>
@@ -305,6 +305,31 @@ async fn test_left_mark_join_1k_filtered() { | |||
.await | |||
} | |||
|
|||
// todo: add JoinTestType::HjSmj after Right mark SortMergeJoin support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is needed to add RightMark Join support to SortMergeJoin execution as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I will close #16226 as it is a duplicate
.collect::<PrimitiveArray<L>>(); | ||
let probe_indices = (0..prune_length) | ||
.map(|idx| { | ||
// For mark join we output a dummy index 0 to indicate the row had a match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would be nice to show as an example, it is challenging to read algorithms for upcoming contributors, but it can be done as follow up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracked in #16415
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jonathanc-n for bearing with this PR, I think it is good to go.
Thanks for this contribution, I'm planning to have this PR open for a little bit of more time to see if there are any other feedbacks |
🚀 -- I am feeling physically nervous that there are so many PRs open so starting the merge train! |
Which issue does this PR close?
Part of #13138 .
Rationale for this change
Revamp implementation of the previous stale implementation for RightMark
What changes are included in this PR?
Added support for right mark join functionality to NestedLoop + Hash join. A follow up pr will be made for supporting right mark in Sortmergejoin and sym hash join.
Are these changes tested?
Yes unit tests