Add distinct to the right side when LOJ + IS NULL is rewritten to Semijoin #24884
Fix the title to be more specific: add distinct to the right side when LOJ + IS NULL is rewritten to Semijoin
Also, this only partially addresses the linked issue, because there could be other direct uses of semijoin that we don't address. This just improves the original optimization.
Add a couple more tests:
d1e556a to 98c91f4
@jaystarshot, please take a look. Thanks!
jaystarshot left a comment
Can you please also add a release note?
This only applies an aggregation if the previous join is a left join. Wouldn't it be better to apply this to all semi joins for the general case (as mentioned in the issue)?
Hi @jaystarshot, thanks for your review. Yes, we plan to extract this into more general support in a follow-up, and we will keep the linked issue updated.
Thanks for the release note! Some formatting nits.
Hi @jaystarshot, could you help with another stamp after the rebase? Thank you!
Description
Issue #24510
The join might have a very large right side in the following optimization:
When a left join has an 'IS NULL' filter on a right-side join key, the query effectively returns only the left-side rows that have no match on the right side. This pattern is currently optimized by rewriting that part of the plan into a left semi join. However, the right side (join keys only) might contain a large number of duplicates, which are completely unnecessary for evaluating the join. This PR addresses that by adding a distinct aggregation on the right side before the semi join to improve performance.
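To make the rewrite concrete, here is a minimal Python sketch over in-memory lists. It is not Presto code, and the names `left_join_is_null` and `semi_join_rewrite` are hypothetical. It assumes non-NULL join keys, because Python's `None == None` does not follow SQL NULL comparison rules. It checks that the left join + IS NULL filter, the plain semi join rewrite, and the rewrite with deduplicated right-side keys all return the same rows, while the deduplicated version sends fewer rows into the join.

```python
from typing import Dict, Hashable, List, Tuple

Row = Dict[str, Hashable]


def left_join_is_null(left: List[Row], right_keys: List[Hashable], key: str) -> List[Row]:
    """Reference semantics: LEFT JOIN on `key`, then keep rows whose right-side key IS NULL."""
    result = []
    for row in left:
        matched = any(rk == row[key] for rk in right_keys)
        if not matched:
            # No match: the right-side key is padded with NULL, so IS NULL keeps the row.
            result.append(row)
        # In a real LEFT JOIN a matched row is emitted once per matching right row,
        # but every copy has a non-NULL right key and is filtered out, so right-side
        # duplicates cannot affect the result.
    return result


def semi_join_rewrite(
    left: List[Row], right_keys: List[Hashable], key: str, distinct: bool
) -> Tuple[List[Row], int]:
    """Hypothetical rewrite: semi join against the right-side keys, keep unmatched left rows.

    When `distinct` is True, the right-side keys are deduplicated first, mimicking a
    distinct aggregation placed on the right side. Returns the surviving left rows and
    the number of rows that reach the semi join's build side in this model.
    """
    # dict.fromkeys keeps the first occurrence of each key, preserving order.
    build_input = list(dict.fromkeys(right_keys)) if distinct else list(right_keys)
    lookup = set(build_input)
    survivors = [row for row in left if row[key] not in lookup]
    return survivors, len(build_input)


left = [{"k": 1}, {"k": 2}, {"k": 3}]
right_keys = [2, 2, 2, 2, 3, 3]  # heavily duplicated right-side keys

expected = left_join_is_null(left, right_keys, "k")
plain, plain_build = semi_join_rewrite(left, right_keys, "k", distinct=False)
deduped, deduped_build = semi_join_rewrite(left, right_keys, "k", distinct=True)
print(expected, plain_build, deduped_build)  # prints: [{'k': 1}] 6 2
```

Deduplicating is safe in this sketch because the semi join only asks whether a key exists on the right side, not how many times it appears.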
Selected Meta internal production queries showed a 100x performance gain and a 3x memory reduction.