Support enrich remote mode #104993
Hi @dnhatn, I've created a changelog YAML for you.
* Ensure that no remote enrich is allowed after a reduction or an enrich with coordinator mode.
* <p>
* TODO:
* For Limit and TopN, we can insert the same node after the remote enrich (also needs to move projections around)
I will work on a follow-up for this.
Pinging @elastic/es-analytical-engine (Team:Analytics)
LGTM - Thanks Nhat for adding an extensive set of tests!
public class EnrichHelper {

    static String randomEnrichCommand(EnrichPolicy policy, String name, Enrich.Mode mode) {
This is better suited for org.elasticsearch.xpack.esql.EsqlTestUtils.
++ I've moved it to 6027453
Thanks, Costin!
Please see #105095 for the follow-up work related to this PR.
This PR adds support for enrich in remote mode. An enrich in remote mode cannot come after an enrich in coordinator mode, an aggregation, or a limit. While we can't address the first two limitations, we should remove the constraint on LIMIT. Otherwise, users are forced to write queries that may not perform well. For instance,
does not work. In such cases, users must rewrite it as
which is equivalent to bringing all data to the coordinating cluster.
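The ordering rule described above can be illustrated with a minimal Java sketch. It is not the planner code from this PR: `RemoteEnrichCheck`, the `Step` enum, and `firstViolation` are hypothetical names, and a query is modeled as a flat list of steps purely for illustration.

```java
import java.util.List;

/** Hypothetical, simplified model of a query pipeline; not the actual ES|QL planner classes. */
final class RemoteEnrichCheck {

    /** Pipeline step kinds relevant to the rule (illustrative names only). */
    enum Step { FILTER, EVAL, LIMIT, STATS, ENRICH_COORDINATOR, ENRICH_REMOTE }

    /**
     * Returns the index of the first remote enrich that follows a limit, an aggregation,
     * or a coordinator-mode enrich, or -1 if the pipeline satisfies the rule.
     */
    static int firstViolation(List<Step> pipeline) {
        boolean blocked = false; // true once a step that forbids a later remote enrich was seen
        for (int i = 0; i < pipeline.size(); i++) {
            Step step = pipeline.get(i);
            if (step == Step.ENRICH_REMOTE && blocked) {
                return i;
            }
            if (step == Step.LIMIT || step == Step.STATS || step == Step.ENRICH_COORDINATOR) {
                blocked = true;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Remote enrich before the limit: allowed.
        System.out.println(firstViolation(List.of(Step.FILTER, Step.ENRICH_REMOTE, Step.LIMIT)));
        // Remote enrich after the limit: rejected at index 1.
        System.out.println(firstViolation(List.of(Step.LIMIT, Step.ENRICH_REMOTE)));
    }
}
```

In this model, lifting the LIMIT constraint would amount to dropping `Step.LIMIT` from the blocking condition, which is only valid if the planner can re-apply the limit after the remote enrich.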
We might consider implementing the actual remote enrich on the coordinating cluster (like remote field extraction). However, this requires retaining the originating cluster and restructuring pages for routing, which might be complicated.