-
Notifications
You must be signed in to change notification settings - Fork 25.4k
ES|QL: Improve generative tests for FORK [130015] #131206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES|QL: Improve generative tests for FORK [130015] #131206
Conversation
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @svilen-mihaylov-elastic
...c/main/java/org/elasticsearch/xpack/esql/qa/rest/generative/command/pipe/WhereGenerator.java
Show resolved
Hide resolved
final String command = current.commandString(); | ||
|
||
// Try appending new command to parent of Fork. If we successfully execute (without exception) AND still retain the same | ||
// schema (all Fork branches must have the same schema), we append the command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a strict constraint? Isn't it enough that there are no type conflicts between branches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would generate more interesting commands if we did not have this restriction here.
Right now AFAICS when I run this, the FORK branches contain mostly WHERE/MV_EXPAND/SORT and ENRICH sometimes.
FORK branches don't need to have the same schema - we just need to be sure that if a column is present in multiple branches, it has the same data type everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes correct, the schemas do not need to be exactly the same, just the overlapping columns need to have the same type.
I spend some time thinking about this and decided to go with the simple solution (for now). I think it will be challenging to allow different schemas which have compatible types AND allow for (mostly independent) fork sub-pipelines. Particularly for non-trivial subpipelines (> 5 stages), it will be non-trivial to keep adding random stages and checking if the types remain compatible. This may lead to a lot of repetitions and discarded results which will make the test run (possibly) a lot slower.
I will update the comment., and in any case, options are open later to iterate to further on this condition to balance coverage of Fork sub-pipelines and performance of the test itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remove the check above, I get something like this:
line 1:1: Column [@timestamp] has conflicting data types in FORK branches: [UNSUPPORTED] and [KEYWORD]
line 1:940: EVAL does not support type [counter_long] as the return data type of expression [from sample_d*,*]
I ignore those errors by adding the patterns to the ALLOWED_ERRORS list. How does this sound?
...rc/main/java/org/elasticsearch/xpack/esql/qa/rest/generative/command/pipe/ForkGenerator.java
Outdated
Show resolved
Hide resolved
final String command = current.commandString(); | ||
|
||
// Try appending new command to parent of Fork. If we successfully execute (without exception) AND still retain the same | ||
// schema (all Fork branches must have the same schema), we append the command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would generate more interesting commands if we did not have this restriction here.
Right now AFAICS when I run this, the FORK branches contain mostly WHERE/MV_EXPAND/SORT and ENRICH sometimes.
FORK branches don't need to have the same schema - we just need to be sure that if a column is present in multiple branches, it has the same data type everywhere.
💔 Backport failedThe backport operation could not be completed due to the following error:
You can use sqren/backport to manually backport by running |
…king * upstream/main: (100 commits) Term vector API on stateless search nodes (elastic#129902) TEST Fix ThreadPoolMergeSchedulerStressTestIT testMergingFallsBehindAndThenCatchesUp (elastic#131636) Add inference.put_custom rest-api-spec (elastic#131660) ESQL: Fewer serverless docs in tests (elastic#131651) Skip search on indices with INDEX_REFRESH_BLOCK (elastic#129132) Mute org.elasticsearch.indices.cluster.RemoteSearchForceConnectTimeoutIT testTimeoutSetting elastic#131656 [jdk] Resolve EA OpenJDK builds to our JDK archive (elastic#131237) Add optimized path for intermediate values aggregator (elastic#131390) Correctly handling download_database_on_pipeline_creation within a pipeline processor within a default or final pipeline (elastic#131236) Refresh potential lost connections at query start for `_search` (elastic#130463) Add template_id to patterned-text type (elastic#131401) Integrate LIKE/RLIKE LIST with ReplaceStringCasingWithInsensitiveRegexMatch rule (elastic#131531) [ES|QL] Add doc for the COMPLETION command (elastic#131010) ESQL: Add times to topn status (elastic#131555) ESQL: Add asynchronous pre-optimization step for logical plan (elastic#131440) ES|QL: Improve generative tests for FORK [130015] (elastic#131206) Update index mapping update privileges (elastic#130894) ESQL: Added Sample operator NamedWritable to plugin (elastic#131541) update `kibana_system` to grant it access to `.chat-*` system index (elastic#131419) Clarify heap size configuration (elastic#131607) ...
…-tracking * upstream/main: (44 commits) Term vector API on stateless search nodes (elastic#129902) TEST Fix ThreadPoolMergeSchedulerStressTestIT testMergingFallsBehindAndThenCatchesUp (elastic#131636) Add inference.put_custom rest-api-spec (elastic#131660) ESQL: Fewer serverless docs in tests (elastic#131651) Skip search on indices with INDEX_REFRESH_BLOCK (elastic#129132) Mute org.elasticsearch.indices.cluster.RemoteSearchForceConnectTimeoutIT testTimeoutSetting elastic#131656 [jdk] Resolve EA OpenJDK builds to our JDK archive (elastic#131237) Add optimized path for intermediate values aggregator (elastic#131390) Correctly handling download_database_on_pipeline_creation within a pipeline processor within a default or final pipeline (elastic#131236) Refresh potential lost connections at query start for `_search` (elastic#130463) Add template_id to patterned-text type (elastic#131401) Integrate LIKE/RLIKE LIST with ReplaceStringCasingWithInsensitiveRegexMatch rule (elastic#131531) [ES|QL] Add doc for the COMPLETION command (elastic#131010) ESQL: Add times to topn status (elastic#131555) ESQL: Add asynchronous pre-optimization step for logical plan (elastic#131440) ES|QL: Improve generative tests for FORK [130015] (elastic#131206) Update index mapping update privileges (elastic#130894) ESQL: Added Sample operator NamedWritable to plugin (elastic#131541) update `kibana_system` to grant it access to `.chat-*` system index (elastic#131419) Clarify heap size configuration (elastic#131607) ...
Addresses #130015