ES|QL: Improve generative tests for FORK [130015] #131206

Merged

Conversation

svilen-mihaylov-elastic
Contributor

@svilen-mihaylov-elastic svilen-mihaylov-elastic commented Jul 14, 2025

Addresses #130015

@svilen-mihaylov-elastic svilen-mihaylov-elastic changed the title from "ForkGen" to "Extend tests for Fork" Jul 16, 2025
@svilen-mihaylov-elastic svilen-mihaylov-elastic marked this pull request as ready for review July 16, 2025 17:43
@svilen-mihaylov-elastic svilen-mihaylov-elastic added the >test (Issues or PRs that are addressing/adding tests) and :Search Relevance/Search (Catch all for Search Relevance) labels Jul 16, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch) label Jul 16, 2025
@svilen-mihaylov-elastic svilen-mihaylov-elastic changed the title from "Extend tests for Fork" to "ES|QL: Improve generative tests for FORK https://github.com/elastic/elasticsearch/issues/130015" Jul 16, 2025
@svilen-mihaylov-elastic svilen-mihaylov-elastic changed the title from "ES|QL: Improve generative tests for FORK https://github.com/elastic/elasticsearch/issues/130015" to "ES|QL: Improve generative tests for FORK #130015" Jul 16, 2025
@svilen-mihaylov-elastic svilen-mihaylov-elastic changed the title from "ES|QL: Improve generative tests for FORK #130015" to "ES|QL: Improve generative tests for FORK [130015]" Jul 16, 2025
@ioanatia ioanatia requested a review from luigidellaquila July 16, 2025 17:55
Contributor

@luigidellaquila luigidellaquila left a comment

final String command = current.commandString();

// Try appending new command to parent of Fork. If we successfully execute (without exception) AND still retain the same
// schema (all Fork branches must have the same schema), we append the command.
Contributor

Is this a strict constraint? Isn't it enough that there are no type conflicts between branches?

Contributor

We would generate more interesting commands if we did not have this restriction here.
Right now, AFAICS when I run this, the FORK branches contain mostly WHERE/MV_EXPAND/SORT, and sometimes ENRICH.

FORK branches don't need to have the same schema - we just need to be sure that if a column is present in multiple branches, it has the same data type everywhere.

Contributor Author

@svilen-mihaylov-elastic svilen-mihaylov-elastic Jul 17, 2025

Yes, correct: the schemas do not need to be exactly the same; only the overlapping columns need to have the same type.

I spent some time thinking about this and decided to go with the simple solution (for now). I think it will be challenging to allow different schemas with compatible types while also allowing (mostly independent) FORK sub-pipelines. Particularly for longer sub-pipelines (> 5 stages), it will be hard to keep adding random stages and checking whether the types remain compatible. This may lead to many retries and discarded results, which could make the test run a lot slower.

I will update the comment. In any case, we can iterate on this condition later to balance coverage of FORK sub-pipelines against the performance of the test itself.
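
For reference, the relaxed condition discussed in this thread (branches may differ, but any column present in more than one branch must have the same type) could be checked roughly as follows. This is only a sketch under assumptions: `ForkSchemaCheck`, `typesCompatible`, and the representation of a branch schema as a map from column name to type name are hypothetical and do not come from the PR or the Elasticsearch test framework.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the PR: each branch schema is modelled
// as a map from column name to type name.
final class ForkSchemaCheck {

    // Returns true when every column that appears in more than one branch
    // has the same type in all of them. Branches may otherwise differ.
    static boolean typesCompatible(List<Map<String, String>> branchSchemas) {
        Map<String, String> seen = new HashMap<>();
        for (Map<String, String> schema : branchSchemas) {
            for (Map.Entry<String, String> column : schema.entrySet()) {
                // putIfAbsent returns the previously recorded type, or null on first sight.
                String previous = seen.putIfAbsent(column.getKey(), column.getValue());
                if (previous != null && previous.equals(column.getValue()) == false) {
                    return false; // same column name, conflicting types
                }
            }
        }
        return true;
    }
}
```

A check like this is a single pass over the branch schemas, but it would have to be re-run after every candidate stage appended to any branch. That repeated retrying is the cost described above.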

Contributor Author

If I remove the check above, I get something like this:

line 1:1: Column [@timestamp] has conflicting data types in FORK branches: [UNSUPPORTED] and [KEYWORD]
line 1:940: EVAL does not support type [counter_long] as the return data type of expression [from sample_d*,*]

I ignore those errors by adding the patterns to the ALLOWED_ERRORS list. How does this sound?
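
To illustrate the idea of tolerating these two failures by pattern, here is a minimal sketch. The class `AllowedErrorsSketch`, the `ALLOWED` list, and `isAllowed` are hypothetical names; the actual `ALLOWED_ERRORS` list in the test code may be declared and matched differently. The regular expressions are derived only from the two messages quoted above.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not the PR's code: error messages matching any of
// these patterns are treated as expected and do not fail the test.
final class AllowedErrorsSketch {

    static final List<Pattern> ALLOWED = List.of(
        Pattern.compile("Column \\[.*\\] has conflicting data types in FORK branches"),
        Pattern.compile("EVAL does not support type \\[.*\\] as the return data type of expression")
    );

    // find() is used so the "line N:M:" prefix and any trailing detail are ignored.
    static boolean isAllowed(String errorMessage) {
        return ALLOWED.stream().anyMatch(p -> p.matcher(errorMessage).find());
    }
}
```

The trade-off with this approach is that a genuine bug producing one of these messages would also be ignored, so the patterns are best kept as narrow as possible.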

@svilen-mihaylov-elastic svilen-mihaylov-elastic added the auto-backport (Automatically create backport pull requests when merged) label Jul 21, 2025
@svilen-mihaylov-elastic svilen-mihaylov-elastic merged commit 9db4361 into elastic:main Jul 21, 2025
33 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

The backport operation could not be completed due to the following error:

There are no branches to backport to. Aborting.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 131206

szybia added a commit to szybia/elasticsearch that referenced this pull request Jul 22, 2025
…king

* upstream/main: (100 commits)
  Term vector API on stateless search nodes (elastic#129902)
  TEST Fix ThreadPoolMergeSchedulerStressTestIT testMergingFallsBehindAndThenCatchesUp (elastic#131636)
  Add inference.put_custom rest-api-spec (elastic#131660)
  ESQL: Fewer serverless docs in tests (elastic#131651)
  Skip search on indices with INDEX_REFRESH_BLOCK (elastic#129132)
  Mute org.elasticsearch.indices.cluster.RemoteSearchForceConnectTimeoutIT testTimeoutSetting elastic#131656
  [jdk] Resolve EA OpenJDK builds to our JDK archive (elastic#131237)
  Add optimized path for intermediate values aggregator (elastic#131390)
  Correctly handling download_database_on_pipeline_creation within a pipeline processor within a default or final pipeline (elastic#131236)
  Refresh potential lost connections at query start for `_search` (elastic#130463)
  Add template_id to patterned-text type (elastic#131401)
  Integrate LIKE/RLIKE LIST with ReplaceStringCasingWithInsensitiveRegexMatch rule (elastic#131531)
  [ES|QL] Add doc for the COMPLETION command (elastic#131010)
  ESQL: Add times to topn status (elastic#131555)
  ESQL: Add asynchronous pre-optimization step for logical plan (elastic#131440)
  ES|QL: Improve generative tests for FORK [130015] (elastic#131206)
  Update index mapping update privileges (elastic#130894)
  ESQL: Added Sample operator NamedWritable to plugin (elastic#131541)
  update `kibana_system` to grant it access to `.chat-*` system index (elastic#131419)
  Clarify heap size configuration (elastic#131607)
  ...
szybia added a commit to szybia/elasticsearch that referenced this pull request Jul 22, 2025
…-tracking

* upstream/main: (44 commits)
  Term vector API on stateless search nodes (elastic#129902)
  TEST Fix ThreadPoolMergeSchedulerStressTestIT testMergingFallsBehindAndThenCatchesUp (elastic#131636)
  Add inference.put_custom rest-api-spec (elastic#131660)
  ESQL: Fewer serverless docs in tests (elastic#131651)
  Skip search on indices with INDEX_REFRESH_BLOCK (elastic#129132)
  Mute org.elasticsearch.indices.cluster.RemoteSearchForceConnectTimeoutIT testTimeoutSetting elastic#131656
  [jdk] Resolve EA OpenJDK builds to our JDK archive (elastic#131237)
  Add optimized path for intermediate values aggregator (elastic#131390)
  Correctly handling download_database_on_pipeline_creation within a pipeline processor within a default or final pipeline (elastic#131236)
  Refresh potential lost connections at query start for `_search` (elastic#130463)
  Add template_id to patterned-text type (elastic#131401)
  Integrate LIKE/RLIKE LIST with ReplaceStringCasingWithInsensitiveRegexMatch rule (elastic#131531)
  [ES|QL] Add doc for the COMPLETION command (elastic#131010)
  ESQL: Add times to topn status (elastic#131555)
  ESQL: Add asynchronous pre-optimization step for logical plan (elastic#131440)
  ES|QL: Improve generative tests for FORK [130015] (elastic#131206)
  Update index mapping update privileges (elastic#130894)
  ESQL: Added Sample operator NamedWritable to plugin (elastic#131541)
  update `kibana_system` to grant it access to `.chat-*` system index (elastic#131419)
  Clarify heap size configuration (elastic#131607)
  ...
Labels
auto-backport: Automatically create backport pull requests when merged
backport pending
:Search Relevance/Search: Catch all for Search Relevance
Team:Search Relevance: Meta label for the Search Relevance team in Elasticsearch
>test: Issues or PRs that are addressing/adding tests
v9.2.0
4 participants