Skip to content

fix(esql): refresh keep_alive TTL on async query GET requests #130673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

emreyalvac
Copy link

Summary

Fixes a bug in the ESQL async query API where updating the keep_alive value via a GET /_query/async/{id}?keep_alive=... request did not properly extend the lifetime of the async result, causing unexpected 404s after the original TTL expired.

Details

  • The initial keep_alive duration specified in the POST request was honored, but follow-up GET requests with a new keep_alive value failed to refresh the TTL.
  • This PR ensures that if a new keep_alive value is passed in the follow-up GET request, it will properly update the expiration time of the async result.

How to Reproduce (Before Fix)

  1. Submit an async ESQL query with a short keep_alive, e.g. 15s.
  2. Within the TTL window, send a GET /_query/async/{id}?keep_alive=60s
  3. Wait >15s
  4. Expect 404 (bug)

After Fix

  • Async results will persist according to the most recent keep_alive value provided in the GET call.
  • Prevents premature cleanup.

Related

Closes: #130619

@elasticsearchmachine elasticsearchmachine added v9.2.0 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Jul 6, 2025
@emreyalvac
Copy link
Author

@kertal can you check?

@kertal
Copy link
Member

kertal commented Jul 7, 2025

thx @emreyalvac not sure who's the best to review this, since I just reported it , and it's something we need in Kibana. dear @elasticsearch-esql , can you help me (or know who can take care of it, thx) . If this is the solution I assume it needs at least one test to cover it

@ioanatia ioanatia added the :Analytics/ES|QL AKA ESQL label Jul 8, 2025
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed needs:triage Requires assignment of a team area label labels Jul 8, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@ioanatia ioanatia added the needs:triage Requires assignment of a team area label label Jul 8, 2025
@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Jul 8, 2025
@astefan astefan self-assigned this Jul 8, 2025
@astefan astefan added the >bug label Jul 8, 2025
@astefan
Copy link
Contributor

astefan commented Jul 8, 2025

buildkite test this

@astefan
Copy link
Contributor

astefan commented Jul 8, 2025

@emreyalvac thank you for providing a fix in this PR. Would you mind looking into the test failures and, also, add a test for this fix?

@astefan
Copy link
Contributor

astefan commented Jul 11, 2025

CC @ChrisHegarty if you have a quicker turnaround on this one 🙏 .

@ChrisHegarty
Copy link
Contributor

@elasticmachine update branch

@ChrisHegarty
Copy link
Contributor

I locally hacked EsqlAsyncActionIT in order to minimally reproduce the issue, and verify the fix. I managed to reproduce the observed problem, but the fix does not work. It immediately returns org.elasticsearch.ResourceNotFoundException.

@emreyalvac did you reproduce and verity the change?

@ChrisHegarty
Copy link
Contributor

buildkite test this

@ChrisHegarty
Copy link
Contributor

ChrisHegarty commented Jul 11, 2025

One can set a keep-alive time through the initial query, and this works. I'm not sure that the keep-alive update over async get was ever implemented. It's been more than a year since I wrote this code, and it has changed quite a bit. I think that there is some more work to be done if we wanna support this - more than is in this PR.

@astefan astefan removed their assignment Jul 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >bug external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ES|QL] When requesting GET /_query/async/{id}?keep_alive={time} keep_alive is ignored
7 participants