Skip to content

[Jobs] Add log streaming for jobs #20976

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Dec 15, 2021
Merged

Conversation

jiaodong
Copy link
Member

@jiaodong jiaodong commented Dec 8, 2021

First commit comes from ed's #20459, while subsequent commits make some refactoring changes, fix tests and backwards compatibility checks with existing jobs functionality.

Why are these changes needed?

Current logs API simply returns a str to unblock development and integration. We should add proper log streaming for better UX and external job manager integration.

Related issue number

Closes #19415

Test Plan

ray cluster + dashboard from 1.9 public release, use CLI code from this PR:

Screen Shot 2021-12-09 at 4 37 08 PM

Ensure it works and there's no automatic log following.

ray cluster + dashboard from source, use CLI code from this PR:

Screen Shot 2021-12-09 at 4 37 38 PM

Ensure it works and automatic following is enabled with finishing message.

Backwards compatibility test

Manually edit cli.py implementation of job_submit to use skip version check and call into version 1 only API:

  if not no_wait:
      cli_logger.print("Tailing logs until the job exits "
                       "(disable with --no-wait):")
      asyncio.get_event_loop().run_until_complete(_tail_logs(client, job_id))

Ensure we can see compatibility exception thrown as expected with 1 return code:

https://gist.github.com/jiaodong/d54a19e7252c487776416d6e52f8cfd5

# sdk version 0 does not have log streaming
if not no_wait and int(sdk_version) > 0:
    cli_logger.print("Tailing logs until the job exits "
                     "(disable with --no-wait):")
    asyncio.get_event_loop().run_until_complete(_tail_logs(client, job_id))

Change back to PR's implementation with version check and ensures shell script and pytest case passes.

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

edoakes and others added 6 commits December 8, 2021 10:48
@jiaodong jiaodong requested review from ericl and scv119 as code owners December 8, 2021 18:59
@jiaodong jiaodong changed the title [WIP][Jobs] Add log streaming for jobs [Jobs] Add log streaming for jobs Dec 10, 2021
@jiaodong jiaodong requested review from edoakes and removed request for ericl and scv119 December 10, 2021 00:42
@jiaodong
Copy link
Member Author

Next items on this PR:

  • Python 3.6 tests are failing on CI while 3.8 locally is not, need further investigation.
  • Need to add cli, http and sdk backwards compatibility tests using previous public ray versions, started with 1.9.

@edoakes
Copy link
Collaborator

edoakes commented Dec 13, 2021

@richardliaw second set of eyeballs on the API/output?

Copy link
Collaborator

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test looks good

@jiaodong
Copy link
Member Author

Failed test suit "Small and Large" seems flaky on master that's also failing on the documentation change PR: https://buildkite.com/ray-project/ray-builders-pr/builds/20955#82d05873-927b-4098-8823-0839ee78fa2c. Retrying.

@edoakes edoakes merged commit ed34434 into ray-project:master Dec 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

logs streaming API in http server
5 participants