schedulers/kubernetes_scheduler: add workspace/patching support #384
@d4l3k has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This pull request was exported from Phabricator. Differential Revision: D34125887
Summary:
This adds patching support to the kubernetes scheduler. It requires you to specify `image_repo` as a config option with the docker repository to push to. If the dryrun/schedule methods find a local image such as `sha256:...` it'll remap it to a remote repo package and push it during schedule.

Pull Request resolved: #384

Test Plan:
```
pyre
pytest torchx/schedulers/test/kubernetes_scheduler_test.py
```

```
(torchx) tristanr@tristanr-arch2 ~/D/torchx-proj> torchx run --scheduler kubernetes -c queue=default,image_repo=495572122715.dkr.ecr.us-west-2.amazonaws.com/torchx/integration-tests --wait --log utils.sh sh foo.sh
torchx 2022-02-09 15:51:12 INFO loaded configs from /home/tristanr/Developer/torchx-proj/.torchxconfig
torchx 2022-02-09 15:51:12 INFO building patch images for workspace: file:///home/tristanr/Developer/torchx-proj...
torchx 2022-02-09 15:51:13 INFO built image sha256:d1cd394f88861a5ca18de88cc0801513cd6c3dc7d945f7cbfe7121bb1d552bec from ghcr.io/pytorch/torchx:0.1.2dev0
torchx 2022-02-09 15:51:14 INFO pushing image 495572122715.dkr.ecr.us-west-2.amazonaws.com/torchx/integration-tests:d1cd394f88861a5ca18de88cc0801513cd6c3dc7d945f7cbfe7121bb1d552bec...
torchx 2022-02-09 15:51:14 INFO docker: {'status': 'The push refers to repository [495572122715.dkr.ecr.us-west-2.amazonaws.com/torchx/integration-tests]'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Preparing', 'progressDetail': {}, 'id': '004e5e059580'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Preparing', 'progressDetail': {}, 'id': 'de1d3a8ac491'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Preparing', 'progressDetail': {}, 'id': 'e6d41c036803'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Preparing', 'progressDetail': {}, 'id': '0827b8e37332'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Preparing', 'progressDetail': {}, 'id': 'a8496aa14f72'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Preparing', 'progressDetail': {}, 'id': '1f84c52a7d38'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Preparing', 'progressDetail': {}, 'id': '0f801b69538d'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Preparing', 'progressDetail': {}, 'id': '354dfcbe6a14'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Preparing', 'progressDetail': {}, 'id': 'f15a0881ce19'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Preparing', 'progressDetail': {}, 'id': '824bf068fd3d'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Waiting', 'progressDetail': {}, 'id': '1f84c52a7d38'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Waiting', 'progressDetail': {}, 'id': '824bf068fd3d'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Waiting', 'progressDetail': {}, 'id': 'f15a0881ce19'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Waiting', 'progressDetail': {}, 'id': '0f801b69538d'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Waiting', 'progressDetail': {}, 'id': '354dfcbe6a14'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Layer already exists', 'progressDetail': {}, 'id': '0827b8e37332'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Layer already exists', 'progressDetail': {}, 'id': 'de1d3a8ac491'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Layer already exists', 'progressDetail': {}, 'id': 'e6d41c036803'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Layer already exists', 'progressDetail': {}, 'id': '004e5e059580'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Layer already exists', 'progressDetail': {}, 'id': 'a8496aa14f72'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Layer already exists', 'progressDetail': {}, 'id': '824bf068fd3d'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Layer already exists', 'progressDetail': {}, 'id': '0f801b69538d'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Layer already exists', 'progressDetail': {}, 'id': '1f84c52a7d38'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Layer already exists', 'progressDetail': {}, 'id': 'f15a0881ce19'}
torchx 2022-02-09 15:51:15 INFO docker: {'status': 'Layer already exists', 'progressDetail': {}, 'id': '354dfcbe6a14'}
torchx 2022-02-09 15:51:16 INFO docker: {'status': 'd1cd394f88861a5ca18de88cc0801513cd6c3dc7d945f7cbfe7121bb1d552bec: digest: sha256:da9fba179cb37f2f6d6d09c16dc4f0c39ca84a6fbb767c0aff7b77738b608805 size: 2413'}
torchx 2022-02-09 15:51:16 INFO docker: {'progressDetail': {}, 'aux': {'Tag': 'd1cd394f88861a5ca18de88cc0801513cd6c3dc7d945f7cbfe7121bb1d552bec', 'Digest': 'sha256:da9fba179cb37f2f6d6d09c16dc4f0c39ca84a6fbb767c0aff7b77738b608805', 'Size': 2413}}
kubernetes://torchx/default:sh-n71zqm25lrk61
torchx 2022-02-09 15:51:17 INFO Launched app: kubernetes://torchx/default:sh-n71zqm25lrk61
torchx 2022-02-09 15:51:17 INFO AppStatus:
  msg: <NONE>
  num_restarts: -1
  roles: []
  state: PENDING (2)
  structured_error_msg: <NONE>
  ui_url: null
torchx 2022-02-09 15:51:17 INFO Job URL: None
torchx 2022-02-09 15:51:17 INFO Waiting for the app to finish...
torchx 2022-02-09 15:51:17 INFO Waiting for app to start before logging...
torchx 2022-02-09 15:51:22 INFO Job finished: SUCCEEDED
sh/0 2022-02-09T23:51:21.702981500Z foo
```

Reviewed By: kiukchung

Differential Revision: D34125887

Pulled By: d4l3k

fbshipit-source-id: e03d6c0ea70f4827b1eb5d24c8ad973c6c75a859
The same commit message is repeated here, differing only in its final trailer: fbshipit-source-id: bc1177ee17e33d5e6bdd340234755c5c53670293
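For readers skimming the thread, here is a minimal sketch of the remapping described in the summary. It is not the code from this PR: the function name `remap_local_images`, its signature, and the `registry.example.com` repository are hypothetical, and the only behavior taken from the log above is that a local `sha256:<hash>` image ends up pushed as `<image_repo>:<hash>`.

```python
from typing import Dict, List, Optional, Tuple

# Locally built images are identified by this prefix (see the "built image" log line).
LOCAL_IMAGE_PREFIX = "sha256:"


def remap_local_images(
    images: List[str], image_repo: Optional[str]
) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: rewrite local image IDs to remote tags.

    Returns the updated image list plus a {local_id: remote_tag} mapping of
    images that would still need to be pushed at schedule time.
    """
    remapped: List[str] = []
    to_push: Dict[str, str] = {}
    for image in images:
        if not image.startswith(LOCAL_IMAGE_PREFIX):
            # Already a remote reference; leave it untouched.
            remapped.append(image)
            continue
        if not image_repo:
            raise ValueError("image_repo must be set to push a locally built image")
        image_hash = image[len(LOCAL_IMAGE_PREFIX):]
        remote = f"{image_repo}:{image_hash}"  # hash becomes the remote tag
        to_push[image] = remote
        remapped.append(remote)
    return remapped, to_push


if __name__ == "__main__":
    images, to_push = remap_local_images(
        ["sha256:abc123", "ghcr.io/pytorch/torchx:0.1.2dev0"],
        "registry.example.com/torchx/tests",  # placeholder repository
    )
    print(images)
    print(to_push)
```

Returning the mapping separately keeps the rewrite side-effect free, so a dryrun-style step could report the remote names while the actual push is deferred to schedule, which matches the split described in the summary.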
Codecov Report

| | main | #384 | +/- |
|---|---|---|---|
| Coverage | 94.70% | 94.34% | -0.36% |
| Files | 63 | 63 | |
| Lines | 3359 | 3398 | +39 |
| Hits | 3181 | 3206 | +25 |
| Misses | 178 | 192 | +14 |
This adds patching support to the Kubernetes scheduler. It requires you to specify `image_repo` as a config option with the Docker repository to push to.

If the dryrun/schedule methods find a local image such as `sha256:...`, it'll remap it to a remote repo package and push it during schedule.

Test plan: