schedulers/aws_batch: add a scheduler to launch jobs directly on aws_batch #381

Closed

45 changes: 45 additions & 0 deletions .github/workflows/aws-batch-integration-tests.yaml
@@ -0,0 +1,45 @@
name: AWS Batch Integration Tests

on:
  push:
    branches:
      - main
  pull_request:

jobs:
  awsbatch:
    runs-on: ubuntu-18.04
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Setup Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.9
          architecture: x64
      - name: Checkout TorchX
        uses: actions/checkout@v2
      - name: Configure AWS
        env:
          AWS_ROLE_ARN: ${{ secrets.AWS_ROLE_ARN }}
        run: |
          if [ -n "$AWS_ROLE_ARN" ]; then
            export AWS_WEB_IDENTITY_TOKEN_FILE=/tmp/awscreds
            export AWS_DEFAULT_REGION=us-west-2

            echo AWS_WEB_IDENTITY_TOKEN_FILE=$AWS_WEB_IDENTITY_TOKEN_FILE >> $GITHUB_ENV
            echo AWS_ROLE_ARN=$AWS_ROLE_ARN >> $GITHUB_ENV
            echo AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION >> $GITHUB_ENV

            curl -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" "$ACTIONS_ID_TOKEN_REQUEST_URL" | jq -r '.value' > $AWS_WEB_IDENTITY_TOKEN_FILE
          fi
      - name: Install dependencies
        run: |
          set -eux
          pip install -e .[dev]
      - name: Run AWS Batch Integration Tests
        run: |
          set -ex

          scripts/awsbatchint.sh
12 changes: 6 additions & 6 deletions dev-requirements.txt
@@ -1,16 +1,18 @@
aiobotocore>=1.4.1
aiobotocore==2.1.0
ax-platform[mysql]==0.2.2
black==21.10b0
boto3==1.20.24
captum>=0.4.0
classy-vision>=0.6.0
flake8==3.9.0
fsspec[s3]==2021.10.1
fsspec[s3]==2022.1.0
importlib-metadata
ipython
kfp==1.8.9
moto==2.2.12
moto==3.0.2
pyre-extensions==0.0.21
pytest
pytorch-lightning==1.5.6
s3fs==2021.10.1
ray[default]==1.9.2
torch-model-archiver==0.4.2
torch==1.10.0
@@ -19,5 +21,3 @@ torchtext==0.11.0
torchvision==0.11.1
ts==0.5.1
usort==0.6.4
ipython
pytest
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -97,6 +97,7 @@ Works With
   schedulers/kubernetes
   schedulers/slurm
   schedulers/ray
   schedulers/aws_batch

.. _Pipelines:
.. toctree::
8 changes: 8 additions & 0 deletions docs/source/schedulers/aws_batch.rst
@@ -0,0 +1,8 @@
AWS Batch
=================

.. automodule:: torchx.schedulers.aws_batch_scheduler
.. currentmodule:: torchx.schedulers.aws_batch_scheduler

.. autoclass:: AWSBatchScheduler
   :members:
20 changes: 20 additions & 0 deletions scripts/awsbatchint.sh
@@ -0,0 +1,20 @@
#!/bin/bash
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

set -ex

APP_ID="$(torchx run --wait --scheduler aws_batch -c queue=torchx utils.echo)"
torchx status "$APP_ID"
torchx describe "$APP_ID"
torchx log "$APP_ID"
LINES="$(torchx log "$APP_ID" | wc -l)"

if [ "$LINES" -ne 1 ]
then
    echo "expected 1 log lines"
    exit 1
fi
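
For readers who prefer Python, the final check in this script (count the lines of `torchx log` output and fail unless there is exactly one) could be sketched as below. This is only an illustration: `count_log_lines` and `check_single_log_line` are hypothetical helper names that are not part of TorchX or this PR, and the log text is passed in as a string rather than fetched from the CLI.

```python
def count_log_lines(log_output: str) -> int:
    """Count newline-terminated lines, mirroring what `wc -l` reports."""
    return log_output.count("\n")


def check_single_log_line(log_output: str) -> None:
    """Raise if the log does not contain exactly one line (hypothetical helper)."""
    lines = count_log_lines(log_output)
    if lines != 1:
        raise RuntimeError(f"expected 1 log line, got {lines}")
```

Like `wc -l`, this counts newline characters, so output that lacks a trailing newline would be reported as zero lines.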
2 changes: 2 additions & 0 deletions torchx/schedulers/__init__.py
@@ -7,6 +7,7 @@

from typing import Dict, Optional

import torchx.schedulers.aws_batch_scheduler as aws_batch_scheduler
import torchx.schedulers.docker_scheduler as docker_scheduler
import torchx.schedulers.kubernetes_scheduler as kubernetes_scheduler
import torchx.schedulers.local_scheduler as local_scheduler
@@ -48,6 +49,7 @@ def get_scheduler_factories() -> Dict[str, SchedulerFactory]:
        "local_cwd": local_scheduler.create_cwd_scheduler,
        "slurm": slurm_scheduler.create_scheduler,
        "kubernetes": kubernetes_scheduler.create_scheduler,
        "aws_batch": aws_batch_scheduler.create_scheduler,
    }

    ray_scheduler_creator = try_get_ray_scheduler()
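
The change to `torchx/schedulers/__init__.py` registers the new backend by adding one entry to a name-to-factory mapping. A minimal, self-contained sketch of that registry pattern follows. It does not import TorchX: `DummyScheduler`, the two `create_*` functions, `get_scheduler`, and the single-argument factory signature are all assumptions made for illustration, not the library's actual API.

```python
from typing import Callable, Dict


class DummyScheduler:
    """Hypothetical stand-in for a real scheduler class."""

    def __init__(self, backend: str, session_name: str) -> None:
        self.backend = backend
        self.session_name = session_name


# Assumed factory signature for this sketch: session name in, scheduler out.
SchedulerFactory = Callable[[str], DummyScheduler]


def create_slurm_scheduler(session_name: str) -> DummyScheduler:
    return DummyScheduler("slurm", session_name)


def create_aws_batch_scheduler(session_name: str) -> DummyScheduler:
    return DummyScheduler("aws_batch", session_name)


def get_scheduler_factories() -> Dict[str, SchedulerFactory]:
    # Registering a new backend is a single extra dict entry.
    return {
        "slurm": create_slurm_scheduler,
        "aws_batch": create_aws_batch_scheduler,
    }


def get_scheduler(name: str, session_name: str) -> DummyScheduler:
    """Look up a factory by name and build a scheduler (hypothetical helper)."""
    factories = get_scheduler_factories()
    if name not in factories:
        raise KeyError(f"unknown scheduler {name!r}; available: {sorted(factories)}")
    return factories[name](session_name)
```

Because callers select a backend by its string key, a command such as `--scheduler aws_batch` only needs the key to be present in the mapping; no other call sites have to change.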