Commit 66c50fd

Authored by Miguel Varela Ramos, RobertLucian, deliahu, vishalbollu
Container as a Service (CaaS) (#2173)
Co-authored-by: Robert Lucian Chiriac, David Eliahu, Vishal Bollu, vishal
1 parent 3ebc267 commit 66c50fd

649 files changed (+8451, -31549 lines)

.circleci/config.yml

Lines changed: 8 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -20,8 +20,9 @@ commands:
2020
- run:
2121
name: Install Go
2222
command: |
23-
wget https://dl.google.com/go/go1.14.7.linux-amd64.tar.gz
24-
sudo tar -C /usr/local -xzf go1.14.7.linux-amd64.tar.gz
23+
sudo rm -rf /usr/local/go
24+
wget https://dl.google.com/go/go1.15.12.linux-amd64.tar.gz
25+
sudo tar -C /usr/local -xzf go1.15.12.linux-amd64.tar.gz
2526
rm -rf go*.tar.gz
2627
echo 'export PATH=$PATH:/usr/local/go/bin' >> $BASH_ENV
2728
echo 'export PATH=$PATH:~/go/bin' >> $BASH_ENV
@@ -75,18 +76,17 @@ commands:
7576

7677
jobs:
7778
test:
78-
docker:
79-
- image: circleci/python:3.6
79+
machine:
80+
image: ubuntu-1604:202104-01 # machine executor necessary to run go integration tests
8081
steps:
8182
- checkout
82-
- setup_remote_docker
8383
- install-go
8484
- run:
8585
name: Install Linting Tools
8686
command: |
8787
go get -u -v golang.org/x/lint/golint
8888
go get -u -v github.com/kyoh86/looppointer/cmd/looppointer
89-
sudo pip install black aiohttp
89+
pip3 install black aiohttp
9090
- run:
9191
name: Initialize Credentials
9292
command: |
@@ -111,9 +111,6 @@ jobs:
111111
- run:
112112
name: Go Tests
113113
command: make test-go
114-
- run:
115-
name: Python Tests
116-
command: make test-python
117114

118115
build-and-deploy:
119116
docker:
@@ -162,8 +159,8 @@ jobs:
162159
node_groups:
163160
- name: spot
164161
instance_type: t3.medium
165-
min_instances: 10
166-
max_instances: 10
162+
min_instances: 16
163+
max_instances: 16
167164
spot: true
168165
- name: cpu
169166
instance_type: c5.xlarge

.github/ISSUE_TEMPLATE/bug-report.md

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ assignees: ''
3939

4040
### Stack traces
4141

42-
(error output from `cortex logs <api name>`)
42+
(error output from CloudWatch Insights or from a random pod `cortex logs <api name>`)
4343

4444
```text
4545
<paste stack traces here>

CONTRIBUTING.md

Lines changed: 3 additions & 5 deletions
@@ -2,7 +2,7 @@
22

33
## Remote development
44

5-
We recommend that you run your development environment on an EC2 instance due to frequent docker registry pushing. We've had a good experience using [Mutagen](https://mutagen.io/documentation/introduction) to synchronize local / remote file systems.
5+
We recommend that you run your development environment on an EC2 instance due to frequent docker registry pushing. We've had a good experience using [Mutagen](https://mutagen.io/documentation/introduction) to synchronize local / remote filesystems.
66

77
## Prerequisites
88

@@ -169,7 +169,7 @@ node_groups:
169169
Add this to your bash profile (e.g. `~/.bash_profile`, `~/.profile` or `~/.bashrc`), replacing the placeholders accordingly:
170170

171171
```bash
172-
# set the default image for APIs
172+
# set the default image registry
173173
export CORTEX_DEV_DEFAULT_IMAGE_REGISTRY="<account_id>.dkr.ecr.<region>.amazonaws.com/cortexlabs"
174174
175175
# redirect analytics and error reporting to our dev environment
@@ -209,7 +209,7 @@ Here is the typical full dev workflow which covers most cases:
209209
1. `make cluster-up` (creates a cluster using `dev/config/cluster.yaml`)
210210
2. `make devstart` (deletes the in-cluster operator, builds the CLI, and starts the operator locally; file changes will trigger the CLI and operator to re-build)
211211
3. Make your changes
212-
4. `make images-dev` (only necessary if API images or the manager are modified)
212+
4. `make images-dev` (only necessary if changes were made outside of the operator and CLI)
213213
5. Test your changes e.g. via `cortex deploy` (and repeat steps 3 and 4 as necessary)
214214
6. `make cluster-down` (deletes your cluster)
215215

@@ -224,6 +224,4 @@ If you are only modifying the CLI, `make cli-watch` will build the CLI and re-bu
224224

225225
If you are only modifying the operator, `make operator-local` will build and start the operator locally, and build/restart it when files are changed.
226226

227-
If you are modifying code in the API images (i.e. any of the Python serving code), `make images-dev` may build more images than you need during testing. For example, if you are only testing using the `python-handler-cpu` image, you can run `./dev/registry.sh update-single python-handler-cpu`.
228-
229227
See `Makefile` for additional dev commands.

Makefile

Lines changed: 6 additions & 15 deletions
@@ -124,8 +124,7 @@ async-gateway-update:
124124
@./dev/registry.sh update-single async-gateway
125125
@kubectl delete pods -l cortex.dev/async=gateway --namespace=default
126126

127-
# Docker images
128-
127+
# docker images
129128
images-all:
130129
@./dev/registry.sh update all
131130
images-all-skip-push:
@@ -136,15 +135,8 @@ images-dev:
136135
images-dev-skip-push:
137136
@./dev/registry.sh update dev --skip-push
138137

139-
images-api:
140-
@./dev/registry.sh update api
141-
images-api-skip-push:
142-
@./dev/registry.sh update api --skip-push
143-
144138
images-manager-skip-push:
145139
@./dev/registry.sh update-single manager --skip-push
146-
images-iris:
147-
@./dev/registry.sh update-single python-handler-cpu
148140

149141
registry-create:
150142
@./dev/registry.sh create
@@ -170,15 +162,14 @@ format:
170162
# Tests #
171163
#########
172164

173-
test:
174-
@./build/test.sh
165+
# build test api images
166+
# make sure you login with your quay credentials
167+
build-test-api-images:
168+
@./test/utils/build-all.sh quay.io/cortexlabs-test
175169

176-
test-go:
170+
test:
177171
@./build/test.sh go
178172

179-
test-python:
180-
@./build/test.sh python
181-
182173
# run e2e tests on an existing cluster
183174
# read test/e2e/README.md for instructions first
184175
test-e2e:

build/build-image.sh

Lines changed: 1 addition & 13 deletions
@@ -26,16 +26,4 @@ image=$1
2626
if [ "$image" == "inferentia" ]; then
2727
aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 790709498068.dkr.ecr.us-west-2.amazonaws.com
2828
fi
29-
30-
build_args=""
31-
32-
if [ "${image}" == "python-handler-gpu" ]; then
33-
cuda=("10.0" "10.1" "10.1" "10.2" "10.2" "11.0" "11.1")
34-
cudnn=("7" "7" "8" "7" "8" "8" "8")
35-
for i in ${!cudnn[@]}; do
36-
build_args="${build_args} --build-arg CUDA_VERSION=${cuda[$i]} --build-arg CUDNN=${cudnn[$i]}"
37-
docker build "$ROOT" -f $ROOT/images/$image/Dockerfile $build_args -t quay.io/cortexlabs/${image}:${CORTEX_VERSION}-cuda${cuda[$i]}-cudnn${cudnn[$i]} -t cortexlabs/${image}:${CORTEX_VERSION}-cuda${cuda[$i]}-cudnn${cudnn[$i]}
38-
done
39-
else
40-
docker build "$ROOT" -f $ROOT/images/$image/Dockerfile $build_args -t quay.io/cortexlabs/${image}:${CORTEX_VERSION} -t cortexlabs/${image}:${CORTEX_VERSION}
41-
fi
29+
docker build "$ROOT" -f $ROOT/images/$image/Dockerfile -t quay.io/cortexlabs/${image}:${CORTEX_VERSION} -t cortexlabs/${image}:${CORTEX_VERSION}
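
Note on this hunk: with the CUDA/cuDNN matrix removed, every image now gets exactly two tags, one per registry. A minimal sketch of that naming scheme follows; `image_tags` is a hypothetical helper that is not part of `build/build-image.sh`, and the version string is a placeholder for whatever `CORTEX_VERSION` holds.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the two tags the simplified script applies to an image.
image_tags() {
  local image="$1" version="$2"
  printf '%s\n' \
    "quay.io/cortexlabs/${image}:${version}" \
    "cortexlabs/${image}:${version}"
}

# "0.0.0-dev" is a placeholder value, not a real release.
image_tags "manager" "0.0.0-dev"
```

The same pair of names is what `build/push-image.sh` pushes, one per `$host`.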

build/images.sh

Lines changed: 2 additions & 14 deletions
@@ -19,24 +19,15 @@
1919

2020
set -euo pipefail
2121

22-
api_images=(
23-
"python-handler-cpu"
24-
"python-handler-gpu"
25-
"tensorflow-handler"
26-
"python-handler-inf"
27-
)
28-
2922
dev_images=(
30-
"downloader"
3123
"manager"
32-
"request-monitor"
24+
"proxy"
3325
"async-gateway"
3426
"enqueuer"
27+
"dequeuer"
3528
)
3629

3730
non_dev_images=(
38-
"tensorflow-serving-cpu"
39-
"tensorflow-serving-gpu"
4031
"cluster-autoscaler"
4132
"operator"
4233
"controller-manager"
@@ -53,16 +44,13 @@ non_dev_images=(
5344
"kube-rbac-proxy"
5445
"grafana"
5546
"event-exporter"
56-
"tensorflow-serving-inf"
5747
"metrics-server"
5848
"inferentia"
59-
"neuron-rtd"
6049
"nvidia"
6150
"kubexit"
6251
)
6352

6453
all_images=(
65-
"${api_images[@]}"
6654
"${dev_images[@]}"
6755
"${non_dev_images[@]}"
6856
)
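
Note on this hunk: `all_images` is now just the concatenation of the two remaining lists. The sketch below shows how that bash array composition behaves, using shortened, illustrative lists rather than the full contents of `build/images.sh`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Illustrative subsets only; the real lists live in build/images.sh.
dev_images=("manager" "proxy" "async-gateway" "enqueuer" "dequeuer")
non_dev_images=("operator" "grafana")

# Quoted "${array[@]}" expands to one word per element, so names are never re-split.
all_images=("${dev_images[@]}" "${non_dev_images[@]}")

for image in "${all_images[@]}"; do
  echo "would build: ${image}"
done
echo "total: ${#all_images[@]}"
```

Because the combined list is derived from the other two, dropping `api_images` from the expansion is enough to stop those images from being built by `update all`.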

build/lint.sh

Lines changed: 6 additions & 0 deletions
@@ -84,6 +84,7 @@ output=$(cd "$ROOT" && find . -type f \
8484
! -path "**/.vscode/*" \
8585
! -path "**/.idea/*" \
8686
! -path "**/.history/*" \
87+
! -path "**/testbin/*" \
8788
! -path "**/__pycache__/*" \
8889
! -path "**/.pytest_cache/*" \
8990
! -path "**/*.egg-info/*" \
@@ -118,6 +119,7 @@ if [ "$is_release_branch" = "true" ]; then
118119
! -path "**/.vscode/*" \
119120
! -path "**/.idea/*" \
120121
! -path "**/.history/*" \
122+
! -path "**/testbin/*" \
121123
! -path "**/__pycache__/*" \
122124
! -path "**/.pytest_cache/*" \
123125
! -path "**/*.egg-info/*" \
@@ -141,6 +143,7 @@ output=$(cd "$ROOT" && find . -type f \
141143
! -path "**/.idea/*" \
142144
! -path "**/.history/*" \
143145
! -path "**/.vscode/*" \
146+
! -path "**/testbin/*" \
144147
! -path "**/__pycache__/*" \
145148
! -path "**/.pytest_cache/*" \
146149
! -path "**/*.egg-info/*" \
@@ -164,6 +167,7 @@ output=$(cd "$ROOT" && find . -type f \
164167
! -path "**/.idea/*" \
165168
! -path "**/.history/*" \
166169
! -path "**/.vscode/*" \
170+
! -path "**/testbin/*" \
167171
! -path "**/__pycache__/*" \
168172
! -path "**/.pytest_cache/*" \
169173
! -path "**/*.egg-info/*" \
@@ -188,6 +192,7 @@ output=$(cd "$ROOT" && find . -type f \
188192
! -path "**/.vscode/*" \
189193
! -path "**/.idea/*" \
190194
! -path "**/.history/*" \
195+
! -path "**/testbin/*" \
191196
! -path "**/__pycache__/*" \
192197
! -path "**/.pytest_cache/*" \
193198
! -path "**/*.egg-info/*" \
@@ -210,6 +215,7 @@ output=$(cd "$ROOT" && find . -type f \
210215
! -path "**/.idea/*" \
211216
! -path "**/.history/*" \
212217
! -path "**/.vscode/*" \
218+
! -path "**/testbin/*" \
213219
! -path "**/__pycache__/*" \
214220
! -path "**/.pytest_cache/*" \
215221
! -path "**/*.egg-info/*" \
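
Note on these hunks: each `find` invocation gains the same `! -path "**/testbin/*"` exclusion. A small sketch of how that predicate filters results, run against a throwaway directory tree whose file names are made up for the example:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway tree: one normal source file and one file under testbin/.
tmp="$(mktemp -d)"
mkdir -p "$tmp/pkg" "$tmp/testbin/bin"
touch "$tmp/pkg/main.go" "$tmp/testbin/bin/tool"

# -path matches the whole path against a shell pattern in which "*" also matches "/",
# so files at any depth below a testbin directory are excluded.
output="$(cd "$tmp" && find . -type f ! -path "**/testbin/*" | sort)"
echo "$output"

rm -rf "$tmp"
```

Only the file outside `testbin/` is listed, which is the effect the added lines have on the lint checks.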

build/push-image.sh

Lines changed: 1 addition & 10 deletions
@@ -23,13 +23,4 @@ host=$1
2323
image=$2
2424

2525
echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin
26-
27-
if [ "$image" == "python-handler-gpu" ]; then
28-
cuda=("10.0" "10.1" "10.1" "10.2" "10.2" "11.0" "11.1")
29-
cudnn=("7" "7" "8" "7" "8" "8" "8")
30-
for i in ${!cudnn[@]}; do
31-
docker push $host/cortexlabs/${image}:${CORTEX_VERSION}-cuda${cuda[$i]}-cudnn${cudnn[$i]}
32-
done
33-
else
34-
docker push $host/cortexlabs/${image}:${CORTEX_VERSION}
35-
fi
26+
docker push $host/cortexlabs/${image}:${CORTEX_VERSION}

build/test.sh

Lines changed: 0 additions & 10 deletions
@@ -79,11 +79,6 @@ function run_go_tests() {
7979
)
8080
}
8181

82-
function run_python_tests() {
83-
docker build $ROOT -f $ROOT/images/test/Dockerfile -t cortexlabs/test
84-
docker run cortexlabs/test
85-
}
86-
8782
function run_e2e_tests() {
8883
if [ "$create_cluster" = "yes" ]; then
8984
pytest $ROOT/test/e2e/tests --config "$sub_cmd"
@@ -94,11 +89,6 @@ function run_e2e_tests() {
9489

9590
if [ "$cmd" = "go" ]; then
9691
run_go_tests
97-
elif [ "$cmd" = "python" ]; then
98-
run_python_tests
9992
elif [ "$cmd" = "e2e" ]; then
10093
run_e2e_tests
101-
else
102-
run_go_tests
103-
run_python_tests
10494
fi
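
Note on this hunk: with the `else` branch gone, the visible `if`/`elif` chain does nothing for a sub-command other than `go` or `e2e`. One possible alternative, shown here only as a sketch and not as what the commit does, is a `case` dispatcher that rejects unknown sub-commands. The function bodies are stand-ins, and `dispatch` is a hypothetical name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Stand-ins for the real functions in build/test.sh (hypothetical bodies).
run_go_tests() { echo "running go tests"; }
run_e2e_tests() { echo "running e2e tests"; }

# Dispatch on the sub-command and fail loudly on anything unrecognized.
dispatch() {
  local cmd="${1:-}"
  case "$cmd" in
    go) run_go_tests ;;
    e2e) run_e2e_tests ;;
    *)
      echo "unknown test command: '${cmd}'" >&2
      return 1
      ;;
  esac
}

dispatch go
```

With this shape, a stale invocation such as `dispatch python` returns a non-zero status instead of silently succeeding.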

cli/cluster/logs.go

Lines changed: 30 additions & 2 deletions
@@ -35,12 +35,40 @@ import (
3535
"github.com/gorilla/websocket"
3636
)
3737

38+
func GetLogs(operatorConfig OperatorConfig, apiName string) (schema.LogResponse, error) {
39+
httpRes, err := HTTPGet(operatorConfig, "/logs/"+apiName)
40+
if err != nil {
41+
return schema.LogResponse{}, err
42+
}
43+
44+
var logResponse schema.LogResponse
45+
if err = json.Unmarshal(httpRes, &logResponse); err != nil {
46+
return schema.LogResponse{}, errors.Wrap(err, "/logs/"+apiName, string(httpRes))
47+
}
48+
49+
return logResponse, nil
50+
}
51+
52+
func GetJobLogs(operatorConfig OperatorConfig, apiName string, jobID string) (schema.LogResponse, error) {
53+
httpRes, err := HTTPGet(operatorConfig, "/logs/"+apiName, map[string]string{"jobID": jobID})
54+
if err != nil {
55+
return schema.LogResponse{}, err
56+
}
57+
58+
var logResponse schema.LogResponse
59+
if err = json.Unmarshal(httpRes, &logResponse); err != nil {
60+
return schema.LogResponse{}, errors.Wrap(err, "/logs/"+apiName, string(httpRes))
61+
}
62+
63+
return logResponse, nil
64+
}
65+
3866
func StreamLogs(operatorConfig OperatorConfig, apiName string) error {
39-
return streamLogs(operatorConfig, "/logs/"+apiName)
67+
return streamLogs(operatorConfig, "/streamlogs/"+apiName)
4068
}
4169

4270
func StreamJobLogs(operatorConfig OperatorConfig, apiName string, jobID string) error {
43-
return streamLogs(operatorConfig, "/logs/"+apiName, map[string]string{"jobID": jobID})
71+
return streamLogs(operatorConfig, "/streamlogs/"+apiName, map[string]string{"jobID": jobID})
4472
}
4573

4674
func streamLogs(operatorConfig OperatorConfig, path string, qParams ...map[string]string) error {
