diff --git a/docs/clients/install.md b/docs/clients/install.md index 5d1d1677b2..3517b37ea7 100644 --- a/docs/clients/install.md +++ b/docs/clients/install.md @@ -1,6 +1,16 @@ # Install -## Install with pip +## Install the CLI + + +```bash +# download CLI version 0.38.0 (Note the "v"): +bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/v0.38.0/get-cli.sh)" +``` + +By default, the Cortex CLI is installed at `/usr/local/bin/cortex`. To install the executable elsewhere, export the `CORTEX_INSTALL_PATH` environment variable to your desired location before running the command above. + +## Install the CLI and Python client via pip To install the latest version: @@ -21,16 +31,6 @@ To upgrade to the latest version: pip install --upgrade cortex ``` -## Install without the Python client - - -```bash -# For example to download CLI version 0.38.0 (Note the "v"): -bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/v0.38.0/get-cli.sh)" -``` - -By default, the Cortex CLI is installed at `/usr/local/bin/cortex`. To install the executable elsewhere, export the `CORTEX_INSTALL_PATH` environment variable to your desired location before running the command above. - ## Changing the CLI/client configuration directory By default, the CLI/client creates a directory at `~/.cortex/` and uses it to store environment configuration. To use a different directory, export the `CORTEX_CLI_CONFIG_DIR` environment variable before running any `cortex` commands. 
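The sections above mention both `CORTEX_INSTALL_PATH` and `CORTEX_CLI_CONFIG_DIR`; as a sketch of how the two environment variables combine (the paths below are hypothetical examples, not defaults):

```shell
# install the CLI to a custom location and keep its configuration in a custom
# directory (both paths are hypothetical; substitute your own)
export CORTEX_INSTALL_PATH="$HOME/bin/cortex"
export CORTEX_CLI_CONFIG_DIR="$HOME/.config/cortex"

# create the target directories before running the install script
mkdir -p "$(dirname "$CORTEX_INSTALL_PATH")" "$CORTEX_CLI_CONFIG_DIR"

# then run the install command shown above, e.g.:
# bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/v0.38.0/get-cli.sh)"
```

With both variables exported, any subsequent `cortex` commands in the same shell will pick up the custom configuration directory.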
diff --git a/docs/clusters/instances/spot.md b/docs/clusters/instances/spot.md index 1a4c22f554..ac8be488b8 100644 --- a/docs/clusters/instances/spot.md +++ b/docs/clusters/instances/spot.md @@ -17,7 +17,7 @@ node_groups: on_demand_base_capacity: 0 # percentage of on demand instances to use after the on demand base capacity has been met [0, 100] (default: 50) - # note: setting this to 0 may hinder cluster scale up when spot instances are not available + # note: setting this to 0 may hinder cluster scale-up when spot instances are not available on_demand_percentage_above_base_capacity: 0 # max price for spot instances (default: the on-demand price of the primary instance type) diff --git a/docs/clusters/management/create.md b/docs/clusters/management/create.md index 238e9782af..19c583155d 100644 --- a/docs/clusters/management/create.md +++ b/docs/clusters/management/create.md @@ -9,9 +9,10 @@ ## Create a cluster on your AWS account + ```bash -# install the CLI -pip install cortex +# install the cortex CLI +bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/v0.38.0/get-cli.sh)" # create a cluster cortex cluster up cluster.yaml diff --git a/docs/clusters/management/delete.md b/docs/clusters/management/delete.md index 964ce779a3..9eababca57 100644 --- a/docs/clusters/management/delete.md +++ b/docs/clusters/management/delete.md @@ -8,10 +8,13 @@ cortex cluster down When a Cortex cluster is created, an S3 bucket is created for its internal use. When running `cortex cluster down`, a lifecycle rule is applied to the bucket such that its entire contents are removed within the next 24 hours. You can safely delete the bucket at any time after `cortex cluster down` has finished running. -## Delete Certificates +## Delete SSL Certificate -If you've configured a custom domain for your APIs, you can remove the SSL Certificate and Hosted Zone for the domain by -following these [instructions](../networking/custom-domain.md#cleanup). 
+If you've set up HTTPS, you can remove the SSL Certificate by following these [instructions](../networking/https.md#cleanup). + +## Delete Hosted Zone + +If you've configured a custom domain for your APIs, follow these [instructions](../networking/custom-domain.md#cleanup) to delete the Hosted Zone. ## Keep Cortex Resources diff --git a/docs/clusters/management/production.md b/docs/clusters/management/production.md new file mode 100644 index 0000000000..8f7d9670d1 --- /dev/null +++ b/docs/clusters/management/production.md @@ -0,0 +1,89 @@ +# Production guide + +As you take Cortex from development to production, here are a few pointers that might be useful. + +## Use images from a colocated ECR + +Configure your cluster and APIs to use images from ECR in the same region as your cluster to accelerate scale-ups, reduce ingress costs, and remove the dependency on Cortex's public quay.io registry. + +You can find instructions for mirroring Cortex images [here](../advanced/self-hosted-images.md). + +## Handling Cortex updates/upgrades + +Use a Route 53 hosted zone as a proxy in front of your Cortex cluster. Every new Cortex cluster provisions a new API load balancer with a unique endpoint. Using a Route 53 hosted zone configured with a subdomain will expose your Cortex cluster API endpoint as a static endpoint (e.g. `cortex.your-company.com`). You will be able to upgrade Cortex versions without downtime, and you will avoid the need to update your client code every time you migrate to a new cluster. You can find instructions for setting up a custom domain with a Route 53 hosted zone [here](../networking/custom-domain.md), and instructions for updating/upgrading your cluster [here](update.md). + +## Production cluster configuration + +### Securing your cluster + +The following configuration will improve security by preventing your cluster's nodes from being publicly accessible.
+ +```yaml +subnet_visibility: private + +nat_gateway: single # use "highly_available" for large clusters making requests to services outside of the cluster +``` + +You can make your load balancer private to prevent your APIs from being publicly accessed. In order to access your APIs, you will need to set up VPC peering between the Cortex cluster's VPC and the VPC containing the consumers of the Cortex APIs. See the [VPC peering guide](../networking/vpc-peering.md) for more details. + +```yaml +api_load_balancer_scheme: internal +``` + +You can also restrict access to your load balancers by IP address: + +```yaml +api_load_balancer_cidr_white_list: [0.0.0.0/0] +``` + +These two fields are also available for the operator load balancer. Keep in mind that if you make the operator load balancer private, you'll need to configure VPC peering to use the `cortex` CLI or Python client. + +```yaml +operator_load_balancer_scheme: internal +operator_load_balancer_cidr_white_list: [0.0.0.0/0] +``` + +See [here](../networking/load-balancers.md) for more information about the load balancers. + +### Ensure node provisioning + +You can take advantage of the cost savings of spot instances and the reliability of on-demand instances by utilizing the `priority` field in node groups. You can deploy two node groups, one that is spot and another that is on-demand. Set the priority of the spot node group to be higher than the priority of the on-demand node group. This encourages the cluster-autoscaler to try to spin up instances from the spot node group first. If there are no more spot instances available, the on-demand node group will be used instead. 
+ +```yaml +node_groups: + - name: gpu-spot + instance_type: g4dn.xlarge + min_instances: 0 + max_instances: 5 + spot: true + priority: 100 + - name: gpu-on-demand + instance_type: g4dn.xlarge + min_instances: 0 + max_instances: 5 + priority: 1 +``` + +### Considerations for large clusters + +If you plan on scaling your Cortex cluster past 400 nodes or 800 pods, it is recommended to set `prometheus_instance_type` to a larger instance type. A good guideline is that a t3.medium instance can reliably handle 400 nodes and 800 pods. + +## API Spec + +### Container design + +Configure your health checks to be as accurate as possible to prevent requests from being routed to pods that aren't ready to handle traffic. + +### Pods section + +Make sure that `max_concurrency` is set to match the concurrency supported by your container. + +Tune `max_queue_length` to lower values if you would like to more aggressively redistribute requests to newer pods as your API scales up rather than allowing requests to linger in queues. This would mean that the clients consuming your APIs should implement retry logic with a delay (such as exponential backoff). + +### Compute section + +Make sure to specify all of the relevant compute resources (especially cpu and memory) to ensure that your pods aren't starved for resources. + +### Autoscaling + +Revisit the autoscaling docs for [Realtime APIs](../../workloads/realtime/autoscaling.md) and/or [Async APIs](../../workloads/async/autoscaling.md) to effectively handle production traffic by tuning the scaling rate, sensitivity, and over-provisioning. 
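The retry guidance in the pods section above can be sketched in bash; the `with_backoff` helper and the example endpoint are illustrative assumptions, not part of Cortex:

```shell
# retry the given command with exponential backoff
# (up to 5 attempts, waiting 1s, 2s, 4s, 8s between them)
with_backoff() {
  local max_attempts=5 delay=1 attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -lt "$max_attempts" ]; then
      echo "attempt $attempt failed; retrying in ${delay}s" >&2
      sleep "$delay"
      delay=$((delay * 2))
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# hypothetical usage against one of your APIs:
# with_backoff curl -fsS -X POST https://api.example.com/hello-world -d @sample.json
```

Wrapping requests this way lets clients tolerate the brief queue-full responses that can occur while new pods are still scaling up.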
diff --git a/docs/clusters/management/update.md b/docs/clusters/management/update.md index 6e144602c3..87aac02402 100644 --- a/docs/clusters/management/update.md +++ b/docs/clusters/management/update.md @@ -1,36 +1,114 @@ # Update -## Update node group size +## Modify existing cluster + +You can add or remove node groups, resize existing node groups, and update some configuration fields of a running cluster. + +Fetch the current cluster configuration: ```bash -cortex cluster scale --node-group --min-instances --max-instances +cortex cluster info --print-config --name CLUSTER_NAME --region REGION > cluster.yaml ``` -## Upgrade to a newer version +Make your desired changes, and then apply them: ```bash -# spin down your cluster -cortex cluster down --name --region +cortex cluster configure cluster.yaml +``` + +Cortex will calculate the difference and you will be prompted with the update plan. + +If you would like to update fields that cannot be modified on a running cluster, you must create a new cluster with your desired configuration. + +## Upgrade to a new version + +Updating an existing Cortex cluster is not supported at the moment. Please spin down the previous version of the cluster, install the latest version of the Cortex CLI, and use it to spin up a new Cortex cluster. See the next section for how to do this without downtime. + +## Update or upgrade without downtime + +It is possible to update to a new version of Cortex or to migrate from one cluster to another without downtime. + +Note: it is important to not spin down your previous cluster until after your new cluster is receiving traffic. + +### Set up a subdomain using a Route 53 hosted zone + +If you've already set up a subdomain with a Route 53 hosted zone pointing to your cluster, skip this step. + +Setting up a Route 53 hosted zone allows you to transfer traffic seamlessly from an existing cluster to a new cluster, thereby avoiding downtime.
You can find the instructions for setting up a subdomain [here](../networking/custom-domain.md). You will need to update any clients interacting with your Cortex APIs to point to the new subdomain. -# update your CLI to the latest version -pip install --upgrade cortex +### Export all APIs from your previous cluster -# confirm version +The `cluster export` command can be used to get the YAML specifications of all APIs deployed in your cluster: + +```bash +cortex cluster export --name --region +``` + +### Spin up a new Cortex cluster + +If you are creating a new cluster with the same Cortex version: + +```bash +cortex cluster up new-cluster.yaml --configure-env cortex2 +``` + +This will create a CLI environment named `cortex2` for accessing the new cluster. + +If you are spinning up a new cluster with a different Cortex version, first install the cortex CLI matching the desired cluster version: + +```bash +# download the desired CLI version, replace 0.38.0 with the desired version (Note the "v"): +bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/v0.38.0/get-cli.sh)" + +# confirm Cortex CLI version cortex version -# spin up your cluster -cortex cluster up cluster.yaml +# spin up your cluster using the new CLI version +cortex cluster up cluster.yaml --configure-env cortex2 +``` + +You can use different Cortex CLIs to interact with the different versioned clusters; here is an example: + +```bash +# download the desired CLI version, replace 0.38.0 with the desired version (Note the "v"): +CORTEX_INSTALL_PATH=$(pwd)/cortex0.38.0 bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/v0.38.0/get-cli.sh)" + +# confirm cortex CLI version +./cortex0.38.0 version +``` + +### Deploy the APIs to your new cluster + +Please read the [changelogs](https://github.com/cortexlabs/cortex/releases) and the latest documentation to identify any features and breaking changes in the new version.
You may need to make modifications to your cluster and/or API configuration files. + +After you've updated the API specifications and images if necessary, you can deploy them onto your new cluster: + +```bash +cortex deploy -e cortex2 +``` + +### Point your custom domain to your new cluster + +Verify that all of the APIs in your new cluster are working as expected by accessing them via the cluster's API load balancer URL. + +Get the cluster's API load balancer URL: + +```bash +cortex cluster info --name --region ``` -## Upgrade without downtime +Once the APIs on the new cluster have been verified as working properly, it is recommended to update `min_replicas` of your APIs on the new cluster to match the current values in your previous cluster. This will avoid large sudden scale-up events as traffic is shifted to the new cluster. -In production environments, you can upgrade your cluster without downtime if you have a backend service or DNS in front of your Cortex cluster: +Then, navigate to the A record in your custom domain's Route 53 hosted zone and update the Alias to point to the new cluster's API load balancer URL. Rather than suddenly routing all of your traffic from the previous cluster to the new cluster, you can use [weighted records](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html#routing-policy-weighted) to incrementally route more traffic to your new cluster. -1. Spin up a new cluster. For example: `cortex cluster up new-cluster.yaml --configure-env cortex2` (this will create a CLI environment named `cortex2` for accessing the new cluster). -1. Re-deploy your APIs in your new cluster. For example, if the name of your CLI environment for your existing cluster is `cortex`, you can use `cortex get --env cortex` to list all running APIs in your cluster, and re-deploy them in the new cluster by running `cortex deploy --env cortex2` for each API.
Alternatively, you can run `cortex cluster export --name --region ` to export the API specifications for all of your running APIs, change directories the folder that was exported, and run `cortex deploy --env cortex2 ` for each API that you want to deploy in the new cluster. -1. Route requests to your new cluster. - * If you are using a custom domain: update the A record in your Route 53 hosted zone to point to your new cluster's API load balancer. - * If you have a backend service which makes requests to Cortex: update your backend service to make requests to the new cluster's endpoints. - * If you have a self-managed API Gateway in front of your Cortex cluster: update the routes to use new cluster's endpoints. -1. Spin down your previous cluster. If you updated DNS settings, wait 24-48 hours before spinning down your previous cluster to allow the DNS cache to be flushed. -1. You may now rename your new CLI environment name if you'd like (e.g. to rename it back to "cortex": `cortex env rename cortex2 cortex`) +If you increased `min_replicas` for your APIs in the new cluster during the transition, you may reduce `min_replicas` back to your desired level once all traffic has been shifted. + +### Spin down the previous cluster + +After confirming that your previous cluster has completed servicing all existing traffic and is not receiving any new traffic, spin down your previous cluster: + +```bash +# Note: it is recommended to install the Cortex CLI matching the previous cluster's version to ensure proper deletion. + +cortex cluster down --name --region +``` diff --git a/docs/clusters/networking/api-gateway.md b/docs/clusters/networking/api-gateway.md new file mode 100644 index 0000000000..ab78cad7bf --- /dev/null +++ b/docs/clusters/networking/api-gateway.md @@ -0,0 +1,149 @@ +# API Gateway + +This guide shows how to set up AWS API Gateway for your Cortex APIs, which is the simplest way to enable HTTPS if a custom domain is not required.
See [here](https.md) for how to set up a custom domain with SSL certificates instead. + +Please note that one limitation of API Gateway is that there is a 30-second time limit for all requests. + +If your API load balancer is internet-facing (which is the default, or you set `api_load_balancer_scheme: internet-facing` in your cluster configuration file before creating your cluster), use the [first section](#internet-facing-load-balancer) of this guide. + +If your API load balancer is internal (i.e. you set `api_load_balancer_scheme: internal` in your cluster configuration file before creating your cluster), use the [second section](#internal-load-balancer) of this guide. + +## Internet-facing load balancer + +_This section applies if your API load balancer is internet-facing (which is the default, or you set `api_load_balancer_scheme: internet-facing` in your cluster configuration file before creating your cluster). If your API load balancer is internal, see the [internal load balancer](#internal-load-balancer) section below._ + +### Create an API Gateway + +Go to the [API Gateway console](https://console.aws.amazon.com/apigateway/home), select "REST API" under "Choose an API type", and click "Build". + +![](https://user-images.githubusercontent.com/808475/78293216-18269e80-74dd-11ea-9e68-86922c2cbc7c.png) + +Select "REST" and "New API", name your API (e.g. "cortex"), select either "Regional" or "Edge optimized" (depending on your preference), and click "Create API". 
+ +![](https://user-images.githubusercontent.com/808475/78293434-66d43880-74dd-11ea-92d6-692158171a3f.png) + +Select "Actions" > "Create Resource": + +![](https://user-images.githubusercontent.com/808475/80154502-8b6b7f80-8574-11ea-9c78-7d9f277bf55b.png) + +Select "Configure as proxy resource" and "Enable API Gateway CORS", and click "Create Resource" + +![](https://user-images.githubusercontent.com/808475/80154565-ad650200-8574-11ea-8753-808cd35902e2.png) + +Select "HTTP Proxy" and set "Endpoint URL" to "http:///{proxy}". You can get your API load balancer endpoint via `cortex cluster info`; make sure to prepend `http://` and append `/{proxy}`. For example, mine is: `http://a9eaf69fd125947abb1065f62de59047-81cdebc0275f7d96.elb.us-west-2.amazonaws.com/{proxy}`. + +Leave "Content Handling" set to "Passthrough" and click "Save". + +![](https://user-images.githubusercontent.com/808475/80154735-13ea2000-8575-11ea-83ca-58f182df83c6.png) + +Select "Actions" > "Deploy API" + +![](https://user-images.githubusercontent.com/808475/80154802-2c5a3a80-8575-11ea-9ab3-de89885fd658.png) + +Create a new stage (e.g. "dev") and click "Deploy" + +![](https://user-images.githubusercontent.com/808475/80154859-4431be80-8575-11ea-9305-50384b1f9847.png) + +Copy your "Invoke URL" + +![](https://user-images.githubusercontent.com/808475/80154911-5dd30600-8575-11ea-9682-1a7328783011.png) + +### Use your new endpoint + +You may now use the "Invoke URL" in place of your API load balancer endpoint in your client.
For example, this curl request: + +```bash +curl http://a9eaf69fd125947abb1065f62de59047-81cdebc0275f7d96.elb.us-west-2.amazonaws.com/hello-world -X POST -H "Content-Type: application/json" -d @sample.json +``` + +Would become: + +```bash +curl https://31qjv48rs6.execute-api.us-west-2.amazonaws.com/dev/hello-world -X POST -H "Content-Type: application/json" -d @sample.json +``` + +### Cleanup + +Delete the API Gateway before spinning down your Cortex cluster: + +![](https://user-images.githubusercontent.com/808475/80155073-bdc9ac80-8575-11ea-99a1-95c0579da79e.png) + +## Internal load balancer + +_This section applies if your API load balancer is internal (i.e. you set `api_load_balancer_scheme: internal` in your cluster configuration file before creating your cluster). If your API load balancer is internet-facing, see the [internet-facing load balancer](#internet-facing-load-balancer) section above._ + +### Create a VPC Link + +Navigate to AWS's EC2 Load Balancer dashboard and locate the Cortex API load balancer. You can determine which is the API load balancer by inspecting the `kubernetes.io/service-name` tag: + +![](https://user-images.githubusercontent.com/808475/80142777-961c1980-8560-11ea-9202-40964dbff5e9.png) + +Take note of the load balancer's name. + +Go to the [API Gateway console](https://console.aws.amazon.com/apigateway/home), click "VPC Links" on the left sidebar, and click "Create" + +![](https://user-images.githubusercontent.com/808475/80142466-0c6c4c00-8560-11ea-8293-eb5e5572b797.png) + +Select "VPC link for REST APIs", name your VPC link (e.g. "cortex"), select the API load balancer, and click "Create". 
+ +![](https://user-images.githubusercontent.com/808475/80143027-03c84580-8561-11ea-92de-9ed0a5dfa593.png) + +Wait for the VPC link to be created (it will take a few minutes) + +![](https://user-images.githubusercontent.com/808475/80144088-bbaa2280-8562-11ea-901b-8520eb253df7.png) + +### Create an API Gateway + +Go to the [API Gateway console](https://console.aws.amazon.com/apigateway/home), select "REST API" under "Choose an API type", and click "Build" + +![](https://user-images.githubusercontent.com/808475/78293216-18269e80-74dd-11ea-9e68-86922c2cbc7c.png) + +Select "REST" and "New API", name your API (e.g. "cortex"), select either "Regional" or "Edge optimized" (depending on your preference), and click "Create API" + +![](https://user-images.githubusercontent.com/808475/78293434-66d43880-74dd-11ea-92d6-692158171a3f.png) + +Select "Actions" > "Create Resource" + +![](https://user-images.githubusercontent.com/808475/80141938-3cffb600-855f-11ea-9c1c-132ca4503b7a.png) + +Select "Configure as proxy resource" and "Enable API Gateway CORS", and click "Create Resource" + +![](https://user-images.githubusercontent.com/808475/80142124-80f2bb00-855f-11ea-8e4e-9413146e0815.png) + +Select "VPC Link", select "Use Proxy Integration", choose your newly-created VPC Link, and set "Endpoint URL" to "http:///{proxy}". You can get your API load balancer endpoint via `cortex cluster info`; make sure to prepend `http://` and append `/{proxy}`. For example, mine is: `http://a5044e34a352d44b0945adcd455c7fa3-32fa161d3e5bcbf9.elb.us-west-2.amazonaws.com/{proxy}`. Click "Save" + +![](https://user-images.githubusercontent.com/808475/80147407-4f322200-8568-11ea-8ef5-df5164c1375f.png) + +Select "Actions" > "Deploy API" + +![](https://user-images.githubusercontent.com/808475/80147555-86083800-8568-11ea-86af-1b1e38c9d322.png) + +Create a new stage (e.g. 
"dev") and click "Deploy" + +![](https://user-images.githubusercontent.com/808475/80147631-a7692400-8568-11ea-8a09-13dbd50b17b9.png) + +Copy your "Invoke URL" + +![](https://user-images.githubusercontent.com/808475/80147716-c798e300-8568-11ea-9aef-7dd6fdf4a68a.png) + +### Use your new endpoint + +You may now use the "Invoke URL" in place of your API load balancer endpoint in your client. For example, this curl request: + +```bash +curl http://a5044e34a352d44b0945adcd455c7fa3-32fa161d3e5bcbf9.elb.us-west-2.amazonaws.com/hello-world -X POST -H "Content-Type: application/json" -d @sample.json +``` + +Would become: + +```bash +curl https://lrivodooqh.execute-api.us-west-2.amazonaws.com/dev/hello-world -X POST -H "Content-Type: application/json" -d @sample.json +``` + +### Cleanup + +Delete the API Gateway and VPC Link before spinning down your Cortex cluster: + +![](https://user-images.githubusercontent.com/808475/80149163-05970680-856b-11ea-9f82-61f4061a3321.png) + +![](https://user-images.githubusercontent.com/808475/80149204-1ba4c700-856b-11ea-83f7-9741c78b6b95.png) diff --git a/docs/clusters/networking/custom-domain.md b/docs/clusters/networking/custom-domain.md index f0eb8162c7..75c27cb967 100644 --- a/docs/clusters/networking/custom-domain.md +++ b/docs/clusters/networking/custom-domain.md @@ -1,10 +1,12 @@ # Custom domain -You can use any custom domain for your endpoints. For example, you can make your API accessible via `api.example.com/text-generator`. This guide will demonstrate how to create a dedicated subdomain in AWS Route 53 and, if desired, configure your API load balancer to use an SSL certificate provisioned by AWS Certificate Manager. +You can set up DNS to use a custom domain for your Cortex APIs. For example, you can make your API accessible via `api.example.com/hello-world`. + +This guide will demonstrate how to create a dedicated subdomain in AWS Route 53. 
After completing this guide, if you want to enable HTTPS with your custom subdomain, see [these instructions](https.md). ## Configure DNS -Decide on a subdomain that you want to dedicate to Cortex APIs. For example if your domain is `example.com`, a valid subdomain can be `api.example.com`. This guide will use `cortexlabs.dev` as the example domain and `api.cortexlabs.dev` as the subdomain. +Decide on a subdomain that you want to dedicate to Cortex APIs. For example if your domain is `example.com`, a valid subdomain can be `api.example.com`. This guide will use `cortexlabs.dev` as the domain and `api.cortexlabs.dev` as the subdomain. We will set up a hosted zone on Route 53 to manage the DNS records for the subdomain. Go to the [Route 53 console](https://console.aws.amazon.com/route53/home) and click "Hosted Zones". @@ -26,62 +28,6 @@ We are going to add an NS (name server) record that specifies that any traffic t ![](https://user-images.githubusercontent.com/808475/109039458-abcb0580-7681-11eb-8644-76436328687e.png) -## Generate an SSL certificate - -You can skip this section (and continue to [add the DNS record](#add-dns-record)) if you don't need an SSL certificate for your custom domain. If you don't use an SSL certificate, you will need to skip certificate verification when making HTTPS requests to your APIs (e.g. `curl -k https://***`), or make HTTP requests instead (e.g. `curl http://***`). - -To create an SSL certificate, go to the [ACM console](https://us-west-2.console.aws.amazon.com/acm/home) and click "Get Started" under the "Provision certificates" section. - -![](https://user-images.githubusercontent.com/4365343/82202340-c04ac800-98cf-11ea-9472-89dd6d67eb0d.png) - -Select "Request a public certificate" and then "Request a certificate". - -![](https://user-images.githubusercontent.com/4365343/82202654-3e0ed380-98d0-11ea-8c57-025f0b69c54f.png) - -Enter your subdomain and then click "Next". 
- -![](https://user-images.githubusercontent.com/4365343/82224652-1cbedf00-98f2-11ea-912b-466cee2f6e25.png) - -Select "DNS validation" and then click "Next". - -![](https://user-images.githubusercontent.com/4365343/82205311-66003600-98d4-11ea-90e3-da7e8b0b2b9c.png) - -Add tags for searchability (optional) then click "Review". - -![](https://user-images.githubusercontent.com/4365343/82206485-52ee6580-98d6-11ea-95a9-1d0ebafc178a.png) - -Click "Confirm and request". - -![](https://user-images.githubusercontent.com/4365343/82206602-84ffc780-98d6-11ea-9f2f-ce383404ec67.png) - -Click "Create record in Route 53". A popup will appear indicating that a Record is going to be added to Route 53. Click "Create" to automatically add the DNS record to your subdomain's hosted zone. Then click "Continue". - -![](https://user-images.githubusercontent.com/4365343/82223539-c8ffc600-98f0-11ea-93a2-044aa0c9670d.png) - -Wait for the Certificate Status to be "issued". This might take a few minutes. - -![](https://user-images.githubusercontent.com/4365343/82209663-a616e700-98db-11ea-95cb-c6efedadb942.png) - -Take note of the certificate's ARN. The certificate is ineligible for renewal because it is currently not being used. It will be eligible for renewal once it's used in Cortex. - -![](https://user-images.githubusercontent.com/4365343/82222684-9e613d80-98ef-11ea-98c0-5a20b457f062.png) - -Add the following field to your cluster configuration: - -```yaml -# cluster.yaml - -... - -ssl_certificate_arn: -``` - -Create a Cortex cluster: - -```bash -cortex cluster up cluster.yaml -``` - ## Add DNS record Navigate to your [EC2 Load Balancer console](https://us-west-2.console.aws.amazon.com/ec2/v2/home#LoadBalancers:sort=loadBalancerName) and locate the Cortex API load balancer. You can determine which is the API load balancer by inspecting the `kubernetes.io/service-name` tag. 
@@ -94,29 +40,14 @@ Go back to the [Route 53 console](https://console.aws.amazon.com/route53/home#ho ![](https://user-images.githubusercontent.com/808475/84083422-6ac97e80-a996-11ea-9679-be37268a2133.png) -## Use your new endpoint - -Wait a few minutes to allow the DNS changes to propagate. You may now use your subdomain in place of your API load balancer endpoint in your client. For example, this curl request: - -```bash -curl http://a5044e34a352d44b0945adcd455c7fa3-32fa161d3e5bcbf9.elb.us-west-2.amazonaws.com/text-generator -X POST -H "Content-Type: application/json" -d @sample.json -``` - -Would become: - -```bash -# add the `-k` flag or use http:// instead of https:// if you didn't configure an SSL certificate -curl https://api.cortexlabs.dev/text-generator -X POST -H "Content-Type: application/json" -d @sample.json -``` - ## Debugging connectivity issues -You could run into connectivity issues if you make a request to your API without waiting long enough for your DNS records to propagate after creating them (it usually takes 5-10 mintues). If you are updating existing DNS records, it could take anywhere from a few minutes to 48 hours for the DNS cache to expire (until then, your previous DNS configuration will be used). +You could run into connectivity issues if you make a request to your API without waiting long enough for your DNS records to propagate after creating them (it usually takes 5-10 minutes). If you are updating existing DNS records, it could take anywhere from a few minutes to 48 hours for the DNS cache to expire (until then, your previous DNS configuration will be used). To test connectivity, try the following steps: 1. Deploy an api. -1. Make a request to the your api (e.g. `curl https://api.cortexlabs.dev/my-api` or paste the url into your browser if your API supports GET requests). +1. Make a request to your api (e.g. `curl http://api.cortexlabs.dev/hello-world` or paste the url into your browser if your API supports GET requests). 1. 
If you run into an error such as `curl: (6) Could not resolve host: api.cortexlabs.dev` wait a few minutes and make the request from another device that hasn't made a request to that url in a while. ## Cleanup @@ -126,7 +57,3 @@ Spin down your Cortex cluster. Delete the hosted zone for your subdomain in the [Route 53 console](https://console.aws.amazon.com/route53/home#hosted-zones:): ![](https://user-images.githubusercontent.com/4365343/82228729-81306d00-98f7-11ea-8570-e9de15f5267f.png) - -If you created an SSL certificate, delete it from the [ACM console](https://us-west-2.console.aws.amazon.com/acm/home): - -![](https://user-images.githubusercontent.com/4365343/82228835-a624e000-98f7-11ea-92e2-cb4fb0f591e2.png) diff --git a/docs/clusters/networking/https.md b/docs/clusters/networking/https.md index 38a25da691..7cc60f49e0 100644 --- a/docs/clusters/networking/https.md +++ b/docs/clusters/networking/https.md @@ -1,152 +1,90 @@ -# HTTPS +# Setting up HTTPS -If you would like to support HTTPS endpoints for your Cortex APIs, here are a few options: +This guide shows how to support HTTPS traffic to Cortex APIs via a custom domain. It is also possible to use AWS API Gateway to enable HTTPS without using your own domain (see [here](api-gateway.md) for instructions). -* Custom domain with an SSL certificate: See [here](custom-domain.md) for instructions. -* AWS API Gateway: This is the simplest approach if a custom domain is not required; continue reading this guide for instructions. +In order to create a valid SSL certificate for your domain, you must have the ability to configure DNS to satisfy the DNS challenges which prove that you own the domain. This guide assumes that you are using a Route 53 hosted zone to manage a subdomain. Follow this [guide](./custom-domain.md) to set up a subdomain managed by a Route 53 hosted zone. -Please note that one limitation of API Gateway is that there is a 30-second time limit for all requests. 
+## Generate an SSL certificate -If your API load balancer is internet-facing (which is the default, or you set `api_load_balancer_scheme: internet-facing` in your cluster configuration file before creating your cluster), use the [first section](#internet-facing-load-balancer) of this guide. +To create an SSL certificate, go to the [ACM console](https://us-west-2.console.aws.amazon.com/acm/home) and click "Get Started" under the "Provision certificates" section. -If your API load balancer is internal (i.e. you set `api_load_balancer_scheme: internal` in your cluster configuration file before creating your cluster), use the [second section](#internal-load-balancer) of this guide. +![](https://user-images.githubusercontent.com/4365343/82202340-c04ac800-98cf-11ea-9472-89dd6d67eb0d.png) -## Internet-facing load balancer +Select "Request a public certificate" and then "Request a certificate". -_This section applies if your API load balancer is internet-facing (which is the default, or you set `api_load_balancer_scheme: internet-facing` in your cluster configuration file before creating your cluster). If your API load balancer is internal, see the [internal load balancer](#internal-load-balancer) section below._ +![](https://user-images.githubusercontent.com/4365343/82202654-3e0ed380-98d0-11ea-8c57-025f0b69c54f.png) -### Create an API Gateway +Enter your subdomain and then click "Next". -Go to the [API Gateway console](https://console.aws.amazon.com/apigateway/home), select "REST API" under "Choose an API type", and click "Build". +![](https://user-images.githubusercontent.com/4365343/82224652-1cbedf00-98f2-11ea-912b-466cee2f6e25.png) -![](https://user-images.githubusercontent.com/808475/78293216-18269e80-74dd-11ea-9e68-86922c2cbc7c.png) +Select "DNS validation" and then click "Next". -Select "REST" and "New API", name your API (e.g. "cortex"), select either "Regional" or "Edge optimized" (depending on your preference), and click "Create API". 
+![](https://user-images.githubusercontent.com/4365343/82205311-66003600-98d4-11ea-90e3-da7e8b0b2b9c.png) -![](https://user-images.githubusercontent.com/808475/78293434-66d43880-74dd-11ea-92d6-692158171a3f.png) +Add tags for searchability (optional) then click "Review". -Select "Actions" > "Create Resource": +![](https://user-images.githubusercontent.com/4365343/82206485-52ee6580-98d6-11ea-95a9-1d0ebafc178a.png) -![](https://user-images.githubusercontent.com/808475/80154502-8b6b7f80-8574-11ea-9c78-7d9f277bf55b.png) +Click "Confirm and request". -Select "Configure as proxy resource" and "Enable API Gateway CORS", and click "Create Resource" +![](https://user-images.githubusercontent.com/4365343/82206602-84ffc780-98d6-11ea-9f2f-ce383404ec67.png) -![](https://user-images.githubusercontent.com/808475/80154565-ad650200-8574-11ea-8753-808cd35902e2.png) +Click "Create record in Route 53". A popup will appear indicating that a Record is going to be added to Route 53. Click "Create" to automatically add the DNS record to your subdomain's hosted zone. Then click "Continue". -Select "HTTP Proxy" and set "Endpoint URL" to "http:///{proxy}". You can get your API load balancer endpoint via `cortex cluster info`; make sure to prepend `http://` and append `/{proxy}`. For example, mine is: `http://a9eaf69fd125947abb1065f62de59047-81cdebc0275f7d96.elb.us-west-2.amazonaws.com/{proxy}`. +![](https://user-images.githubusercontent.com/4365343/82223539-c8ffc600-98f0-11ea-93a2-044aa0c9670d.png) -Leave "Content Handling" set to "Passthrough" and Click "Save". +Wait for the Certificate Status to be "issued". This might take a few minutes. -![](https://user-images.githubusercontent.com/808475/80154735-13ea2000-8575-11ea-83ca-58f182df83c6.png) +![](https://user-images.githubusercontent.com/4365343/82209663-a616e700-98db-11ea-95cb-c6efedadb942.png) -Select "Actions" > "Deploy API" +Take note of the certificate's ARN. 
The certificate is ineligible for renewal because it is currently not being used. It will be eligible for renewal once it's used in Cortex. -![](https://user-images.githubusercontent.com/808475/80154802-2c5a3a80-8575-11ea-9ab3-de89885fd658.png) +![](https://user-images.githubusercontent.com/4365343/82222684-9e613d80-98ef-11ea-98c0-5a20b457f062.png) -Create a new stage (e.g. "dev") and click "Deploy" +## Create or update your cluster -![](https://user-images.githubusercontent.com/808475/80154859-4431be80-8575-11ea-9305-50384b1f9847.png) +Add the following field to your cluster configuration: -Copy your "Invoke URL" +```yaml +# cluster.yaml -![](https://user-images.githubusercontent.com/808475/80154911-5dd30600-8575-11ea-9682-1a7328783011.png) +... -### Use your new endpoint +ssl_certificate_arn: +``` -You may now use the "Invoke URL" in place of your API load balancer endpoint in your client. For example, this curl request: +Create a cluster: ```bash -curl http://a9eaf69fd125947abb1065f62de59047-81cdebc0275f7d96.elb.us-west-2.amazonaws.com/my-api -X POST -H "Content-Type: application/json" -d @sample.json +cortex cluster up cluster.yaml ``` -Would become: +Or update an existing cluster: ```bash -curl https://31qjv48rs6.execute-api.us-west-2.amazonaws.com/dev/my-api -X POST -H "Content-Type: application/json" -d @sample.json +cortex cluster configure cluster.yaml ``` -### Cleanup - -Delete the API Gateway before spinning down your Cortex cluster: - -![](https://user-images.githubusercontent.com/808475/80155073-bdc9ac80-8575-11ea-99a1-95c0579da79e.png) - -## Internal load balancer - -_This section applies if your API load balancer is internal (i.e. you set `api_load_balancer_scheme: internal` in your cluster configuration file before creating your cluster). 
If your API load balancer is internet-facing, see the [internet-facing load balancer](#internet-facing-load-balancer) section above._ - -### Create a VPC Link - -Navigate to AWS's EC2 Load Balancer dashboard and locate the Cortex API load balancer. You can determine which is the API load balancer by inspecting the `kubernetes.io/service-name` tag: - -![](https://user-images.githubusercontent.com/808475/80142777-961c1980-8560-11ea-9202-40964dbff5e9.png) - -Take note of the load balancer's name. - -Go to the [API Gateway console](https://console.aws.amazon.com/apigateway/home), click "VPC Links" on the left sidebar, and click "Create" - -![](https://user-images.githubusercontent.com/808475/80142466-0c6c4c00-8560-11ea-8293-eb5e5572b797.png) - -Select "VPC link for REST APIs", name your VPC link (e.g. "cortex"), select the API load balancer, and click "Create". - -![](https://user-images.githubusercontent.com/808475/80143027-03c84580-8561-11ea-92de-9ed0a5dfa593.png) - -Wait for the VPC link to be created (it will take a few minutes) - -![](https://user-images.githubusercontent.com/808475/80144088-bbaa2280-8562-11ea-901b-8520eb253df7.png) - -### Create an API Gateway - -Go to the [API Gateway console](https://console.aws.amazon.com/apigateway/home), select "REST API" under "Choose an API type", and click "Build" - -![](https://user-images.githubusercontent.com/808475/78293216-18269e80-74dd-11ea-9e68-86922c2cbc7c.png) - -Select "REST" and "New API", name your API (e.g. 
"cortex"), select either "Regional" or "Edge optimized" (depending on your preference), and click "Create API" - -![](https://user-images.githubusercontent.com/808475/78293434-66d43880-74dd-11ea-92d6-692158171a3f.png) - -Select "Actions" > "Create Resource" - -![](https://user-images.githubusercontent.com/808475/80141938-3cffb600-855f-11ea-9c1c-132ca4503b7a.png) - -Select "Configure as proxy resource" and "Enable API Gateway CORS", and click "Create Resource" - -![](https://user-images.githubusercontent.com/808475/80142124-80f2bb00-855f-11ea-8e4e-9413146e0815.png) - -Select "VPC Link", select "Use Proxy Integration", choose your newly-created VPC Link, and set "Endpoint URL" to "http:///{proxy}". You can get your API load balancer endpoint via `cortex cluster info`; make sure to prepend `http://` and append `/{proxy}`. For example, mine is: `http://a5044e34a352d44b0945adcd455c7fa3-32fa161d3e5bcbf9.elb.us-west-2.amazonaws.com/{proxy}`. Click "Save" - -![](https://user-images.githubusercontent.com/808475/80147407-4f322200-8568-11ea-8ef5-df5164c1375f.png) - -Select "Actions" > "Deploy API" - -![](https://user-images.githubusercontent.com/808475/80147555-86083800-8568-11ea-86af-1b1e38c9d322.png) - -Create a new stage (e.g. "dev") and click "Deploy" - -![](https://user-images.githubusercontent.com/808475/80147631-a7692400-8568-11ea-8a09-13dbd50b17b9.png) - -Copy your "Invoke URL" - -![](https://user-images.githubusercontent.com/808475/80147716-c798e300-8568-11ea-9aef-7dd6fdf4a68a.png) - -### Use your new endpoint +## Use your new endpoint -You may now use the "Invoke URL" in place of your API load balancer endpoint in your client. For example, this curl request: +Wait a few minutes to allow the DNS changes to propagate. You may now use your subdomain in place of your API load balancer endpoint in your client. 
For example, this curl request: ```bash -curl http://a5044e34a352d44b0945adcd455c7fa3-32fa161d3e5bcbf9.elb.us-west-2.amazonaws.com/my-api -X POST -H "Content-Type: application/json" -d @sample.json +curl http://a5044e34a352d44b0945adcd455c7fa3-32fa161d3e5bcbf9.elb.us-west-2.amazonaws.com/hello-world -X POST -H "Content-Type: application/json" -d @sample.json ``` Would become: ```bash -curl https://lrivodooqh.execute-api.us-west-2.amazonaws.com/dev/my-api -X POST -H "Content-Type: application/json" -d @sample.json +# add the `-k` flag or use http:// instead of https:// if you didn't configure an SSL certificate +curl https://api.cortexlabs.dev/hello-world -X POST -H "Content-Type: application/json" -d @sample.json ``` -### Cleanup +## Cleanup -Delete the API Gateway and VPC Link before spinning down your Cortex cluster: +Spin down your Cortex cluster. -![](https://user-images.githubusercontent.com/808475/80149163-05970680-856b-11ea-9f82-61f4061a3321.png) +If you created an SSL certificate, delete it from the [ACM console](https://us-west-2.console.aws.amazon.com/acm/home): -![](https://user-images.githubusercontent.com/808475/80149204-1ba4c700-856b-11ea-83f7-9741c78b6b95.png) +![](https://user-images.githubusercontent.com/4365343/82228835-a624e000-98f7-11ea-92e2-cb4fb0f591e2.png) diff --git a/docs/clusters/networking/load-balancers.md b/docs/clusters/networking/load-balancers.md index ad76bc7445..2d0ff5861f 100644 --- a/docs/clusters/networking/load-balancers.md +++ b/docs/clusters/networking/load-balancers.md @@ -4,6 +4,6 @@ All APIs share a single API load balancer. By default, the API load balancer is public. You can configure your API load balancer to be private by setting `api_load_balancer_scheme: internal` in your cluster configuration file (before creating your cluster). This will make your API only accessible through [VPC Peering](vpc-peering.md). 
You can enforce that incoming requests to APIs must originate from specific ip address ranges by specifying `api_load_balancer_cidr_white_list: []` in your cluster configuration. -The SSL certificate on the API load balancer is autogenerated during installation using `localhost` as the Common Name (CN). Therefore, clients will need to skip certificate verification when making HTTPS requests to your APIs (e.g. `curl -k https://***`), or make HTTP requests instead (e.g. `curl http://***`). Alternatively, you can enable HTTPS by using a [custom domain](custom-domain.md) or by [creating an API Gateway](https.md) to forward requests to your API load balancer. +The SSL certificate on the API load balancer is autogenerated during installation using `localhost` as the Common Name (CN). Therefore, clients will need to skip certificate verification when making HTTPS requests to your APIs (e.g. `curl -k https://***`), or make HTTP requests instead (e.g. `curl http://***`). Alternatively, you can enable HTTPS by using a [custom domain](custom-domain.md) and setting up [https](https.md) or by [creating an API Gateway](api-gateway.md) to forward requests to your API load balancer. There is a separate load balancer for the Cortex operator. By default, the operator load balancer is public. You can configure your operator load balancer to be private by setting `operator_load_balancer_scheme: internal` in your cluster configuration file (before creating your cluster). You can use [VPC Peering](vpc-peering.md) to enable your Cortex CLI to connect to your cluster operator from another VPC. You can enforce that incoming requests to the Cortex operator must originate from specific ip address ranges by specifying `operator_load_balancer_cidr_white_list: []` in your cluster configuration. 
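The load balancer fields mentioned above can be combined in the cluster configuration file; a sketch, where the CIDR ranges are placeholders rather than recommendations:

```yaml
# cluster.yaml (fragment; other fields keep their defaults)

# make the API load balancer private (default: internet-facing)
api_load_balancer_scheme: internal

# only accept API requests from these ranges (placeholder values)
api_load_balancer_cidr_white_list: [10.0.0.0/16]

# the operator load balancer has analogous settings
operator_load_balancer_scheme: internal
operator_load_balancer_cidr_white_list: [10.0.0.0/16]
```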
diff --git a/docs/clusters/observability/alerting.md b/docs/clusters/observability/alerting.md index dbb07f51ac..4a286034e1 100644 --- a/docs/clusters/observability/alerting.md +++ b/docs/clusters/observability/alerting.md @@ -117,7 +117,7 @@ Due to how Grafana was built, you'll need to re-do the steps of setting a given ## Enabling email alerts -It is possible to manually configure SMTP to enable email alerts (we plan on automating this proccess, see [#2210](https://github.com/cortexlabs/cortex/issues/2210)). +It is possible to manually configure SMTP to enable email alerts (we plan on automating this process, see [#2210](https://github.com/cortexlabs/cortex/issues/2210)). **Step 1** diff --git a/docs/clusters/observability/logging.md b/docs/clusters/observability/logging.md index 1876f520b9..9cf633ed44 100644 --- a/docs/clusters/observability/logging.md +++ b/docs/clusters/observability/logging.md @@ -76,7 +76,7 @@ You can export both the Cortex system logs and your application logs to your des ### Configure kubectl -Follow these [instructions](../../clusters/advanced/kubectl.md) to set up kubectl. +Follow these [instructions](../advanced/kubectl.md) to set up kubectl. ### Find supported destinations in FluentBit diff --git a/docs/clusters/observability/metrics.md b/docs/clusters/observability/metrics.md index 0afe34a2ce..9059e338e6 100644 --- a/docs/clusters/observability/metrics.md +++ b/docs/clusters/observability/metrics.md @@ -77,7 +77,7 @@ The steps for exporting metrics from Prometheus will vary based on your monitori ### Configure kubectl -Follow these [instructions](../../clusters/advanced/kubectl.md) to set up kubectl. +Follow these [instructions](../advanced/kubectl.md) to set up kubectl. ### Install agent @@ -121,7 +121,7 @@ Once you've found an adapter that works for you, follow the steps below: ### Configure kubectl -Follow these [instructions](../../clusters/advanced/kubectl.md) to set up kubectl. 
+Follow these [instructions](../advanced/kubectl.md) to set up kubectl. ### Update Prometheus diff --git a/docs/start.md b/docs/start.md index 3941c0af9d..c57e309181 100644 --- a/docs/start.md +++ b/docs/start.md @@ -2,9 +2,10 @@ ## Create a cluster on your AWS account + ```bash # install the CLI -pip install cortex +bash -c "$(curl -sS https://raw.githubusercontent.com/cortexlabs/cortex/v0.38.0/get-cli.sh)" # create a cluster cortex cluster up cluster.yaml diff --git a/docs/summary.md b/docs/summary.md index 9238e4b432..c99bf03353 100644 --- a/docs/summary.md +++ b/docs/summary.md @@ -11,6 +11,7 @@ * [Update](clusters/management/update.md) * [Delete](clusters/management/delete.md) * [Environments](clusters/management/environments.md) + * [Production Guide](clusters/management/production.md) * Instances * [Multi-instance](clusters/instances/multi.md) * [Spot instances](clusters/instances/spot.md) @@ -20,9 +21,10 @@ * [Alerting](clusters/observability/alerting.md) * Networking * [Load balancers](clusters/networking/load-balancers.md) - * [VPC peering](clusters/networking/vpc-peering.md) - * [HTTPS](clusters/networking/https.md) * [Custom domain](clusters/networking/custom-domain.md) + * [HTTPS](clusters/networking/https.md) + * [HTTPS with API Gateway](clusters/networking/api-gateway.md) + * [VPC peering](clusters/networking/vpc-peering.md) * Advanced * [Setting up kubectl](clusters/advanced/kubectl.md) * [Private Docker registry](clusters/advanced/registry.md) @@ -43,6 +45,7 @@ * [Example](workloads/async/example.md) * [Configuration](workloads/async/configuration.md) * [Containers](workloads/async/containers.md) + * [Autoscaling](workloads/async/autoscaling.md) * [Statuses](workloads/async/statuses.md) * [Batch](workloads/batch/batch.md) * [Example](workloads/batch/example.md) diff --git a/docs/workloads/async/autoscaling.md b/docs/workloads/async/autoscaling.md index a747c0ecab..036f675dc2 100644 --- a/docs/workloads/async/autoscaling.md +++ 
b/docs/workloads/async/autoscaling.md @@ -48,13 +48,13 @@ For example, setting `target_in_flight` to 1 (the default) causes the cluster to
-**`upscale_tolerance`** (default: 0.05): Any recommendation falling within this factor above the current number of replicas will not trigger a scale up event. For example, if `upscale_tolerance` is 0.1 and there are 20 running replicas, a recommendation of 21 or 22 replicas will not be acted on, and the API will remain at 20 replicas. Increasing this value will prevent thrashing, but setting it too high will prevent the cluster from maintaining it's optimal size. +**`upscale_tolerance`** (default: 0.05): Any recommendation falling within this factor above the current number of replicas will not trigger a scale-up event. For example, if `upscale_tolerance` is 0.1 and there are 20 running replicas, a recommendation of 21 or 22 replicas will not be acted on, and the API will remain at 20 replicas. Increasing this value will prevent thrashing, but setting it too high will prevent the cluster from maintaining its optimal size.
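The tolerance check described above can be illustrated with a short sketch; this is a simplified model of the behavior, not Cortex's actual implementation:

```python
def triggers_upscale(current_replicas, recommendation, upscale_tolerance=0.05):
    """A recommendation within `upscale_tolerance` above the current
    replica count is ignored (no scale-up event)."""
    threshold = current_replicas * (1 + upscale_tolerance)
    return recommendation > threshold

# the example from the text: tolerance 0.1 with 20 running replicas
print(triggers_upscale(20, 22, upscale_tolerance=0.1))  # False: within tolerance
print(triggers_upscale(20, 23, upscale_tolerance=0.1))  # True: scale up
```

`downscale_tolerance` works the same way in the opposite direction, ignoring recommendations within the factor below the current replica count.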
## Autoscaling instances -Cortex spins up and down instances based on the aggregate resource requests of all APIs. The number of instances will be at least `min_instances` and no more than `max_instances` for each node group (configured during installation and modifiable via `cortex cluster scale`). +Cortex spins up and down instances based on the aggregate resource requests of all APIs. The number of instances will be at least `min_instances` and no more than `max_instances` for each node group (configured during installation and modifiable via `cortex cluster configure`). ## Overprovisioning diff --git a/docs/workloads/async/configuration.md b/docs/workloads/async/configuration.md index b75be8023a..2ff686820f 100644 --- a/docs/workloads/async/configuration.md +++ b/docs/workloads/async/configuration.md @@ -52,7 +52,7 @@ max_downscale_factor: # maximum factor by which to scale down the API on a single scaling event (default: 0.75) max_upscale_factor: # maximum factor by which to scale up the API on a single scaling event (default: 1.5) downscale_tolerance: # any recommendation falling within this factor below the current number of replicas will not trigger a scale down event (default: 0.05) - upscale_tolerance: # any recommendation falling within this factor above the current number of replicas will not trigger a scale up event (default: 0.05) + upscale_tolerance: # any recommendation falling within this factor above the current number of replicas will not trigger a scale-up event (default: 0.05) node_groups: # a list of node groups on which this API can run (default: all node groups are eligible) update_strategy: # deployment strategy to use when replacing existing replicas with new ones (default: see below) max_surge: # maximum number of replicas that can be scheduled above the desired number of replicas during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 
10% (default: 25%) (set to 0 to disable rolling updates) diff --git a/docs/workloads/async/containers.md index 8f557ad11c..f7b37ee158 100644 --- a/docs/workloads/async/containers.md +++ b/docs/workloads/async/containers.md @@ -45,19 +45,19 @@ Note: your Cortex CLI or client must match the version of your cluster (availabl It is possible to submit requests to Async APIs from any Cortex API within a Cortex cluster. Requests can be made to `http://ingressgateway-apis.istio-system.svc.cluster.local/<api_name>`, where `<api_name>` is the name of the Async API you are making a request to. -For example, if there is an Async API named `my-api` running in the cluster, you can make a request to it from a different API in Python by using: +For example, if there is an Async API named `hello-world` running in the cluster, you can make a request to it from a different API in Python by using: ```python import requests # make a request to an Async API response = requests.post( - "http://ingressgateway-apis.istio-system.svc.cluster.local/my-api", + "http://ingressgateway-apis.istio-system.svc.cluster.local/hello-world", json={"text": "hello world"}, ) # retrieve a result from an Async API -response = requests.get("http://ingressgateway-apis.istio-system.svc.cluster.local/my-api/") +response = requests.get("http://ingressgateway-apis.istio-system.svc.cluster.local/hello-world/") ``` To make requests from your Async API to a Realtime, Batch, or Task API running within the cluster, see the "Chaining APIs" docs associated with the target workload type. diff --git a/docs/workloads/batch/containers.md index 67263f4b33..9d66899e83 100644 --- a/docs/workloads/batch/containers.md +++ b/docs/workloads/batch/containers.md @@ -51,7 +51,7 @@ Note: your Cortex CLI or client must match the version of your cluster (availabl It is possible to submit Batch jobs from any Cortex API within a Cortex cluster.
Jobs can be submitted to `http://ingressgateway-operator.istio-system.svc.cluster.local/batch/<api_name>`, where `<api_name>` is the name of the Batch API you are making a request to. -For example, if there is a Batch API named `my-api` running in the cluster, you can make a request to it from a different API in Python by using: +For example, if there is a Batch API named `hello-world` running in the cluster, you can make a request to it from a different API in Python by using: ```python import requests @@ -63,7 +63,7 @@ job_spec = { } response = requests.post( - "http://ingressgateway-operator.istio-system.svc.cluster.local/batch/my-api", + "http://ingressgateway-operator.istio-system.svc.cluster.local/batch/hello-world", json=job_spec, ) ``` diff --git a/docs/workloads/realtime/autoscaling.md index ead284984a..cd38639875 100644 --- a/docs/workloads/realtime/autoscaling.md +++ b/docs/workloads/realtime/autoscaling.md @@ -60,13 +60,13 @@ For example, setting `target_in_flight` to `max_concurrency` (the default) cause
-**`upscale_tolerance`** (default: 0.05): Any recommendation falling within this factor above the current number of replicas will not trigger a scale up event. For example, if `upscale_tolerance` is 0.1 and there are 20 running replicas, a recommendation of 21 or 22 replicas will not be acted on, and the API will remain at 20 replicas. Increasing this value will prevent thrashing, but setting it too high will prevent the cluster from maintaining it's optimal size. +**`upscale_tolerance`** (default: 0.05): Any recommendation falling within this factor above the current number of replicas will not trigger a scale-up event. For example, if `upscale_tolerance` is 0.1 and there are 20 running replicas, a recommendation of 21 or 22 replicas will not be acted on, and the API will remain at 20 replicas. Increasing this value will prevent thrashing, but setting it too high will prevent the cluster from maintaining its optimal size.
## Autoscaling instances -Cortex spins up and down instances based on the aggregate resource requests of all APIs. The number of instances will be at least `min_instances` and no more than `max_instances` for each node group (configured during installation and modifiable via `cortex cluster scale`). +Cortex spins up and down instances based on the aggregate resource requests of all APIs. The number of instances will be at least `min_instances` and no more than `max_instances` for each node group (configured during installation and modifiable via `cortex cluster configure`). ## Overprovisioning diff --git a/docs/workloads/realtime/configuration.md b/docs/workloads/realtime/configuration.md index dd6ea83164..febbc45f16 100644 --- a/docs/workloads/realtime/configuration.md +++ b/docs/workloads/realtime/configuration.md @@ -56,7 +56,7 @@ max_downscale_factor: # maximum factor by which to scale down the API on a single scaling event (default: 0.75) max_upscale_factor: # maximum factor by which to scale up the API on a single scaling event (default: 1.5) downscale_tolerance: # any recommendation falling within this factor below the current number of replicas will not trigger a scale down event (default: 0.05) - upscale_tolerance: # any recommendation falling within this factor above the current number of replicas will not trigger a scale up event (default: 0.05) + upscale_tolerance: # any recommendation falling within this factor above the current number of replicas will not trigger a scale-up event (default: 0.05) node_groups: # a list of node groups on which this API can run (default: all node groups are eligible) update_strategy: # deployment strategy to use when replacing existing replicas with new ones (default: see below) max_surge: # maximum number of replicas that can be scheduled above the desired number of replicas during an update; can be an absolute number, e.g. 5, or a percentage of desired replicas, e.g. 
10% (default: 25%) (set to 0 to disable rolling updates) diff --git a/docs/workloads/realtime/containers.md index f700b54cee..0ac3869aac 100644 --- a/docs/workloads/realtime/containers.md +++ b/docs/workloads/realtime/containers.md @@ -4,7 +4,7 @@ In order to handle requests to your Realtime API, one of your containers must run a web server which is listening for HTTP requests on the port which is configured in the `pod.port` field of your [API configuration](configuration.md) (default: 8080). -Subpaths are supported; for example, if your API is named `my-api`, a request to `/my-api` will be routed to the root (`/`) of your web server, and a request to `/my-api/subpatch` will be routed to `/subpath` on your web server. +Subpaths are supported; for example, if your API is named `hello-world`, a request to `/hello-world` will be routed to the root (`/`) of your web server, and a request to `/hello-world/subpath` will be routed to `/subpath` on your web server. ## Readiness checks @@ -43,13 +43,13 @@ Note: your Cortex CLI or client must match the version of your cluster (availabl It is possible to make requests to Realtime APIs from any Cortex API within a Cortex cluster. Requests can be made to `http://ingressgateway-apis.istio-system.svc.cluster.local/<api_name>`, where `<api_name>` is the name of the Realtime API you are making a request to.
-For example, if there is a Realtime API named `my-api` running in the cluster, you can make a request to it from a different API in Python by using: +For example, if there is a Realtime API named `hello-world` running in the cluster, you can make a request to it from a different API in Python by using: ```python import requests response = requests.post( - "http://ingressgateway-apis.istio-system.svc.cluster.local/my-api", + "http://ingressgateway-apis.istio-system.svc.cluster.local/hello-world", json={"text": "hello world"}, ) ``` diff --git a/docs/workloads/realtime/troubleshooting.md index b82ed7660a..61de9dfe74 100644 --- a/docs/workloads/realtime/troubleshooting.md +++ b/docs/workloads/realtime/troubleshooting.md @@ -23,7 +23,7 @@ When you created your Cortex cluster, you configured `max_instances` for each no You can check the current value of `max_instances` for the selected node group by running `cortex cluster info --config cluster.yaml` (or `cortex cluster info --name <cluster_name> --region <region>` if you have the name and region of the cluster). -Once you have the name and region of the cluster, you can update `max_instances` by specifying the desired number of `max_instances` for your node group with `cortex cluster scale --name --region --node-group --max-instances `. +Once you have the name and region of the cluster, you can update the `max_instances` field by following the [instructions](../../clusters/management/update.md) to update an existing cluster. ## Check your AWS auto scaling group activity history @@ -61,7 +61,7 @@ If you're running in a development environment, this rolling update behavior can You can disable rolling updates for your API in your API configuration: set `max_surge` to 0 in the `update_strategy` section, e.g.: ```yaml -- name: my-api +- name: hello-world kind: RealtimeAPI # ...
update_strategy: diff --git a/docs/workloads/task/containers.md index 3b53eb35f2..5d5db56d3f 100644 --- a/docs/workloads/task/containers.md +++ b/docs/workloads/task/containers.md @@ -22,13 +22,13 @@ Note: your Cortex CLI or client must match the version of your cluster (availabl It is possible to submit Task jobs from any Cortex API within a Cortex cluster. Jobs can be submitted to `http://ingressgateway-operator.istio-system.svc.cluster.local/tasks/<api_name>`, where `<api_name>` is the name of the Task API you are making a request to. -For example, if there is a Task API named `my-api` running in the cluster, you can make a request to it from a different API in Python by using: +For example, if there is a Task API named `hello-world` running in the cluster, you can make a request to it from a different API in Python by using: ```python import requests response = requests.post( - "http://ingressgateway-operator.istio-system.svc.cluster.local/tasks/my-api", + "http://ingressgateway-operator.istio-system.svc.cluster.local/tasks/hello-world", json={"config": {"my_key": "my_value"}}, ) ```
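Async and Batch submissions like those above return before the work completes, so in-cluster callers typically poll for the result; a generic polling sketch, where the `fetch` callable is a stand-in for the `requests.get` call shown earlier (an illustration, not Cortex's client API):

```python
import time

def poll_result(fetch, attempts=10, interval=0.01):
    """Call fetch() until it returns a non-None result or attempts run out."""
    for _ in range(attempts):
        result = fetch()
        if result is not None:
            return result
        time.sleep(interval)
    raise TimeoutError("result was not ready in time")

# usage with a stand-in fetcher; a real caller would wrap requests.get(...)
responses = iter([None, None, {"text": "hello world"}])
print(poll_result(lambda: next(responses)))  # → {'text': 'hello world'}
```

In practice the interval and attempt count should be tuned to the expected job duration.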