Commit c4d6849

docs(ecs): Update README with latest instructions
1 parent d7e9335 commit c4d6849

File tree

1 file changed

+223
-75
lines changed
  • template/deploy/{% if 'aws-ecs-fargate' in deployment_targets %}ecs{% endif %}

template/deploy/{% if 'aws-ecs-fargate' in deployment_targets %}ecs{% endif %}/README.md.jinja

Lines changed: 223 additions & 75 deletions
@@ -10,14 +10,16 @@ This directory contains a complete implementation for deploying to AWS ECS Farga
1010

1111
- **Complete Terraform Configuration**: Full IaC setup with modular architecture
1212
- **Automated Deployment Script**: One-command deployment with `deploy.sh`
13-
- **CI/CD Pipeline**: GitHub Actions workflow for automated deployments
13+
- **CI/CD Pipeline**: GitHub Actions workflow with automatic rollback
1414
- **Networking**: VPC with public/private subnets, NAT gateways, and VPC endpoints
1515
- **Database Options**: RDS PostgreSQL or Aurora Serverless v2
16-
- **Storage**: S3 buckets for static/media files with CDN-ready configuration
17-
- **Security**: Secrets Manager, IAM roles, security groups
18-
- **Monitoring**: CloudWatch logs, metrics, and alarms
19-
- **Auto-scaling**: Target tracking and scheduled scaling
20-
- **Cost Optimization**: Fargate Spot support, right-sizing recommendations
16+
- **Storage**: S3 buckets for static/media files with encryption
17+
- **Security**: Secrets Manager, IAM roles, security groups, SSL/TLS certificates
18+
- **Monitoring**: CloudWatch logs, alarms, SNS notifications, and dashboard
19+
- **Auto-scaling**: Multi-metric target tracking (CPU, Memory, Requests)
20+
- **DNS Management**: Route53 hosted zone and automatic certificate validation
21+
- **Migration Management**: Terraform-managed database migration tasks
22+
- **Cost Optimization**: Fargate Spot support, single NAT for non-prod, lifecycle policies
2123

2224
### 📁 Directory Structure
2325

@@ -29,12 +31,13 @@ ecs/
2931
│ ├── variables.tf # Input variables
3032
│ ├── outputs.tf # Output values
3133
│ ├── network.tf # VPC and networking
32-
│ ├── ecs.tf # ECS cluster and services
34+
│ ├── ecs.tf # ECS cluster, services, and auto-scaling
3335
│ ├── alb.tf # Application Load Balancer
3436
│ ├── database.tf # RDS/Aurora configuration
3537
│ ├── storage.tf # S3 and Redis
36-
│ ├── security.tf # Security groups and secrets
38+
│ ├── security.tf # Security groups, secrets, and Route53
3739
│ ├── ecr.tf # Container registry
40+
│ ├── monitoring.tf # CloudWatch alarms and dashboard
3841
│ └── terraform.tfvars.example # Example variables file
3942
└── .github-workflows-ecs.yml # GitHub Actions CI/CD
4043
```
@@ -47,6 +50,23 @@ ecs/
4750
- jq for JSON processing
4851
- Python {{ python_version }} with Django installed
4952

53+
## HTTPS and Custom Domain
54+
55+
**Important:** HTTPS is only available when using a custom domain. The deployment supports two modes:
56+
57+
1. **HTTP-only (Development/Testing)**
58+
- Set `domain_name = ""` in terraform.tfvars
59+
- Access via ALB DNS name (e.g., `http://myapp-alb-123456.us-east-1.elb.amazonaws.com`)
60+
- No SSL/TLS certificate required
61+
62+
2. **HTTPS with Custom Domain (Production)**
63+
- Set `domain_name = "example.com"` and `create_dns_zone = true` in terraform.tfvars
64+
- Terraform creates Route53 hosted zone and ACM certificate
65+
- HTTP traffic automatically redirects to HTTPS
66+
- Update your domain's nameservers to point to Route53
67+
68+
**Note:** ACM cannot issue a certificate for an ALB's default DNS name (`*.elb.amazonaws.com`), because you do not control that domain. Production deployments that need HTTPS therefore require a custom domain.
69+
5070
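To see which of the two modes a given `terraform.tfvars` would select, a small helper like the one below can be useful. This is a minimal sketch: the function name `deployment_mode` is hypothetical, and it assumes `domain_name` is assigned as a literal string on a single line and that a missing assignment behaves like an empty one.

```bash
# deployment_mode FILE -> prints "http-only" or "https"
# Assumption: domain_name is set as a plain quoted string, e.g. domain_name = "example.com"
deployment_mode() {
  local file=$1 value
  # Extract the quoted value of the last uncommented domain_name assignment
  value=$(sed -n 's/^[[:space:]]*domain_name[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$file" | tail -n 1)
  if [ -z "$value" ]; then
    echo "http-only"   # empty (or, by assumption, missing) domain: ALB DNS name over HTTP
  else
    echo "https"       # custom domain: ACM certificate and HTTP -> HTTPS redirect
  fi
}

# Example:
#   deployment_mode deploy/ecs/terraform/terraform.tfvars
```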
## Quick Start
5171

5272
### Option 1: Automated Deployment (Recommended)
@@ -232,15 +252,18 @@ The ECS task definition includes:
232252

233253
### Auto Scaling
234254

235-
Auto-scaling configuration:
255+
Comprehensive auto-scaling with multiple metrics:
236256

237-
- **Min Tasks**: 2
238-
- **Max Tasks**: 10
239-
- **Target CPU**: 70%
240-
- **Target Memory**: 80%
257+
- **Min Tasks**: 2 (production), 1 (dev/staging)
258+
- **Max Tasks**: 10 (production), 3 (dev/staging)
259+
- **CPU-based scaling**: Target 70% utilization
260+
- **Memory-based scaling**: Target 80% utilization
261+
- **Request-based scaling**: Target 1000 requests per target
241262
- **Scale-in Cooldown**: 300s
242263
- **Scale-out Cooldown**: 60s
243264

265+
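The service attaches one target tracking policy per metric. It scales out as soon as any policy calls for more capacity and scales in only when all policies agree, so the task count follows whichever metric currently needs the most capacity. This keeps scaling responsive under different load patterns.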
266+
244267
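The "whichever metric needs the most capacity" rule can be illustrated with a little integer arithmetic. The sketch below is a simplified model for building intuition, not AWS's exact algorithm: it assumes each policy wants roughly `ceil(current * actual / target)` tasks and that the largest request wins, clamped to the min/max task counts. The function names are hypothetical.

```bash
# ceil_div A B -> prints ceil(A / B) using integer arithmetic
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# desired_tasks CURRENT CPU CPU_TARGET MEM MEM_TARGET REQ REQ_TARGET MIN MAX
# Simplified model: each metric proposes a task count, the largest proposal wins.
desired_tasks() {
  local current=$1 cpu=$2 cpu_target=$3 mem=$4 mem_target=$5 req=$6 req_target=$7 min=$8 max=$9
  local by_cpu by_mem by_req desired
  by_cpu=$(ceil_div $(( current * cpu )) "$cpu_target")
  by_mem=$(ceil_div $(( current * mem )) "$mem_target")
  by_req=$(ceil_div $(( current * req )) "$req_target")
  desired=$by_cpu
  (( by_mem > desired )) && desired=$by_mem
  (( by_req > desired )) && desired=$by_req
  # Clamp to the configured task limits
  (( desired < min )) && desired=$min
  (( desired > max )) && desired=$max
  echo "$desired"
}

# 4 tasks at 90% CPU (target 70), 60% memory (target 80),
# 1500 requests per target (target 1000), limits 2..10
desired_tasks 4 90 70 60 80 1500 1000 2 10   # prints 6
```

In this example CPU and request count both ask for 6 tasks while memory would be satisfied with 3, so the model settles on 6.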
## Database Options
245268

246269
### RDS PostgreSQL
@@ -279,102 +302,227 @@ STATICFILES_STORAGE = "storages.backends.s3boto3.S3StaticStorage"
279302
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
280303
```
281304

305+
## DNS and Domain Configuration
306+
307+
### Route53 Setup (Optional)
308+
309+
If using a custom domain, Terraform can manage your DNS:
310+
311+
**1. Configure domain in terraform.tfvars:**
312+
```hcl
313+
domain_name = "example.com"
314+
create_dns_zone = true
315+
```
316+
317+
**2. Terraform creates:**
318+
- Route53 hosted zone for your domain
319+
- ACM certificate with automatic DNS validation
320+
- A records pointing to your ALB (apex and www)
321+
- Certificate validation records
322+
323+
**3. Update your domain registrar:**
324+
```bash
325+
# Get nameservers from Terraform output
326+
terraform output domain_nameservers
327+
328+
# Update your domain registrar (e.g., Namecheap, GoDaddy) with these nameservers:
329+
# ns-123.awsdns-12.com
330+
# ns-456.awsdns-45.net
331+
# ns-789.awsdns-78.org
332+
# ns-012.awsdns-01.co.uk
333+
```
334+
335+
**4. Certificate validation:**
336+
- Automatic when using Route53 (Terraform creates validation records)
337+
- Manual if using external DNS (add CNAME records from ACM console)
338+
339+
**5. Access your application:**
340+
- `https://example.com` - Apex domain
341+
- `https://www.example.com` - WWW subdomain
342+
- Both automatically redirect HTTP → HTTPS
343+
344+
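After updating the registrar in step 3, it can take a while for the delegation to become visible. One way to check is to compare the nameservers Terraform reports with what public DNS returns. The helper below is a sketch: `normalize_ns` and `ns_match` are hypothetical names, and the usage lines assume that the `domain_nameservers` output is a list and that `dig` and `jq` are installed.

```bash
# normalize_ns: read hostnames from stdin, lowercase them, drop trailing dots, sort
normalize_ns() {
  tr 'A-Z' 'a-z' | sed 's/\.$//' | grep -v '^$' | sort -u
}

# ns_match EXPECTED ACTUAL -> exit 0 when both newline-separated lists name the same servers
ns_match() {
  [ "$(printf '%s\n' "$1" | normalize_ns)" = "$(printf '%s\n' "$2" | normalize_ns)" ]
}

# Usage (assumes the Terraform output is a list of hostnames):
#   expected=$(terraform output -json domain_nameservers | jq -r '.[]')
#   actual=$(dig +short NS example.com)
#   ns_match "$expected" "$actual" && echo "delegation is live"
```

The normalization step matters because `dig` prints fully qualified names with a trailing dot, while the Route53 values usually come without one.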
### Using Existing DNS Provider
345+
346+
If you manage DNS externally:
347+
348+
```hcl
349+
domain_name = "example.com"
350+
create_dns_zone = false # Don't create Route53 zone
351+
```
352+
353+
Then manually create:
354+
- ACM certificate validation records (from AWS Console)
355+
- A record pointing to ALB DNS name (from `terraform output alb_dns_name`)
356+
282357
## Migrations
283358

284-
### One-Time Migrations
359+
### Terraform-Managed Migration Task
360+
361+
Database migrations are managed as a Terraform resource (`ecs_migrate` task definition). This ensures consistency with your application container and simplifies deployment.
285362

286-
Run migrations as a one-off task:
363+
### Running Migrations
287364

365+
**Using deploy.sh (Recommended):**
288366
```bash
289-
aws ecs run-task \
290-
--cluster {{ project_slug }}-cluster \
291-
--task-definition {{ project_slug }}-migrate \
292-
--launch-type FARGATE \
293-
--network-configuration "awsvpcConfiguration={subnets=[subnet-xxx],securityGroups=[sg-xxx],assignPublicIp=ENABLED}"
367+
./deploy/ecs/deploy.sh
368+
# Then select option 5: "Run migrations"
294369
```
295370

296-
### Automated Migrations
371+
The script will:
372+
1. Get the migration task definition from Terraform
373+
2. Run the task in your private subnets
374+
3. Wait for completion and validate exit code
375+
4. Show migration logs
297376

298-
Use ECS Exec or create a migration task that runs before deployment.
377+
**Manual Execution:**
378+
```bash
379+
# Get task definition ARN from Terraform
380+
cd deploy/ecs/terraform
381+
MIGRATE_TASK_DEF=$(terraform output -raw ecs_migrate_task_definition)
382+
383+
# Get subnet and security group IDs (these tag filters assume only one matching VPC in the region)
384+
SUBNET_IDS=$(aws ec2 describe-subnets \
385+
--filters "Name=tag:Type,Values=Private" \
386+
--query "Subnets[*].SubnetId" --output text | tr '\t' ',')
387+
SECURITY_GROUP=$(aws ec2 describe-security-groups \
388+
--filters "Name=tag:Name,Values=*app-sg" \
389+
--query "SecurityGroups[0].GroupId" --output text)
390+
391+
# Run migration (replace {environment} with your environment name, e.g. staging)
392+
TASK_ARN=$(aws ecs run-task \
393+
--cluster {{ project_slug }}-{environment}-cluster \
394+
--task-definition "$MIGRATE_TASK_DEF" \
395+
--launch-type FARGATE \
396+
--network-configuration "awsvpcConfiguration={subnets=[$SUBNET_IDS],securityGroups=[$SECURITY_GROUP],assignPublicIp=DISABLED}" \
397+
--query 'tasks[0].taskArn' --output text)
299398

300-
## Monitoring
399+
# Wait for completion
400+
aws ecs wait tasks-stopped --cluster {{ project_slug }}-{environment}-cluster --tasks "$TASK_ARN"
301401

302-
### CloudWatch Metrics
402+
# Check exit code
403+
aws ecs describe-tasks \
404+
--cluster {{ project_slug }}-{environment}-cluster \
405+
--tasks "$TASK_ARN" \
406+
--query 'tasks[0].containers[0].exitCode'
407+
```
303408

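The last command above prints the raw exit code, which still has to be interpreted. A small wrapper along these lines can turn it into a pass/fail result for scripts. This is a sketch with a hypothetical function name; it assumes the exit code was fetched with `--output text`, in which case the AWS CLI prints `None` when the container never reported one.

```bash
# migration_result EXIT_CODE -> prints a verdict; returns 0 only when the migration succeeded
migration_result() {
  case "$1" in
    0)
      echo "succeeded"
      return 0 ;;
    ""|None|null)
      # No exit code usually means the container never started (image pull, networking, ...)
      echo "no exit code (container may never have started)"
      return 2 ;;
    *)
      echo "failed with exit code $1"
      return 1 ;;
  esac
}

# Usage:
#   EXIT_CODE=$(aws ecs describe-tasks --cluster "$CLUSTER" --tasks "$TASK_ARN" \
#     --query 'tasks[0].containers[0].exitCode' --output text)
#   migration_result "$EXIT_CODE" || exit 1
```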
304-
Key metrics to monitor:
409+
**In CI/CD:**
410+
Migrations run automatically before deployment in the GitHub Actions workflow with:
411+
- Proper wait conditions (no hardcoded sleeps)
412+
- Exit code validation
413+
- Automatic rollback on failure
305414

306-
- CPU Utilization
307-
- Memory Utilization
308-
- Request Count
309-
- Target Response Time
310-
- Unhealthy Host Count
415+
## Monitoring
311416

312417
### CloudWatch Alarms
313418

314-
Set up alarms for:
419+
Comprehensive monitoring with automated alerting (enabled via `enable_monitoring = true`):
420+
421+
**ECS Service Alarms:**
422+
- **High CPU**: > 80% for 2 periods (5 min each)
423+
- **High Memory**: > 85% for 2 periods
424+
- **Unhealthy Targets**: Any unhealthy targets for 2 periods (1 min each)
315425

316-
- High CPU (> 80%)
317-
- High Memory (> 85%)
318-
- Failed Health Checks
319-
- 5xx Error Rate
426+
**ALB Alarms:**
427+
- **5XX Errors**: > 10 errors in 5 minutes
428+
- **Response Time**: > 2 seconds average for 5 minutes
429+
430+
**Database Alarms (RDS):**
431+
- **High CPU**: > 80% for 10 minutes
432+
- **Low Storage**: < 5GB free space
433+
- **High Connections**: > 80 concurrent connections
434+
435+
{% if cache == 'redis' -%}
436+
**Redis Alarms:**
437+
- **High CPU**: > 75% for 10 minutes
438+
- **High Memory**: > 80% for 10 minutes
439+
{% endif %}
440+
441+
**Notifications:**
442+
- Email notifications via SNS (configure `alarm_email` in terraform.tfvars)
443+
- All alarms tagged with environment and project
444+
- Alarms integrate automatically with the CloudWatch dashboard
445+
446+
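Thresholds such as "> 80% for 2 periods" mean that a single spike does not fire the alarm. The sketch below models that rule in a simplified way: it assumes integer datapoints, a greater-than comparison, and that every one of the most recent periods must breach the threshold. The function name `alarm_state` is hypothetical, and real CloudWatch alarms have more options (missing-data handling, M-out-of-N evaluation).

```bash
# alarm_state THRESHOLD PERIODS DATAPOINT... -> prints ALARM, OK, or INSUFFICIENT_DATA
# Datapoints are given oldest first; only the most recent PERIODS values are evaluated.
alarm_state() {
  local threshold=$1 periods=$2
  shift 2
  if (( $# < periods )); then
    echo "INSUFFICIENT_DATA"
    return
  fi
  # Drop everything except the last PERIODS datapoints
  shift $(( $# - periods ))
  local dp
  for dp in "$@"; do
    if (( dp <= threshold )); then
      echo "OK"      # at least one recent period is within the threshold
      return
    fi
  done
  echo "ALARM"       # every recent period breached the threshold
}

alarm_state 80 2 70 85 90   # prints ALARM (last two periods are above 80)
alarm_state 80 2 85 90 70   # prints OK (latest period recovered)
```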
### CloudWatch Dashboard
447+
448+
A pre-configured dashboard is created with:
449+
- ECS CPU and Memory utilization graphs
450+
- ALB request count, errors, and response time
451+
- Database connection count and CPU usage
452+
- Metrics refresh in near real time (subject to the interval at which CloudWatch publishes each metric)
453+
454+
Access via: AWS Console → CloudWatch → Dashboards → `{{ project_slug }}-{environment}-dashboard`
320455

321456
### Logs
322457

323458
View logs:
324459

325460
```bash
326461
# Stream logs
327-
aws logs tail /ecs/{{ project_slug }} --follow
462+
aws logs tail /ecs/{{ project_slug }}-{environment} --follow
328463

329464
# Filter for errors
330465
aws logs filter-log-events \
331-
--log-group-name /ecs/{{ project_slug }} \
466+
--log-group-name /ecs/{{ project_slug }}-{environment} \
332467
--filter-pattern "ERROR"
468+
469+
# Show only log events whose text matches a service name (--filter-pattern matches message content)
470+
aws logs tail /ecs/{{ project_slug }}-{environment} --follow --filter-pattern "app"
471+
aws logs tail /ecs/{{ project_slug }}-{environment} --follow --filter-pattern "migrate"
472+
{% if use_celery -%}
473+
aws logs tail /ecs/{{ project_slug }}-{environment} --follow --filter-pattern "celery"
474+
{% endif -%}
333475
```
334476

335477
## CI/CD Integration
336478

337479
### GitHub Actions
338480

339-
```yaml
340-
name: Deploy to ECS
341-
342-
on:
343-
push:
344-
branches: [main]
345-
346-
jobs:
347-
deploy:
348-
runs-on: ubuntu-latest
349-
steps:
350-
- uses: actions/checkout@v3
351-
352-
- name: Configure AWS credentials
353-
uses: aws-actions/configure-aws-credentials@v2
354-
with:
355-
aws-access-key-id: {% raw %}${{ secrets.AWS_ACCESS_KEY_ID }}{% endraw %}
356-
aws-secret-access-key: {% raw %}${{ secrets.AWS_SECRET_ACCESS_KEY }}{% endraw %}
357-
aws-region: us-east-1
358-
359-
- name: Login to Amazon ECR
360-
id: login-ecr
361-
uses: aws-actions/amazon-ecr-login@v1
362-
363-
- name: Build and push image
364-
env:
365-
ECR_REGISTRY: {% raw %}${{ steps.login-ecr.outputs.registry }}{% endraw %}
366-
IMAGE_TAG: {% raw %}${{ github.sha }}{% endraw %}
367-
run: |
368-
docker build -t $ECR_REGISTRY/{{ project_slug }}:$IMAGE_TAG .
369-
docker push $ECR_REGISTRY/{{ project_slug }}:$IMAGE_TAG
370-
371-
- name: Deploy to ECS
372-
run: |
373-
aws ecs update-service \
374-
--cluster {{ project_slug }}-cluster \
375-
--service {{ project_slug }}-service \
376-
--force-new-deployment
481+
The included workflow (`.github-workflows-ecs.yml`) provides:
482+
483+
**Features:**
484+
- ✅ Automated testing on pull requests
485+
- ✅ Build and push to ECR with image caching
486+
- ✅ Separate staging and production deployments
487+
- ✅ Database migrations with validation
488+
- ✅ Health check verification with retries
489+
- ✅ **Automatic rollback on failure** (production only)
490+
- ✅ RDS snapshot before production deployment
491+
- ✅ Blue/Green deployment with circuit breaker
492+
- ✅ Slack notifications (optional)
493+
- ✅ Sentry release tracking (optional)
494+
495+
**Deployment Flow:**
496+
497+
1. **Test** (PR only): Run pytest and linting
498+
2. **Build**: Build Docker image with caching
499+
3. **Push**: Tag and push to ECR (`:latest`, `:sha`, `:branch-name`)
500+
4. **Backup**: Create RDS snapshot (production only)
501+
5. **Migrate**: Run database migrations with exit code validation
502+
6. **Deploy**: Update ECS service with new task definition
503+
7. **Wait**: Use proper AWS wait conditions (no hardcoded sleeps)
504+
8. **Verify**: Health checks with retry logic
505+
9. **Rollback**: Automatic rollback on failure (production only)
506+
10. **Notify**: Send notifications (Slack, Sentry)
507+
508+
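The "Verify" step (health checks with retry logic) can be approximated locally with a generic retry helper. This is a sketch of the general technique, not the workflow's actual implementation; the function name `retry` and the `/health/` path in the usage line are assumptions.

```bash
# retry ATTEMPTS DELAY_SECONDS COMMAND... -> runs COMMAND until it succeeds or attempts run out
retry() {
  local attempts=$1 delay=$2 n=1
  shift 2
  until "$@"; do
    if (( n >= attempts )); then
      echo "failed after $n attempts" >&2
      return 1
    fi
    n=$(( n + 1 ))
    sleep "$delay"
  done
}

# Usage (hypothetical health endpoint): up to 5 attempts, 10 seconds apart
#   retry 5 10 curl --fail --silent --show-error --output /dev/null "https://example.com/health/"
```

A non-zero return from `retry` is the natural trigger for the rollback step that follows in the flow.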
**Configuration:**
509+
510+
Add these secrets to your GitHub repository:
377511
```
512+
AWS_ACCESS_KEY_ID
513+
AWS_SECRET_ACCESS_KEY
514+
STAGING_SUBNET_IDS # Comma-separated subnet IDs
515+
STAGING_SECURITY_GROUP_ID
516+
PRODUCTION_SUBNET_IDS
517+
PRODUCTION_SECURITY_GROUP_ID
518+
SLACK_WEBHOOK # Optional
519+
SENTRY_AUTH_TOKEN # Optional
520+
```
521+
522+
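Because the subnet secrets are plain comma-separated strings, a stray space or a security group ID pasted into the wrong secret only surfaces when the workflow runs. A quick format check before storing the value can catch that early. The sketch below uses a hypothetical function name and a deliberately loose pattern (`subnet-` followed by 8 to 17 hex characters); storing the value with the GitHub CLI is shown only as one possible approach.

```bash
# valid_subnet_ids VALUE -> exit 0 when VALUE looks like "subnet-xxxx,subnet-yyyy" with no spaces
valid_subnet_ids() {
  local re='^subnet-[0-9a-f]{8,17}(,subnet-[0-9a-f]{8,17})*$'
  [[ $1 =~ $re ]]
}

# Usage (assumes the GitHub CLI is installed and authenticated):
#   ids="subnet-0abc1234,subnet-0def5678"
#   valid_subnet_ids "$ids" && gh secret set STAGING_SUBNET_IDS --body "$ids"
```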
**Trigger:**
523+
- `main` branch → Deploy to staging
524+
- `production` branch → Deploy to production
525+
- Manual workflow dispatch → Choose environment
378526

379527
## Cost Optimization
380528
