@@ -10,14 +10,16 @@ This directory contains a complete implementation for deploying to AWS ECS Farga
1010
1111- **Complete Terraform Configuration**: Full IaC setup with modular architecture
1212- **Automated Deployment Script**: One-command deployment with `deploy.sh`
13- - **CI/CD Pipeline**: GitHub Actions workflow for automated deployments
13+ - **CI/CD Pipeline**: GitHub Actions workflow with automatic rollback
1414- **Networking**: VPC with public/private subnets, NAT gateways, and VPC endpoints
1515- **Database Options**: RDS PostgreSQL or Aurora Serverless v2
16- - **Storage**: S3 buckets for static/media files with CDN-ready configuration
17- - **Security**: Secrets Manager, IAM roles, security groups
18- - **Monitoring**: CloudWatch logs, metrics, and alarms
19- - **Auto-scaling**: Target tracking and scheduled scaling
20- - **Cost Optimization**: Fargate Spot support, right-sizing recommendations
16+ - **Storage**: S3 buckets for static/media files with encryption
17+ - **Security**: Secrets Manager, IAM roles, security groups, SSL/TLS certificates
18+ - **Monitoring**: CloudWatch logs, alarms, SNS notifications, and dashboard
19+ - **Auto-scaling**: Multi-metric target tracking (CPU, Memory, Requests)
20+ - **DNS Management**: Route53 hosted zone and automatic certificate validation
21+ - **Migration Management**: Terraform-managed database migration tasks
22+ - **Cost Optimization**: Fargate Spot support, single NAT for non-prod, lifecycle policies
2123
2224### 📁 Directory Structure
2325
2931│ ├── variables.tf # Input variables
3032│ ├── outputs.tf # Output values
3133│ ├── network.tf # VPC and networking
32- │ ├── ecs.tf # ECS cluster and services
34+ │ ├── ecs.tf # ECS cluster, services, and auto-scaling
3335│ ├── alb.tf # Application Load Balancer
3436│ ├── database.tf # RDS/Aurora configuration
3537│ ├── storage.tf # S3 and Redis
36- │ ├── security.tf # Security groups and secrets
38+ │ ├── security.tf # Security groups, secrets, and Route53
3739│ ├── ecr.tf # Container registry
40+ │ ├── monitoring.tf # CloudWatch alarms and dashboard
3841│ └── terraform.tfvars.example # Example variables file
3942└── .github-workflows-ecs.yml # GitHub Actions CI/CD
4043```
4750- jq for JSON processing
4851- Python {{ python_version }} with Django installed
4952
53+ ## HTTPS and Custom Domain
54+
55+ **Important:** HTTPS is only available when using a custom domain. The deployment supports two modes:
56+
57+ 1. **HTTP-only (Development/Testing)**
58+ - Set `domain_name = ""` in terraform.tfvars
59+ - Access via ALB DNS name (e.g., `http://myapp-alb-123456.us-east-1.elb.amazonaws.com`)
60+ - No SSL/TLS certificate required
61+
62+ 2. **HTTPS with Custom Domain (Production)**
63+ - Set `domain_name = "example.com"` and `create_dns_zone = true` in terraform.tfvars
64+ - Terraform creates Route53 hosted zone and ACM certificate
65+ - HTTP traffic automatically redirects to HTTPS
66+ - Update your domain's nameservers to point to Route53
67+
68+ **Note:** ACM cannot issue a public certificate for an ALB's default DNS name (`*.elb.amazonaws.com`), because that domain belongs to AWS. For production deployments with HTTPS, a custom domain is required.
69+
5070## Quick Start
5171
5272### Option 1: Automated Deployment (Recommended)
@@ -232,15 +252,18 @@ The ECS task definition includes:
232252
233253### Auto Scaling
234254
235- Auto-scaling configuration:
255+ Comprehensive auto-scaling with multiple metrics:
236256
237- - **Min Tasks**: 2
238- - **Max Tasks**: 10
239- - **Target CPU**: 70%
240- - **Target Memory**: 80%
257+ - **Min Tasks**: 2 (production), 1 (dev/staging)
258+ - **Max Tasks**: 10 (production), 3 (dev/staging)
259+ - **CPU-based scaling**: Target 70% utilization
260+ - **Memory-based scaling**: Target 80% utilization
261+ - **Request-based scaling**: Target 1000 requests per target
241262- **Scale-in Cooldown**: 300s
242263- **Scale-out Cooldown**: 60s
243264
265+ The service uses target tracking policies for all three metrics. Application Auto Scaling scales out as soon as any one policy calls for more capacity and scales in only when every policy agrees, so the service stays responsive under CPU-bound, memory-bound, and request-heavy load alike.
266+
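As a rough illustration of how the three target tracking policies interact, the Python sketch below models each policy as proposing `ceil(current * metric / target)` tasks and keeps the largest proposal, clamped to the min/max task counts. This is a simplified model under stated assumptions, not the actual Application Auto Scaling algorithm (which also applies cooldowns and alarm evaluation); the function name and metric keys are illustrative and do not come from the Terraform configuration.

```python
import math

# Targets taken from the list above; the keys are illustrative names only.
TARGETS = {"cpu": 70.0, "memory": 80.0, "requests_per_target": 1000.0}


def desired_tasks(current, metrics, min_tasks=2, max_tasks=10):
    """Simplified model: every policy proposes a task count, the largest wins."""
    proposals = [
        math.ceil(current * metrics[name] / target)
        for name, target in TARGETS.items()
    ]
    # Clamp the winning proposal to the configured min/max task counts.
    return max(min_tasks, min(max_tasks, max(proposals)))


# CPU is the only metric above its target here, so it drives the result.
print(desired_tasks(4, {"cpu": 90.0, "memory": 40.0, "requests_per_target": 500.0}))
```

The defaults mirror the production limits (2 to 10 tasks); pass `min_tasks=1, max_tasks=3` to model dev/staging.
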
244267## Database Options
245268
246269### RDS PostgreSQL
@@ -279,102 +302,227 @@ STATICFILES_STORAGE = "storages.backends.s3boto3.S3StaticStorage"
279302DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
280303```
281304
305+ ## DNS and Domain Configuration
306+
307+ ### Route53 Setup (Optional)
308+
309+ If using a custom domain, Terraform can manage your DNS:
310+
311+ **1. Configure domain in terraform.tfvars:**
312+ ```hcl
313+ domain_name = "example.com"
314+ create_dns_zone = true
315+ ```
316+
317+ **2. Terraform creates:**
318+ - Route53 hosted zone for your domain
319+ - ACM certificate with automatic DNS validation
320+ - A records pointing to your ALB (apex and www)
321+ - Certificate validation records
322+
323+ **3. Update your domain registrar:**
324+ ```bash
325+ # Get nameservers from Terraform output
326+ terraform output domain_nameservers
327+
328+ # Update your domain registrar (e.g., Namecheap, GoDaddy) with these nameservers:
329+ # ns-123.awsdns-12.com
330+ # ns-456.awsdns-45.net
331+ # ns-789.awsdns-78.org
332+ # ns-012.awsdns-01.co.uk
333+ ```
334+
335+ **4. Certificate validation:**
336+ - Automatic when using Route53 (Terraform creates validation records)
337+ - Manual if using external DNS (add CNAME records from ACM console)
338+
339+ **5. Access your application:**
340+ - `https://example.com` - Apex domain
341+ - `https://www.example.com` - WWW subdomain
342+ - Both automatically redirect HTTP → HTTPS
343+
344+ ### Using Existing DNS Provider
345+
346+ If you manage DNS externally:
347+
348+ ```hcl
349+ domain_name = "example.com"
350+ create_dns_zone = false # Don't create Route53 zone
351+ ```
352+
353+ Then manually create:
354+ - ACM certificate validation records (from AWS Console)
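355+ - CNAME record (or your provider's ALIAS/ANAME record for the apex domain) pointing to the ALB DNS name (from `terraform output alb_dns_name`); a plain A record needs fixed IP addresses, which an ALB does not provide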
356+
282357## Migrations
283358
284- ### One-Time Migrations
359+ ### Terraform-Managed Migration Task
360+
361+ Database migrations are managed as a Terraform resource (`ecs_migrate` task definition). This ensures consistency with your application container and simplifies deployment.
285362
286- Run migrations as a one-off task:
363+ ### Running Migrations
287364
365+ **Using deploy.sh (Recommended):**
288366```bash
289- aws ecs run-task \
290- --cluster {{ project_slug }}-cluster \
291- --task-definition {{ project_slug }}-migrate \
292- --launch-type FARGATE \
293- --network-configuration "awsvpcConfiguration={subnets=[subnet-xxx],securityGroups=[sg-xxx],assignPublicIp=ENABLED}"
367+ ./deploy/ecs/deploy.sh
368+ # Then select option 5: "Run migrations"
294369```
295370
296- ### Automated Migrations
371+ The script will:
372+ 1. Get the migration task definition from Terraform
373+ 2. Run the task in your private subnets
374+ 3. Wait for completion and validate exit code
375+ 4. Show migration logs
297376
298- Use ECS Exec or create a migration task that runs before deployment.
377+ **Manual Execution:**
378+ ```bash
379+ # Get task definition ARN from Terraform
380+ cd deploy/ecs/terraform
381+ MIGRATE_TASK_DEF=$(terraform output -raw ecs_migrate_task_definition)
382+
383+ # Get subnet and security group IDs
384+ SUBNET_IDS=$(aws ec2 describe-subnets \
385+ --filters "Name=tag:Type,Values=Private" \
386+ --query "Subnets[*].SubnetId" --output text | tr '\t' ',')
387+ SECURITY_GROUP=$(aws ec2 describe-security-groups \
388+ --filters "Name=tag:Name,Values=*app-sg" \
389+ --query "SecurityGroups[0].GroupId" --output text)
390+
391+ # Run migration
392+ TASK_ARN=$(aws ecs run-task \
393+ --cluster {{ project_slug }}-{environment}-cluster \
394+ --task-definition "$MIGRATE_TASK_DEF" \
395+ --launch-type FARGATE \
396+ --network-configuration "awsvpcConfiguration={subnets=[$SUBNET_IDS],securityGroups=[$SECURITY_GROUP],assignPublicIp=DISABLED}" \
397+ --query 'tasks[0].taskArn' --output text)
299398
300- ## Monitoring
399+ # Wait for completion
400+ aws ecs wait tasks-stopped --cluster {{ project_slug }}-{environment}-cluster --tasks "$TASK_ARN"
301401
302- ### CloudWatch Metrics
402+ # Check exit code
403+ aws ecs describe-tasks \
404+ --cluster {{ project_slug }}-{environment}-cluster \
405+ --tasks "$TASK_ARN" \
406+ --query 'tasks[0].containers[0].exitCode'
407+ ```
303408
304- Key metrics to monitor:
409+ **In CI/CD:**
410+ Migrations run automatically before deployment in the GitHub Actions workflow with:
411+ - Proper wait conditions (no hardcoded sleeps)
412+ - Exit code validation
413+ - Automatic rollback on failure
305414
306- - CPU Utilization
307- - Memory Utilization
308- - Request Count
309- - Target Response Time
310- - Unhealthy Host Count
415+ ## Monitoring
311416
312417### CloudWatch Alarms
313418
314- Set up alarms for:
419+ Comprehensive monitoring with automated alerting (enabled via `enable_monitoring = true`):
420+
421+ **ECS Service Alarms:**
422+ - **High CPU**: > 80% for 2 periods (5 min each)
423+ - **High Memory**: > 85% for 2 periods
424+ - **Unhealthy Targets**: Any unhealthy targets for 2 periods (1 min each)
315425
316- - High CPU (> 80%)
317- - High Memory (> 85%)
318- - Failed Health Checks
319- - 5xx Error Rate
426+ **ALB Alarms:**
427+ - **5XX Errors**: > 10 errors in 5 minutes
428+ - **Response Time**: > 2 seconds average for 5 minutes
429+
430+ **Database Alarms (RDS):**
431+ - **High CPU**: > 80% for 10 minutes
432+ - **Low Storage**: < 5GB free space
433+ - **High Connections**: > 80 concurrent connections
434+
435+ {% if cache == 'redis' -%}
436+ **Redis Alarms:**
437+ - **High CPU**: > 75% for 10 minutes
438+ - **High Memory**: > 80% for 10 minutes
439+ {% endif %}
440+
441+ **Notifications:**
442+ - Email notifications via SNS (configure `alarm_email` in terraform.tfvars)
443+ - All alarms tagged with environment and project
444+ - Alarms automatically integrate with AWS CloudWatch dashboard
445+
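To make the "threshold for N periods" wording above concrete, here is a simplified Python model of how such an alarm changes state. It assumes every one of the last N datapoints must breach the threshold, and it ignores CloudWatch's missing-data handling and "datapoints to alarm" setting; `alarm_state` is an illustrative name, not part of the Terraform module.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Return "ALARM" when the latest `evaluation_periods` datapoints all exceed `threshold`.

    `datapoints` is ordered oldest to newest; `evaluation_periods` must be >= 1.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    return "ALARM" if all(value > threshold for value in recent) else "OK"


# High CPU alarm: > 80% for 2 periods.
print(alarm_state([70.0, 82.0, 91.0], threshold=80.0, evaluation_periods=2))
```

A single spike above the threshold does not trip the alarm in this model; the breach has to persist for the full evaluation window.
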
446+ ### CloudWatch Dashboard
447+
448+ A pre-configured dashboard is created with:
449+ - ECS CPU and Memory utilization graphs
450+ - ALB request count, errors, and response time
451+ - Database connection count and CPU usage
452+ - Metrics refreshed as CloudWatch publishes new datapoints (near real-time)
453+
454+ Access via: AWS Console → CloudWatch → Dashboards → `{{ project_slug }}-{environment}-dashboard`
320455
321456### Logs
322457
323458View logs:
324459
325460```bash
326461# Stream logs
327- aws logs tail /ecs/{{ project_slug }} --follow
462+ aws logs tail /ecs/{{ project_slug }}-{environment} --follow
328463
329464# Filter for errors
330465aws logs filter-log-events \
331- --log-group-name /ecs/{{ project_slug }} \
466+ --log-group-name /ecs/{{ project_slug }}-{environment} \
332467 --filter-pattern "ERROR"
468+
469+ # View specific service logs (matches message text; use --log-stream-name-prefix to select by stream)
470+ aws logs tail /ecs/{{ project_slug }}-{environment} --follow --filter-pattern "app"
471+ aws logs tail /ecs/{{ project_slug }}-{environment} --follow --filter-pattern "migrate"
472+ {% if use_celery -%}
473+ aws logs tail /ecs/{{ project_slug }}-{environment} --follow --filter-pattern "celery"
474+ {% endif -%}
333475```
334476
335477## CI/CD Integration
336478
337479### GitHub Actions
338480
339- ```yaml
340- name: Deploy to ECS
341-
342- on:
343- push:
344- branches: [main]
345-
346- jobs:
347- deploy:
348- runs-on: ubuntu-latest
349- steps:
350- - uses: actions/checkout@v3
351-
352- - name: Configure AWS credentials
353- uses: aws-actions/configure-aws-credentials@v2
354- with:
355- aws-access-key-id: {% raw %} ${{ secrets.AWS_ACCESS_KEY_ID }}{% endraw %}
356- aws-secret-access-key: {% raw %} ${{ secrets.AWS_SECRET_ACCESS_KEY }}{% endraw %}
357- aws-region: us-east-1
358-
359- - name: Login to Amazon ECR
360- id: login-ecr
361- uses: aws-actions/amazon-ecr-login@v1
362-
363- - name: Build and push image
364- env:
365- ECR_REGISTRY: {% raw %} ${{ steps.login-ecr.outputs.registry }}{% endraw %}
366- IMAGE_TAG: {% raw %} ${{ github.sha }}{% endraw %}
367- run: |
368- docker build -t $ECR_REGISTRY/{{ project_slug }}:$IMAGE_TAG .
369- docker push $ECR_REGISTRY/{{ project_slug }}:$IMAGE_TAG
370-
371- - name: Deploy to ECS
372- run: |
373- aws ecs update-service \
374- --cluster {{ project_slug }}-cluster \
375- --service {{ project_slug }}-service \
376- --force-new-deployment
481+ The included workflow (`.github-workflows-ecs.yml`) provides:
482+
483+ **Features:**
484+ - ✅ Automated testing on pull requests
485+ - ✅ Build and push to ECR with image caching
486+ - ✅ Separate staging and production deployments
487+ - ✅ Database migrations with validation
488+ - ✅ Health check verification with retries
489+ - ✅ **Automatic rollback on failure** (production only)
490+ - ✅ RDS snapshot before production deployment
491+ - ✅ Blue/Green deployment with circuit breaker
492+ - ✅ Slack notifications (optional)
493+ - ✅ Sentry release tracking (optional)
494+
495+ **Deployment Flow:**
496+
497+ 1. **Test** (PR only): Run pytest and linting
498+ 2. **Build**: Build Docker image with caching
499+ 3. **Push**: Tag and push to ECR (`:latest`, `:sha`, `:branch-name`)
500+ 4. **Backup**: Create RDS snapshot (production only)
501+ 5. **Migrate**: Run database migrations with exit code validation
502+ 6. **Deploy**: Update ECS service with new task definition
503+ 7. **Wait**: Use proper AWS wait conditions (no hardcoded sleeps)
504+ 8. **Verify**: Health checks with retry logic
505+ 9. **Rollback**: Automatic rollback on failure (production only)
506+ 10. **Notify**: Send notifications (Slack, Sentry)
507+
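The deploy, verify, and rollback steps (6, 8, and 9) can be sketched in Python as follows. This is an illustrative outline, not the workflow's actual implementation: `verify_health` and `deploy` are hypothetical helpers, and the callables passed in stand for whatever updates the ECS service, probes the health endpoint, and restores the previous task definition.

```python
import time


def verify_health(check, retries=5, delay_seconds=10.0, sleep=time.sleep):
    """Call `check()` up to `retries` times; return True on the first success."""
    for attempt in range(1, retries + 1):
        if check():
            return True
        if attempt < retries:
            sleep(delay_seconds)  # pause between attempts, not after the last one
    return False


def deploy(deploy_new, verify, rollback, environment):
    """Deploy, verify, and roll back only production when verification fails."""
    deploy_new()
    if verify():
        return "deployed"
    if environment == "production":
        rollback()
        return "rolled_back"
    return "failed"
```

A caller would wire these together as `deploy(update_service, lambda: verify_health(check_url), restore_previous, "production")`, where the three callables are supplied by the pipeline.
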
508+ **Configuration:**
509+
510+ Add these secrets to your GitHub repository:
377511```
512+ AWS_ACCESS_KEY_ID
513+ AWS_SECRET_ACCESS_KEY
514+ STAGING_SUBNET_IDS # Comma-separated subnet IDs
515+ STAGING_SECURITY_GROUP_ID
516+ PRODUCTION_SUBNET_IDS
517+ PRODUCTION_SECURITY_GROUP_ID
518+ SLACK_WEBHOOK # Optional
519+ SENTRY_AUTH_TOKEN # Optional
520+ ```
521+
522+ **Trigger:**
523+ - `main` branch → Deploy to staging
524+ - `production` branch → Deploy to production
525+ - Manual workflow dispatch → Choose environment
378526
379527## Cost Optimization
380528