Description
Describe the bug
Current Scenario
The Queue Processing Fargate Service pattern adds two scaling policies:
- Queue length-based scaling: if the user's scaling steps contain no step-down (negative) change, the pattern determines that no scale-in is needed and does not create a scale-in alarm.
- CPU-based scaling: scale-in and scale-out depend on the average CPU utilization of the service.
This is configured in `protected configureAutoscalingForService(service: BaseService)` in aws-cdk/packages/@aws-cdk/aws-ecs-patterns/lib/base/queue-processing-service-base.ts (line 344 at commit fd5808f).
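To make the queue-length behavior concrete, here is a simplified model of how step-scaling intervals map the number of visible messages to a capacity change. The `ScalingInterval` shape mirrors the `lower`/`upper`/`change` fields used in the reproduction steps, but `stepChange` and `needsScaleInAlarm` are hypothetical helpers. This is not the CDK implementation, and treating "no negative change" as "no scale-in alarm" is an assumed simplification.

```typescript
// Hypothetical, simplified model of step-scaling intervals; NOT the CDK implementation.
interface ScalingInterval {
  lower?: number; // inclusive lower bound of the metric value
  upper?: number; // exclusive upper bound of the metric value
  change: number; // capacity change applied when the metric falls in this interval
}

// Returns the capacity change for a given queue depth, or 0 if no interval matches.
function stepChange(steps: ScalingInterval[], queueDepth: number): number {
  for (const step of steps) {
    const aboveLower = step.lower === undefined || queueDepth >= step.lower;
    const belowUpper = step.upper === undefined || queueDepth < step.upper;
    if (aboveLower && belowUpper) {
      return step.change;
    }
  }
  return 0;
}

// Assumed simplification: a scale-in alarm is only useful if some interval removes tasks.
function needsScaleInAlarm(steps: ScalingInterval[]): boolean {
  return steps.some((step) => step.change < 0);
}
```

With the steps from the reproduction, `stepChange(steps, 0)` is `0`, `stepChange(steps, 5)` is `1`, and `needsScaleInAlarm(steps)` is `false`, consistent with the pattern creating no queue-based scale-in alarm.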
Issue
CPU-based scaling does not seem appropriate for a Queue Processing Fargate service: the service should scale out and in based only on the number of messages in the queue, not on CPU utilization.
Because of CPU-based scaling, if a message triggers a CPU-intensive process that does not finish before the scaling alarm fires, auto scaling may start a new task that processes the same message again.
Conversely, if the process is memory intensive, CPU utilization stays low, so the CPU-based scale-in alarm is always in alarm and auto scaling keeps removing tasks until the service reaches its minimum capacity.
The mirror-image problem applies to a memory utilization metric when the running task is CPU intensive instead.
Since there is no task-level termination protection, and the patterns offer no way to disable scale-in, auto scaling can terminate a task that is mid-execution.
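The memory-intensive scenario can be illustrated with a hypothetical simulation. It assumes (these are not measured values) that memory-bound tasks hold average CPU below the policy's target, that the CPU policy removes one task per evaluation period, and that every running task is mid-message when it is removed. Real target tracking does not necessarily remove exactly one task per period, so treat this as a sketch of the failure mode rather than of AWS behavior.

```typescript
// Hypothetical model of the scale-in scenario; all numbers are illustrative assumptions.
interface ScalingState {
  runningTasks: number;
  interruptedMessages: number;
}

/**
 * Simulates a CPU-based policy that removes one task per evaluation period
 * whenever average CPU is below the target. Memory-bound tasks keep CPU low,
 * so every period looks like excess capacity even though each task is busy.
 */
function simulateCpuScaleIn(
  initialTasks: number,
  minCapacity: number,
  avgCpuPercent: number,
  targetCpuPercent: number,
  periods: number,
): ScalingState {
  let runningTasks = initialTasks;
  let interruptedMessages = 0;
  for (let i = 0; i < periods; i++) {
    if (avgCpuPercent < targetCpuPercent && runningTasks > minCapacity) {
      runningTasks -= 1;
      // Every task is assumed to be mid-message, so each removal interrupts one message.
      interruptedMessages += 1;
    }
  }
  return { runningTasks, interruptedMessages };
}
```

For example, `simulateCpuScaleIn(5, 1, 10, 50, 10)` returns `{ runningTasks: 1, interruptedMessages: 4 }`: the service drains to its minimum even though every task was busy.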
Expected Behavior
When a Queue Processing Fargate service is set up to scale out only on the approximate number of messages in the queue, with scale-in disabled, it should not terminate tasks.
Current Behavior
Auto scaling on the Queue Processing Fargate service starts terminating tasks when a task is memory intensive and has a long processing time, because the CPU utilization scaling policy triggers a CloudWatch scale-in alarm. This terminates an arbitrary task mid-execution.
Reproduction Steps
The following CDK code:
```ts
import { Stack, StackProps } from 'aws-cdk-lib';
import { ContainerImage } from 'aws-cdk-lib/aws-ecs';
import { QueueProcessingFargateService } from 'aws-cdk-lib/aws-ecs-patterns';
import { Construct } from 'constructs';

export class QueueProcessingFargateServiceAutoscaleTestStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Builds the container image from the Dockerfile in the local 'test' directory.
    const containerImage = ContainerImage.fromAsset('test');

    // Scale out by one task when at least one message is visible; no queue-based scale-in.
    new QueueProcessingFargateService(this, 'test', {
      image: containerImage,
      scalingSteps: [
        { upper: 0, change: 0 },
        { lower: 1, change: +1 },
      ],
    });
  }
}
```
This creates a QueueProcessingFargateService with both a queue-length scaling policy and a CPU utilization scaling policy, which causes conflicting alarms to be triggered continuously.
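The conflict can be modeled as two alarms evaluated against the same snapshot of queue depth and CPU. This hypothetical sketch assumes the queue alarm fires whenever at least one message is visible (matching the `{ lower: 1, change: +1 }` step) and the CPU scale-in alarm fires whenever average CPU is below the target. Actual CloudWatch thresholds and evaluation periods will differ.

```typescript
// Hypothetical model of the two alarms; thresholds are simplified assumptions.
type AlarmState = 'OK' | 'ALARM';

interface AlarmSnapshot {
  // Queue step policy: scale out by +1 when at least one message is visible.
  queueScaleOut: AlarmState;
  // CPU policy: scale in when average CPU is below the target.
  cpuScaleIn: AlarmState;
}

function evaluateAlarms(
  visibleMessages: number,
  avgCpuPercent: number,
  targetCpuPercent: number,
): AlarmSnapshot {
  return {
    queueScaleOut: visibleMessages >= 1 ? 'ALARM' : 'OK',
    cpuScaleIn: avgCpuPercent < targetCpuPercent ? 'ALARM' : 'OK',
  };
}

// True when one policy is asking to add tasks while the other asks to remove them.
function isConflicting(snapshot: AlarmSnapshot): boolean {
  return snapshot.queueScaleOut === 'ALARM' && snapshot.cpuScaleIn === 'ALARM';
}
```

With a busy queue and memory-bound tasks (say 10 visible messages at 10% CPU against a 50% target), `isConflicting(evaluateAlarms(10, 10, 50))` is `true`. With an empty queue at low CPU, only the CPU scale-in alarm is in ALARM, so some alarm is firing in both cases.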
Possible Solution
The issue is in this method of the Queue Processing service pattern base class, `protected configureAutoscalingForService(service: BaseService)` in aws-cdk/packages/@aws-cdk/aws-ecs-patterns/lib/base/queue-processing-service-base.ts (line 344 at commit fd5808f).
It adds a default CPUUtilizationScalingPolicy that cannot be removed, edited, or disabled.
Solution 1
Remove the CPU utilization scaling policy unless it is explicitly required.
Solution 2
Add optional properties that let the user disable scale-in on the CPU utilization metric, or adjust its values as needed.
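Solution 2 could take the shape of an optional property bag that is resolved to concrete policy settings. The names `CpuScalingOptions`, `enabled`, and `resolveCpuPolicy` are hypothetical and not part of aws-ecs-patterns; the 50% default target is also an assumption of this sketch. `disableScaleIn` mirrors the option of the same name that CDK target tracking scaling props accept, so the resolved value could presumably be forwarded to the CPU policy.

```typescript
// Hypothetical options for the CPU scaling policy; these names are NOT part of the aws-ecs-patterns API.
interface CpuScalingOptions {
  // Set to false to skip creating the CPU policy entirely (Solution 1).
  enabled?: boolean;
  // Keep CPU-based scale-out but never scale in on CPU (Solution 2).
  disableScaleIn?: boolean;
  // Target average CPU utilization; 50 is an assumed default for this sketch.
  targetUtilizationPercent?: number;
}

interface ResolvedCpuPolicy {
  create: boolean;
  disableScaleIn: boolean;
  targetUtilizationPercent: number;
}

// Fills in defaults that preserve the current behavior and validates the target.
function resolveCpuPolicy(
  options: CpuScalingOptions = {},
  defaultTargetPercent = 50,
): ResolvedCpuPolicy {
  const targetUtilizationPercent = options.targetUtilizationPercent ?? defaultTargetPercent;
  if (!(targetUtilizationPercent > 0 && targetUtilizationPercent <= 100)) {
    throw new RangeError(`targetUtilizationPercent must be in (0, 100], got ${targetUtilizationPercent}`);
  }
  return {
    create: options.enabled ?? true,
    disableScaleIn: options.disableScaleIn ?? false,
    targetUtilizationPercent,
  };
}
```

Defaulting `enabled` to `true` and `disableScaleIn` to `false` keeps existing stacks unchanged, while `enabled: false` corresponds to Solution 1.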
Additional Information/Context
No response
CDK CLI Version
2.27
Framework Version
No response
Node.js Version
16.14.2
OS
Linux
Language
TypeScript
Language Version
3.9.7
Other information
No response