
aws-ecs-patterns (QueueProcessingFargateService): non-editable Scaling Policy causes race conditions & dropped tasks #20706

@AnuragMohapatra

Description


Describe the bug

Current Scenario

The scaling configuration of QueueProcessingFargateService consists of two parts:

  1. Queue-length-based scaling: if the user does not provide a scale-in (step-down) step, the pattern determines that no scale-in is needed and does not create a scale-in alarm.
  2. CPU-based scaling: scale-in and scale-out depend on the average CPU utilization of the service.
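To see why the CPU part can pull against the queue part, here is a rough sketch of a target-tracking policy under a simplifying assumption: that it sizes the service proportionally, desired ≈ ceil(current × actual CPU / target CPU), clamped to the capacity bounds. Application Auto Scaling does not publish its exact algorithm, so the function `targetTrackingDesired`, the 50% target, and the capacity values below are illustrative assumptions, not the pattern's or the service's actual code.

```typescript
// Rough proportional model of a CPU target-tracking policy.
// ASSUMPTION: the real Application Auto Scaling algorithm is not published;
// this only illustrates the direction in which the policy pushes.
function targetTrackingDesired(
  currentTasks: number,
  actualCpuPercent: number,
  targetCpuPercent: number,
  minCapacity: number,
  maxCapacity: number,
): number {
  const raw = Math.ceil((currentTasks * actualCpuPercent) / targetCpuPercent);
  // Clamp to the scalable target's capacity bounds.
  return Math.min(maxCapacity, Math.max(minCapacity, raw));
}

// Memory-bound workers: all 4 tasks are busy with messages, but CPU is low.
const desired = targetTrackingDesired(4, 10, 50, 1, 10);
console.log(`CPU policy wants ${desired} task(s) while 4 are busy`);
```

In this model, four busy but memory-bound tasks averaging 10% CPU look like spare capacity, so the CPU policy asks for a single task even though the queue still has work.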

This logic is implemented in the following method:

protected configureAutoscalingForService(service: BaseService) {

Issue

CPU-based scaling does not seem appropriate for a queue processing Fargate service. The service should scale out or in based only on the number of messages in the queue, not on the CPU utilization of the service.

Because of CPU-based scaling, if a message triggers CPU-intensive processing that does not finish before the scaling alarm fires, the service's auto scaling may start a new task that processes the same message again.

Conversely, if the processing is memory-intensive rather than CPU-intensive, average CPU stays low, so the CPU policy's scale-in alarm stays in alarm and auto scaling keeps removing tasks until the service reaches its minimum capacity.

The same problems would arise with a memory-utilization metric if the running task were CPU-intensive instead.

Since there is no task-level termination protection, and the pattern offers no way to disable scale-in, auto scaling can terminate a task in the middle of processing a message.
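The race described above can be illustrated with a toy, minute-by-minute simulation. Everything in it is a hypothetical simplification: `simulateRace` and its parameters are invented for illustration; every running task is assumed busy with one long, memory-bound message, so average CPU stays below target; messages are assumed to stay visible; the CPU policy removes one task per minute while CPU is below target; and the queue step policy adds one task per minute while messages are visible, mirroring the `{ lower: 1, change: +1 }` step. It is not Application Auto Scaling's real evaluation logic.

```typescript
// Toy model of the race; NOT Application Auto Scaling's real evaluation logic.
// Assumptions: every running task is busy with one long, memory-bound
// message (so average CPU stays below target), messages stay visible in the
// queue, and each policy acts at most once per simulated minute.
interface SimulationResult {
  taskCounts: number[]; // task count at the end of each minute
  interruptedMessages: number; // messages whose task was removed mid-execution
}

function simulateRace(
  minutes: number,
  startTasks: number,
  visibleMessages: number,
  avgCpuPercent: number,
  cpuTargetPercent: number,
  minCapacity: number,
  maxCapacity: number,
): SimulationResult {
  let tasks = startTasks;
  let interruptedMessages = 0;
  const taskCounts: number[] = [];
  for (let minute = 0; minute < minutes; minute++) {
    // CPU policy: low CPU looks like spare capacity, so it removes a task.
    // Every task is busy, so the removed one is interrupted mid-message.
    if (avgCpuPercent < cpuTargetPercent && tasks > minCapacity) {
      tasks -= 1;
      interruptedMessages += 1;
    }
    // Queue step policy ({ lower: 1, change: +1 }): messages are visible,
    // so it adds a task back.
    if (visibleMessages >= 1 && tasks < maxCapacity) {
      tasks += 1;
    }
    taskCounts.push(tasks);
  }
  return { taskCounts, interruptedMessages };
}

const race = simulateRace(5, 3, 10, 10, 50, 1, 10);
console.log(race.taskCounts, race.interruptedMessages);
```

With these inputs the task count looks flat at 3, yet in this model one in-flight message is interrupted every minute. Running scale-in before scale-out within a minute is an arbitrary ordering choice; the real policies act on separate alarms and cooldowns.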

Expected Behavior

When a QueueProcessingFargateService is set up to scale out only on the approximate number of messages in the queue, with scale-in disabled, it should not terminate tasks.
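The expected behavior amounts to a simple guard: with scale-in disabled, a policy may raise the task count but never lower it. The sketch below expresses that rule with a hypothetical helper, `applyScalingDecision`; it is an illustration of the requirement, not code from aws-ecs-patterns or Application Auto Scaling.

```typescript
// Hypothetical guard expressing the expected behavior: with scale-in
// disabled, a policy may raise the task count but never lower it, so no
// running task is terminated by auto scaling.
function applyScalingDecision(
  currentTasks: number,
  proposedTasks: number,
  scaleInDisabled: boolean,
): number {
  if (scaleInDisabled && proposedTasks < currentTasks) {
    return currentTasks; // ignore the scale-in request
  }
  return proposedTasks;
}

// A CPU policy proposing 1 task while 4 are busy is ignored.
console.log(applyScalingDecision(4, 1, true)); // 4
```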

Current Behavior

If the task is memory-intensive and has a long processing time, the service's auto scaling starts terminating tasks: the CloudWatch scale-in alarm of the CPU utilization scaling policy fires, and an arbitrary task is terminated mid-execution.

Reproduction Steps

The following CDK code

  import { Stack, StackProps } from 'aws-cdk-lib';
  import { Construct } from 'constructs';
  import { QueueProcessingFargateService } from 'aws-cdk-lib/aws-ecs-patterns';
  import { ContainerImage } from 'aws-cdk-lib/aws-ecs';

  export class QueueProcessingFargateServiceAutoscaleTestStack extends Stack {
    constructor(scope: Construct, id: string, props?: StackProps) {
      super(scope, id, props);

      // Build the worker image from the local ./test directory.
      const containerImage = ContainerImage.fromAsset('test');

      new QueueProcessingFargateService(this, 'test', {
        image: containerImage,
        // Add one task when at least one message is visible; no negative
        // step is given, so no queue-based scale-in is intended.
        scalingSteps: [
          { upper: 0, change: 0 },
          { lower: 1, change: +1 },
        ],
      });
    }
  }

creates a QueueProcessingFargateService with the following scaling policies:

[Screenshot: scaling policies of the service]

which cause conflicting alarms to fire constantly:

[Screenshots: the conflicting alarms]

Possible Solution

The issue is in this method of the queue processing service pattern base class:

protected configureAutoscalingForService(service: BaseService) {

It adds a default CPU utilization scaling policy that cannot be removed, edited, or disabled.

Solution 1

Remove the CPU utilization scaling policy unless it is actually required.

Solution 2

Add optional properties that let the user disable scale-in for the CPU utilization policy, or otherwise configure its values as needed.
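A minimal sketch of what Solution 2's optional properties might look like. The names `CpuScalingOptions`, `ResolvedCpuScaling`, and `resolveCpuScaling` are hypothetical and are not part of the aws-ecs-patterns API; the 50% default is an assumption about the current hard-coded target. Defaults are chosen so that existing stacks would keep today's behavior.

```typescript
// Hypothetical optional props for Solution 2; these names are NOT part of
// the aws-ecs-patterns API.
interface CpuScalingOptions {
  /** Set to false to skip the CPU policy entirely (Solution 1 as an opt-out). */
  readonly enabled?: boolean;
  /** Target average CPU utilization, in percent. */
  readonly targetUtilizationPercent?: number;
  /** Keep CPU-driven scale-out but never let CPU scale the service in. */
  readonly disableScaleIn?: boolean;
}

interface ResolvedCpuScaling {
  readonly enabled: boolean;
  readonly targetUtilizationPercent: number;
  readonly disableScaleIn: boolean;
}

// Defaults reproduce the current behavior (ASSUMPTION: a 50% target),
// so omitting the options changes nothing for existing users.
function resolveCpuScaling(options: CpuScalingOptions = {}): ResolvedCpuScaling {
  const target = options.targetUtilizationPercent ?? 50;
  if (target <= 0 || target > 100) {
    throw new Error(`targetUtilizationPercent must be in (0, 100], got ${target}`);
  }
  return {
    enabled: options.enabled ?? true,
    targetUtilizationPercent: target,
    disableScaleIn: options.disableScaleIn ?? false,
  };
}

console.log(resolveCpuScaling({ disableScaleIn: true }));
```

The resolved values map naturally onto the existing CDK target-tracking call, whose props already include `targetUtilizationPercent` and `disableScaleIn`; when `enabled` is false, the pattern would simply not register the CPU policy.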

Additional Information/Context

No response

CDK CLI Version

2.27

Framework Version

No response

Node.js Version

16.14.2

OS

Linux

Language

Typescript

Language Version

3.9.7

Other information

No response

Labels

@aws-cdk/aws-ecs, @aws-cdk/aws-ecs-patterns, @aws-cdk/aws-sqs, bug, p2