
Pods stuck in ContainerCreating (AWS CNI pod limit) #219

Closed
@deliahu

Description


The per-node pod limit has been reached (17 for t3.medium, 11 for t3.small). The limit exists because the AWS VPC CNI plugin assigns each pod an IP address from the node's ENIs, see here. Once the limit is reached, cluster autoscaling is triggered, but pods scheduled on the new node get stuck in ContainerCreating.

Update: this was fixed in v1.5.2 of the AWS CNI.
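The limits above follow from the ENI math AWS documents for the VPC CNI: max pods = ENIs × (IPv4 addresses per ENI − 1) + 2. A minimal sketch, using the published ENI limits for the t3 instance types mentioned above:

```python
def max_pods(enis: int, ips_per_eni: int) -> int:
    """Max pods per node under the AWS VPC CNI.

    One IP on each ENI is reserved as the ENI's primary address,
    and the +2 accounts for host-network pods (aws-node, kube-proxy)
    that do not consume an ENI IP.
    """
    return enis * (ips_per_eni - 1) + 2

# t3.medium: 3 ENIs x 6 IPv4 addresses each
print(max_pods(3, 6))  # -> 17

# t3.small: 3 ENIs x 4 IPv4 addresses each
print(max_pods(3, 4))  # -> 11
```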

Things that would fix it:

Alternative CNI plugins

May need to run kubectl delete --namespace kube-system daemonset/aws-node before adding worker nodes to uninstall the AWS CNI. May also need to start kubelet without --network-plugin=cni, since kubelet may otherwise refuse to start because the configured CNI plugin cannot be brought up (the aws-node container is not running). Another way to remove the AWS CNI is to build a custom AMI in which the desired CNI plugin's config is prefixed with 00 instead of the standard 10, so that it takes precedence over the AWS VPC CNI plugin. source
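The removal steps above can be sketched as follows; these are cluster-dependent administrative commands shown for illustration, and the label selector is an assumption about the aws-node daemonset:

```shell
# Uninstall the AWS VPC CNI daemonset before adding worker nodes
kubectl delete --namespace kube-system daemonset/aws-node

# Confirm no aws-node pods remain (label selector assumed)
kubectl get pods --namespace kube-system -l k8s-app=aws-node
```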

Things that will help:

  • Increase default node size
  • Increase default CPU request (to reach CPU limits before pod limits)
  • Replace argo with custom DAG management
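The CPU-request mitigation above works by making the scheduler run out of allocatable CPU before it hits the pod limit. As an illustrative pod spec fragment (the 500m value is hypothetical, not the project's actual default):

```yaml
# With a 500m request, a 2-vCPU t3.medium fills up after roughly
# 4 pods, well below its 17-pod CNI limit.
resources:
  requests:
    cpu: 500m
```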


Labels

blocked (Blocked on another task or external event), bug (Something isn't working)
