Description
Pod limits per node have been reached (17 for t3.medium, 11 for t3.small). The limits exist due to IP address allocation by the AWS CNI plugin, see here. Once the limit is reached, cluster autoscaling is triggered, and pods that are scheduled on the new node get stuck in ContainerCreating.
Update: this was fixed in v1.5.2 of the AWS CNI.
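The two pod limits quoted above follow from the per-instance ENI limits. As a rough sketch, assuming the default secondary-IP mode of the AWS VPC CNI (no prefix delegation) and the published limits of 3 ENIs per instance with 4 IPv4 addresses per ENI on t3.small and 6 on t3.medium, the commonly used formula is `enis * (ips_per_eni - 1) + 2`. The table `ENI_LIMITS` and the function `max_pods` are illustrative names, not part of any AWS tooling.

```python
# Sketch: derive the per-node pod limit from ENI limits.
# Assumes the AWS VPC CNI in secondary-IP mode (no prefix delegation).
# ENI_LIMITS maps instance type -> (max ENIs, IPv4 addresses per ENI).
ENI_LIMITS = {
    "t3.small": (3, 4),
    "t3.medium": (3, 6),
}


def max_pods(instance_type: str) -> int:
    """Return the pod limit for an instance type in ENI_LIMITS."""
    enis, ips_per_eni = ENI_LIMITS[instance_type]
    # One address on each ENI is the ENI's own primary IP and is not
    # handed to pods; the +2 covers pods that use host networking.
    return enis * (ips_per_eni - 1) + 2


for itype in ENI_LIMITS:
    print(itype, max_pods(itype))
```

This is why moving to a larger instance type raises the pod ceiling: larger types allow more ENIs and more addresses per ENI.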
Things that will fix:
- AWS fixing the bug in their CNI
- Race condition between CNI plugin install and aws-k8s-agent startup aws/amazon-vpc-cni-k8s#282
- ENI warming is delayed for at least for 1 minute, probably caused by #480 aws/amazon-vpc-cni-k8s#525
- Kubelet will start scheduling pods before amazon-vpc-cni-k8s daemon set is fully functional causing workloads to error with "failed to assign an IP address to container" aws/amazon-vpc-cni-k8s#330
- Replace the AWS CNI plugin
- [EKS] [CNI]: Optional Default CNI Plugin Installation aws/containers-roadmap#71
- Opt-Out AWS VPC CNI (And Any Other EKS "Magics") awslabs/amazon-eks-ami#117
- [EKS] Increased pod density on smaller instance types aws/containers-roadmap#138
- Ability to choose another CNI aws/amazon-vpc-cni-k8s#214
- Pods stuck in ContainerCreating due to CNI Failing to Assing IP to Container Until aws-node is deleted aws/amazon-vpc-cni-k8s#59
Alternative CNI plugins
- weave-net
- Running Weave Net on EKS
- By far the simplest installation process
- metrics-server and istio seem to not work out of the box
- Metric server with weave net can't collect monitoring data from the node where his pod is placed kubernetes-sigs/metrics-server#166
- Error from server (ServiceUnavailable): the server is currently unable to handle the request kubernetes-sigs/metrics-server#188
- couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request kubernetes-sigs/metrics-server#157
- Metrics server issue with hostname resolution of kubelet and apiserver unable to communicate with metric-server clusterIP kubernetes-sigs/metrics-server#131
- Metrics server api not getting registered kubernetes-sigs/metrics-server#45
- calico
- Running Calico on EKS
- GKE uses it?
- EKS supports running Calico alongside AWS CNI
- flannel
- istio
- cni-genie (to install other CNIs)
- install it on the master node and configure the default to a different CNI before creating the ec2 worker nodes
- Example project of using cni-genie to install weave on EKS
May need to run kubectl delete --namespace kube-system daemonset/aws-node before adding worker nodes to uninstall the AWS CNI. May also need to start kubelet without --network-plugin=cni, otherwise kubelet may refuse to start because the configured CNI plugin cannot be brought up (aws-node container is not running). Another way to remove the AWS CNI is to build a custom AMI with the desired CNI plugin's config file prefixed with 00 instead of the standard 10, so that it is picked up ahead of the AWS VPC CNI plugin. source
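The 00 versus 10 prefix trick appears to rely on the CNI configuration directory being read in lexicographic filename order, with the first valid config winning. The sketch below only models that selection rule; it does not touch a real node. `first_cni_config` is a hypothetical helper, and `00-weave.conflist` is a made-up filename used for illustration (only the `10-aws.conflist` name is taken from the AWS VPC CNI).

```python
# Sketch: model how the first CNI config is chosen from a config directory.
# Assumption: the runtime sorts filenames lexicographically and uses the
# first file with a recognised extension.
from typing import Iterable, Optional

CNI_EXTENSIONS = (".conf", ".conflist", ".json")


def first_cni_config(filenames: Iterable[str]) -> Optional[str]:
    """Return the config file that would be selected, or None if there is none."""
    candidates = sorted(f for f in filenames if f.endswith(CNI_EXTENSIONS))
    return candidates[0] if candidates else None


# "00-weave.conflist" is a hypothetical name for the replacement plugin's config.
print(first_cni_config(["10-aws.conflist", "00-weave.conflist"]))
```

Under this assumption, a 00-prefixed file sorts before the AWS file, so the AWS VPC CNI config is never selected even if it is still present on the AMI.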
Things that will help:
- Increase default node size
- Increase default CPU request (to reach CPU limits before pod limits)
- Replace argo with custom DAG management
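To make the CPU request idea concrete: a node runs out of CPU before it runs out of pod slots only if the per-pod request multiplied by the pod limit exceeds the node's allocatable CPU. The sketch below computes the smallest such request. It assumes every pod uses the same request and takes 2000m as a stand-in for allocatable CPU on a 2 vCPU instance; real allocatable CPU is lower once kubelet and system reservations are subtracted, and daemonset pods also occupy some slots. `min_cpu_request_millicores` is a hypothetical helper name.

```python
# Sketch: smallest uniform CPU request (in millicores) that makes the
# scheduler hit the CPU ceiling before the per-node pod limit.
# Assumption: all pods on the node carry the same CPU request.


def min_cpu_request_millicores(allocatable_millicores: int, pod_limit: int) -> int:
    """Smallest request r such that r * pod_limit > allocatable_millicores."""
    if pod_limit <= 0:
        raise ValueError("pod_limit must be positive")
    return allocatable_millicores // pod_limit + 1


# 2000m is an assumed upper bound for a 2 vCPU node (t3.small / t3.medium).
for name, limit in (("t3.small", 11), ("t3.medium", 17)):
    print(name, min_cpu_request_millicores(2000, limit))
```

With requests at or above the printed values, the autoscaler would add nodes because of CPU pressure rather than because pods are stuck waiting for an IP address.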