Bug Report: TLS Handshake Timeout on Image Pulls in K3s v1.31.8+
Summary
After upgrading from K3s v1.31.7+k3s1 to v1.31.8+k3s1 (or to any later 1.31.x or 1.32.x release tested), all container image pulls fail with TLS handshake timeout errors. The failure is consistent across multiple registries (docker.io, quay.io, etc.) and prevents workloads from being deployed.
The problem does not occur on v1.31.7+k3s1 and earlier.
Environment
- OS: Rocky Linux 9.6 (Blue Onyx)
- Kernel: 5.14.0-570.39.1.el9_6.x86_64
- Arch: x86_64
- Cluster: Multi-master K3s cluster on OpenStack VMs
- Proxy: None (no proxy configured in environment)
- Firewall: Outbound 443 is open and verified with openssl and curl
- CNI: Cilium v1.7.6 (same version across all K3s versions tested)
k3s config.yaml for server nodes
cluster-init: true
token: "<redacted>"
node-ip: 172.40.0.119,<ipv6-redacted>
cluster-cidr: 10.42.0.0/16,fd01::/48
service-cidr: 10.43.0.0/16,fd02::/112
flannel-backend: none
disable-network-policy: true
disable-kube-proxy: true
selinux: true
secrets-encryption: true
write-kubeconfig-mode: "0644"
tls-san: 172.40.0.250
disable: metrics-server
Versions Tested
K3s Version | Go Version | containerd Version | Status |
---|---|---|---|
v1.31.7+k3s1 | go1.23.6 | 2.0.4 | ✅ Works |
v1.31.8+k3s1 | go1.23.6 | 2.0.4 | ❌ Fails |
v1.31.12+k3s1 | go1.23.11 | 2.0.5 | ❌ Fails |
v1.32.4+k3s1 | go1.23.6 | 2.0.4 | ❌ Fails |
v1.32.7+k3s1 | go1.23.10 | 2.0.5 | ❌ Fails |
Symptoms
Pods enter ImagePullBackOff with messages like:
Failed to pull image "rancher/mirrored-coredns-coredns:1.12.1": failed to resolve reference: failed to do request: Head "https://registry-1.docker.io/v2/...": net/http: TLS handshake timeout
Manually testing from host works:
curl -v https://registry-1.docker.io/v2/
openssl s_client -connect registry-1.docker.io:443
Both succeed.
But using containerd directly fails:
sudo /usr/local/bin/k3s ctr images pull docker.io/library/busybox:1.36
# error: TLS handshake timeout
Enabling containerd debug logging shows:
DEBU[0000] resolving
DEBU[0000] do request
DEBU[0000] fetch response received
DEBU[0000] ... timeout while fetching
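When comparing captured pull output across versions, it can help to separate TLS handshake timeouts from other failure modes such as DNS or TCP errors. The following is a minimal sketch, assuming the output of a pull attempt has been saved to a file or is piped in; classify_pull_log is a hypothetical helper name, and the non-TLS patterns are common Go network error strings rather than messages taken from this report.

```shell
# Hypothetical helper: classify a captured pull log by its most telling error.
# Reads the file(s) given as arguments, or stdin when called without arguments.
classify_pull_log() {
    log=$(cat "$@")
    case "$log" in
        *'TLS handshake timeout'*) echo "tls-handshake-timeout" ;;
        *'no such host'*)          echo "dns-failure" ;;
        *'connection refused'*)    echo "connection-refused" ;;
        *'i/o timeout'*)           echo "io-timeout" ;;
        *)                         echo "other" ;;
    esac
}
```

For example, `sudo /usr/local/bin/k3s ctr images pull docker.io/library/busybox:1.36 2>&1 | classify_pull_log` would print the category for a single attempt.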
Notes
- Issue starts between v1.31.7 (works) and v1.31.8 (fails).
- Both releases report Go 1.23.6 and containerd 2.0.4 (v2.0.4-k3s2 in the ctr version output), suggesting a change in vendored dependencies or TLS handling in that release.
- Not specific to docker.io; also affects quay.io and other registries.
- Not specific to registries.yaml or mirrors — even direct public pulls fail.
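Because the Go and containerd versions are the same in the last working and first failing release, comparing the dependency pins in go.mod between the two tags is one way to narrow down candidates. This is a minimal sketch, assuming a local clone of the K3s repository with both release tags fetched and a go.mod at the repository root; gomod_changes is a hypothetical helper name. It only understands the block forms "module version" and "module => target version", and it does not report removed modules.

```shell
# Hypothetical helper: list modules whose pinned version differs between
# two go.mod files ($1 = old, $2 = new).
#
# Assumed way to obtain the two files from a K3s clone:
#   git show 'v1.31.7+k3s1:go.mod' > old.mod
#   git show 'v1.31.8+k3s1:go.mod' > new.mod
#   gomod_changes old.mod new.mod
gomod_changes() {
    awk '
        {
            mod = ""; ver = ""
            if (index($1, "/") > 0 && $2 ~ /^v[0-9]/) {
                # require-style line: <module> <version>
                mod = $1; ver = $2
            } else if (index($1, "/") > 0 && $2 == "=>") {
                # replace-style line: <module> => <target> [<version>]
                mod = $1 " (replace)"
                ver = $3 ($4 != "" ? " " $4 : "")
            }
            if (mod == "") next
            if (NR == FNR) { old[mod] = ver; next }   # first file: remember pins
            if (!(mod in old))        print "added " mod " " ver
            else if (old[mod] != ver) print "changed " mod " " old[mod] " -> " ver
        }
    ' "$1" "$2"
}
```

Replace directives are included because forked or pinned dependencies would typically show up there rather than in the require block.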
Steps to Reproduce
- Install K3s v1.31.8+k3s1 or later on Rocky Linux 9.6 (x86_64).
- Attempt to pull an image:
sudo /usr/local/bin/k3s ctr images pull docker.io/library/busybox:1.36
- Observe the TLS handshake timeout error.
Test Script used:
# versions to try, newest→oldest between .12 and .7
#VERS="v1.31.12+k3s1 v1.31.11+k3s1 v1.31.10+k3s1 v1.31.9+k3s1 v1.31.8+k3s1"
VERS="v1.32.4+k3s1 v1.32.7+k3s1"
for V in $VERS; do
  echo "===== TEST $V =====" | tee -a bisect.log
  # install the requested version; stop the whole run if the installer fails
  curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="$V" sh -s - || { echo "install $V failed"; break; }
  sleep 6
  sudo /usr/local/bin/k3s ctr version | tee -a bisect.log
  # tiny images from multiple registries
  for IMG in docker.io/library/busybox:1.36 quay.io/prometheus/node-exporter:v1.8.1; do
    echo "Pull $IMG" | tee -a bisect.log
    # capture very chatty trace per attempt
    sudo /usr/local/bin/k3s ctr --debug images pull --http-trace --local "$IMG" \
      &> "pull-$V-$(echo "$IMG" | tr '/:' '__').log"
    # $? is the exit status of the pull command above
    echo "exit=$?" | tee -a bisect.log
  done
done
Logs from tests (exit=1 means FAIL, exit=0 means PASS):
===== TEST v1.31.12+k3s1 =====
Client:
Version: v2.0.5-k3s2.32
Revision:
Go version: go1.23.11
Server:
Version: v2.0.5-k3s2.32
Revision:
UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=1
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=1
===== TEST v1.31.11+k3s1 =====
Client:
Version: v2.0.5-k3s2.32
Revision:
Go version: go1.23.10
Server:
Version: v2.0.5-k3s2.32
Revision:
UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=1
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=1
===== TEST v1.31.10+k3s1 =====
Client:
Version: v2.0.5-k3s1.32
Revision:
Go version: go1.23.10
Server:
Version: v2.0.5-k3s1.32
Revision:
UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=1
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=1
===== TEST v1.31.9+k3s1 =====
Client:
Version: v2.0.5-k3s1.32
Revision:
Go version: go1.23.8
Server:
Version: v2.0.5-k3s1.32
Revision:
UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=1
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=1
===== TEST v1.31.8+k3s1 =====
Client:
Version: v2.0.4-k3s2
Revision:
Go version: go1.23.6
Server:
Version: v2.0.4-k3s2
Revision:
UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=1
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=1
===== TEST v1.31.7+k3s1 =====
Client:
Version: v2.0.4-k3s2
Revision:
Go version: go1.23.6
Server:
Version: v2.0.4-k3s2
Revision:
UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=0
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=0
===== TEST v1.32.4+k3s1 =====
Client:
Version: v2.0.4-k3s2
Revision:
Go version: go1.23.6
Server:
Version: v2.0.4-k3s2
Revision:
UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=1
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=1
===== TEST v1.32.7+k3s1 =====
Client:
Version: v2.0.5-k3s2.32
Revision:
Go version: go1.23.10
Server:
Version: v2.0.5-k3s2.32
Revision:
UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=1
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=1
Downgrade to v1.31.7+k3s1 and repeat — the pull succeeds.
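The bisect.log written by the test script can be condensed to one line per version. This is a minimal sketch, assuming the log format shown above (a "===== TEST <version> =====" header followed by "exit=<code>" lines); summarize_bisect is a hypothetical helper name and not part of K3s.

```shell
# Hypothetical helper: summarize a bisect.log, one line per tested version.
# A version is PASS only if at least one pull was recorded and all exited 0.
# Reads the file(s) given as arguments, or stdin when called without arguments.
summarize_bisect() {
    awk '
        /^===== TEST / {
            ver = $3
            if (!(ver in pulls)) { order[++n] = ver; pulls[ver] = 0; fails[ver] = 0 }
            next
        }
        /^exit=/ && ver != "" {
            pulls[ver]++
            if ($0 != "exit=0") fails[ver]++
        }
        END {
            for (i = 1; i <= n; i++) {
                v = order[i]
                result = (pulls[v] > 0 && fails[v] == 0) ? "PASS" : "FAIL"
                printf "%s %s (%d/%d pulls failed)\n", v, result, fails[v], pulls[v]
            }
        }
    ' "$@"
}
```

Running `summarize_bisect bisect.log` on the log above would report v1.31.7+k3s1 as the only PASS entry.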
Expected Behavior
K3s should pull images successfully as in v1.31.7 and earlier.
Actual Behavior
All image pulls fail with TLS handshake timeouts on v1.31.8 and later.
Impact
- Prevents workloads from starting (all image pulls fail).
- Impacts both system components (CoreDNS, Traefik, etc.) and user workloads.
- Blocks upgrades beyond v1.31.7.
Request
Please investigate changes introduced in v1.31.8+k3s1 that affect containerd's TLS handling.
This appears unrelated to the containerd version itself (still 2.0.4 at that point). Since the Go version is also unchanged (go1.23.6), the cause may lie in TLS-related or other vendored dependency changes.
Contact
Environment available for reproducing/testing if needed. Logs, debug traces, and configs can be provided on request.