TLS Handshake timeout pulling images starting in v1.31.8+k3s1 (works in v1.31.7) #12940

@bmorris53

Bug Report: TLS Handshake Timeout on Image Pulls in K3s v1.31.8+

Summary

After upgrading from K3s v1.31.7+k3s1 to v1.31.8+k3s1 (or to any later 1.31.x or 1.32.x release tested), all container image pulls fail with TLS handshake timeout errors. The failure is consistent across multiple registries (docker.io, quay.io, etc.) and prevents workloads from being deployed.

The problem does not occur on v1.31.7+k3s1 and earlier.

Environment

  • OS: Rocky Linux 9.6 (Blue Onyx)
  • Kernel: 5.14.0-570.39.1.el9_6.x86_64
  • Arch: x86_64
  • Cluster: Multi-master K3s cluster, on Openstack VMs
  • Proxy: None (no proxy configured in environment)
  • Firewall: Outbound 443 is open and verified with openssl and curl
  • CNI: Cilium v1.7.6 (same version across all K3s versions tested)

k3s config.yaml for server nodes

cluster-init: true
token: "<redacted>"
node-ip: 172.40.0.119,<ipv6-redacted>
cluster-cidr: 10.42.0.0/16,fd01::/48
service-cidr: 10.43.0.0/16,fd02::/112
flannel-backend: none
disable-network-policy: true
disable-kube-proxy: true
selinux: true
secrets-encryption: true
write-kubeconfig-mode: "0644"
tls-san: 172.40.0.250
disable: metrics-server

Versions Tested

K3s Version      Go Version   containerd Version   Status
v1.31.7+k3s1     go1.23.6     2.0.4                ✅ Works
v1.31.8+k3s1     go1.23.6     2.0.4                ❌ Fails
v1.31.12+k3s1    go1.23.11    2.0.5                ❌ Fails
v1.32.4+k3s1     go1.23.6     2.0.4                ❌ Fails
v1.32.7+k3s1     go1.23.10    2.0.5                ❌ Fails

Symptoms

Pods enter ImagePullBackOff with messages like:

Failed to pull image "rancher/mirrored-coredns-coredns:1.12.1": failed to resolve reference: failed to do request: Head "https://registry-1.docker.io/v2/...": net/http: TLS handshake timeout

Manual tests from the host succeed:

curl -v https://registry-1.docker.io/v2/
openssl s_client -connect registry-1.docker.io:443

Both succeed.
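
Because the host-side checks only show that a handshake eventually completes, it may help to record how long the TLS phase takes. The following is a minimal sketch, not part of the original report: `handshake_ms` and `probe_tls` are hypothetical helper names, and the 15-second `--max-time` is an arbitrary assumption. It relies on curl's `time_connect` and `time_appconnect` write-out variables.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): measure the TLS handshake time seen by curl
# on the host, for comparison with the timeout containerd reports.

# handshake_ms CONNECT APPCONNECT
# Takes curl's time_connect and time_appconnect values (in seconds) and
# prints the TLS handshake duration in whole milliseconds.
handshake_ms() {
  awk -v c="$1" -v a="$2" 'BEGIN { printf "%.0f\n", (a - c) * 1000 }'
}

# probe_tls URL
# Runs curl once against URL and prints "<url> tls_handshake_ms=<n>".
probe_tls() {
  # time_connect: TCP connect finished; time_appconnect: TLS handshake finished.
  times=$(curl -s -o /dev/null --max-time 15 \
    -w '%{time_connect} %{time_appconnect}' "$1") || return 1
  # $times is deliberately unquoted so it splits into the two arguments.
  echo "$1 tls_handshake_ms=$(handshake_ms $times)"
}
```

For example, `probe_tls https://registry-1.docker.io/v2/` could be run on both a v1.31.7+k3s1 node and a v1.31.8+k3s1 node. A handshake that is already slow from the host would point toward the network path rather than K3s.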

But using containerd directly fails:

sudo /usr/local/bin/k3s ctr images pull docker.io/library/busybox:1.36
# error: TLS handshake timeout

Enabling containerd debug logging shows:

DEBU[0000] resolving
DEBU[0000] do request
DEBU[0000] fetch response received
DEBU[0000] ... timeout while fetching

Notes

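  • The regression was introduced between v1.31.7+k3s1 (works) and v1.31.8+k3s1 (fails).
  • Both releases report Go 1.23.6 and containerd v2.0.4-k3s2, which suggests the cause is a change in vendored dependencies or TLS handling in that release rather than a Go or containerd version bump.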
  • Not specific to docker.io; also affects quay.io and other registries.
  • Not specific to registries.yaml or mirrors — even direct public pulls fail.

Steps to Reproduce

  1. Install K3s v1.31.8+k3s1 or later on Rocky Linux 9.6 (x86_64).
  2. Attempt to pull an image:
    sudo /usr/local/bin/k3s ctr images pull docker.io/library/busybox:1.36
  3. Observe TLS handshake timeout.

Test Script used:

# versions to try, newest→oldest
# (1.31.x list is commented out; the 1.32.x list is the one currently active)
#VERS="v1.31.12+k3s1 v1.31.11+k3s1 v1.31.10+k3s1 v1.31.9+k3s1 v1.31.8+k3s1"
VERS="v1.32.4+k3s1 v1.32.7+k3s1"

for V in $VERS; do
  echo "===== TEST $V =====" | tee -a bisect.log
  # install (or switch to) the requested K3s version; stop on the first failed install
  curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="$V" sh -s - || { echo "install $V failed"; break; }
  sleep 6   # give the k3s service a moment to come up
  sudo /usr/local/bin/k3s ctr version | tee -a bisect.log

  # tiny images from multiple registries
  for IMG in docker.io/library/busybox:1.36 quay.io/prometheus/node-exporter:v1.8.1; do
    echo "Pull $IMG" | tee -a bisect.log
    # capture a very chatty trace per attempt, one log file per version/image
    sudo /usr/local/bin/k3s ctr --debug images pull --http-trace --local "$IMG" \
      > "pull-$V-$(echo "$IMG" | tr '/:' '__').log" 2>&1
    # $? is still the exit status of the pull above
    echo "exit=$?" | tee -a bisect.log
  done
done
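
The per-pull trace files written by the script above can be scanned for the failure message without opening each one. This is a minimal sketch, assuming the files follow the `pull-*.log` naming used in the script and that the trace contains the same "TLS handshake timeout" text shown under Symptoms. `count_timeouts` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch (hypothetical helper): count handshake-timeout lines per trace file.

# count_timeouts DIR
# Prints "<file> <count>" for every pull-*.log in DIR, where <count> is the
# number of lines containing the handshake-timeout message.
count_timeouts() {
  for f in "$1"/pull-*.log; do
    [ -e "$f" ] || continue   # skip the unexpanded glob when no file matches
    printf '%s %s\n' "$(basename "$f")" "$(grep -c 'TLS handshake timeout' "$f")"
  done
}
```

Running `count_timeouts .` in the directory where the test script ran gives one line per version/image pair. A count of 0 for a failed pull would indicate a different error than the one described here.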

Logs from tests (exit=1 means FAIL, exit=0 means PASS):

===== TEST v1.31.12+k3s1 =====
Client:
  Version:  v2.0.5-k3s2.32
  Revision:
  Go version: go1.23.11

Server:
  Version:  v2.0.5-k3s2.32
  Revision:
  UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=1
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=1
===== TEST v1.31.11+k3s1 =====
Client:
  Version:  v2.0.5-k3s2.32
  Revision:
  Go version: go1.23.10

Server:
  Version:  v2.0.5-k3s2.32
  Revision:
  UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=1
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=1
===== TEST v1.31.10+k3s1 =====
Client:
  Version:  v2.0.5-k3s1.32
  Revision:
  Go version: go1.23.10

Server:
  Version:  v2.0.5-k3s1.32
  Revision:
  UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=1
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=1
===== TEST v1.31.9+k3s1 =====
Client:
  Version:  v2.0.5-k3s1.32
  Revision:
  Go version: go1.23.8

Server:
  Version:  v2.0.5-k3s1.32
  Revision:
  UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=1
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=1
===== TEST v1.31.8+k3s1 =====
Client:
  Version:  v2.0.4-k3s2
  Revision:
  Go version: go1.23.6

Server:
  Version:  v2.0.4-k3s2
  Revision:
  UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=1
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=1
===== TEST v1.31.7+k3s1 =====
Client:
  Version:  v2.0.4-k3s2
  Revision:
  Go version: go1.23.6

Server:
  Version:  v2.0.4-k3s2
  Revision:
  UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=0
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=0
===== TEST v1.32.4+k3s1 =====
Client:
  Version:  v2.0.4-k3s2
  Revision:
  Go version: go1.23.6

Server:
  Version:  v2.0.4-k3s2
  Revision:
  UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=1
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=1
===== TEST v1.32.7+k3s1 =====
Client:
  Version:  v2.0.5-k3s2.32
  Revision:
  Go version: go1.23.10

Server:
  Version:  v2.0.5-k3s2.32
  Revision:
  UUID: 6b8b8b92-e6ee-477b-abca-729e283e5ddd
Pull docker.io/library/busybox:1.36
exit=1
Pull quay.io/prometheus/node-exporter:v1.8.1
exit=1

Downgrading to v1.31.7+k3s1 and repeating the test makes the pulls succeed.
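
The bisect log can be condensed to one PASS/FAIL line per version. This is a minimal sketch, assuming the log layout shown above ("===== TEST <version> =====" headers followed by "exit=<code>" lines). `summarize_bisect` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch (hypothetical helper): summarize bisect.log per tested version.

# summarize_bisect FILE
# Prints "<version> PASS" when every pull for that version exited 0,
# otherwise "<version> FAIL". Versions are listed in log order.
summarize_bisect() {
  awk '
    /^===== TEST / {
      # a new version header: flush the previous version, then reset state
      if (ver != "") print ver, (bad ? "FAIL" : "PASS")
      ver = $3; bad = 0; next
    }
    /^exit=/ {
      split($0, kv, "=")
      if (kv[2] + 0 != 0) bad = 1   # any non-zero exit marks the version as failed
    }
    END { if (ver != "") print ver, (bad ? "FAIL" : "PASS") }
  ' "$1"
}
```

Usage: `summarize_bisect bisect.log`. With the log above, every version except v1.31.7+k3s1 would be reported as FAIL.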

Expected Behavior

K3s should pull images successfully as in v1.31.7 and earlier.

Actual Behavior

All image pulls fail with TLS handshake timeouts on v1.31.8 and later.

Impact

  • Prevents workloads from starting (all image pulls fail).
  • Impacts both system components (CoreDNS, Traefik, etc.) and user workloads.
  • Blocks upgrades beyond v1.31.7.

Request

Please investigate changes introduced in v1.31.8+k3s1 that affect containerd's TLS handling.
The containerd version itself does not appear to be the cause (both v1.31.7+k3s1 and v1.31.8+k3s1 report 2.0.4), so the regression may be tied to Go TLS libraries or other vendored changes.


Contact

Environment available for reproducing/testing if needed. Logs, debug traces, and configs can be provided on request.
