Broken pipe errors for large http and grpc calls #14894

@janwytze

What is the issue?

On large requests (~20MB) we see many broken pipe errors. They disappear when we don't use the Linkerd sidecar. I can reproduce this fairly consistently by sending large requests concurrently.

I'm not sure if this is a limitation of Linkerd, a configuration issue or a real bug.

How can it be reproduced?

I've created the following k8s yaml that reproduces the issue:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ok-deployment
  labels:
    app: ok-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ok-app
  template:
    metadata:
      labels:
        app: ok-app
      annotations:
        # Removing this fixes the broken pipe errors.
        linkerd.io/inject: enabled
    spec:
      imagePullSecrets:
      - name: dochorizon-regcred
      containers:
      - name: http-echo
        image: hashicorp/http-echo:0.2.3
        args:
          - "-text=ok"
          - "-listen=:5678"
        ports:
        - containerPort: 5678
---
apiVersion: v1
kind: Service
metadata:
  name: ok-service
spec:
  selector:
    app: ok-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5678
---
apiVersion: batch/v1
kind: Job
metadata:
  name: load-test-ok-service
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: load-tester
        image: alpine:3.19
        command:
          - /bin/sh
          - -c
          - |
            set -eu

            apk add --no-cache curl coreutils

            # Create 20MB of random data, base64 encode, and wrap in JSON
            b64data=$(dd if=/dev/urandom bs=1M count=20 2>/dev/null | base64 -w0)
            printf '{"documents":[{"data":"%s"}]}\n' "$b64data" > /tmp/payload.json

            # Request sending script
            cat > /tmp/send_request.sh <<'EOF'
            #!/bin/sh
            req="$1"
            code=$(curl -o /dev/null -s -w "%{http_code}" \
                -X POST "http://ok-service/api/services/document_capturing/v1/financial?pizza=${req}" \
                -H "X-Api-Key: test" \
                -H "Content-Type: application/json" \
                --data @/tmp/payload.json)
            printf "%s %s: %s\n" "$(date "+%Y-%m-%d %H:%M:%S")" "$req" "$code"
            EOF

            chmod +x /tmp/send_request.sh

            # Run 10 requests in parallel
            seq 1 10 | xargs -n1 -P10 /tmp/send_request.sh

Saved as test.yaml and applied with:

kubectl --context klippa-aks-de-west-test -n default apply -f test.yaml

# Delete the job and reapply, to make sure the HTTP server has started when the job runs.
kubectl -n default delete job load-test-ok-service
kubectl --context klippa-aks-de-west-test -n default apply -f test.yaml

Logs, error output, etc

The logs of the load-tester tool:

2026-01-29 10:05:29.483	2026-01-29 09:05:29 10: 502
2026-01-29 10:05:29.477	2026-01-29 09:05:29 1: 200
2026-01-29 10:05:29.476	2026-01-29 09:05:29 2: 200
2026-01-29 10:05:29.473	2026-01-29 09:05:29 4: 200
2026-01-29 10:05:29.472	2026-01-29 09:05:29 8: 200
2026-01-29 10:05:29.455	2026-01-29 09:05:29 7: 200
2026-01-29 10:05:29.411	2026-01-29 09:05:29 5: 502
2026-01-29 10:05:29.405	2026-01-29 09:05:29 6: 200
2026-01-29 10:05:29.402	2026-01-29 09:05:29 9: 200
2026-01-29 10:05:29.401	2026-01-29 09:05:29 3: 200
2026-01-29 10:05:28.091	OK: 14 MiB in 32 packages
2026-01-29 10:05:28.046	Executing ca-certificates-20250911-r0.trigger
2026-01-29 10:05:28.040	Executing busybox-1.36.1-r20.trigger
2026-01-29 10:05:28.034	(17/17) Installing curl (8.14.1-r2)
2026-01-29 10:05:28.027	(16/17) Installing libcurl (8.14.1-r2)
2026-01-29 10:05:28.024	(15/17) Installing libpsl (0.21.5-r0)
2026-01-29 10:05:28.020	(14/17) Installing nghttp2-libs (1.58.0-r0)
2026-01-29 10:05:28.016	(13/17) Installing libidn2 (2.3.4-r4)
2026-01-29 10:05:28.001	(12/17) Installing libunistring (1.1-r2)
2026-01-29 10:05:27.997	(11/17) Installing c-ares (1.27.0-r0)
2026-01-29 10:05:27.988	(10/17) Installing brotli-libs (1.1.0-r1)
2026-01-29 10:05:27.968	(9/17) Installing ca-certificates (20250911-r0)
2026-01-29 10:05:27.950	(8/17) Installing coreutils (9.4-r2)
2026-01-29 10:05:27.948	(7/17) Installing utmps-libs (0.1.2.2-r0)
2026-01-29 10:05:27.945	(6/17) Installing skalibs (2.14.0.1-r0)
2026-01-29 10:05:27.943	(5/17) Installing libattr (2.5.1-r5)
2026-01-29 10:05:27.940	(4/17) Installing libacl (2.3.1-r4)
2026-01-29 10:05:27.938	(3/17) Installing coreutils-sha512sum (9.4-r2)
2026-01-29 10:05:27.935	(2/17) Installing coreutils-fmt (9.4-r2)
2026-01-29 10:05:27.933	(1/17) Installing coreutils-env (9.4-r2)
2026-01-29 10:05:27.675	fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/community/x86_64/APKINDEX.tar.gz
2026-01-29 10:05:27.581	fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/main/x86_64/APKINDEX.tar.gz
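
To compare runs with and without the sidecar, it may help to tally the status codes instead of reading them line by line. The sketch below is not part of the original reproduction. `tally_codes` is a hypothetical helper name, and it assumes the output lines end in `<request>: <code>`, as printed by send_request.sh above. Other lines, such as the apk install output, are ignored.

```shell
#!/bin/sh
# Hypothetical helper: tally HTTP status codes from load-tester output on stdin.
# Only lines whose last two fields look like "<request>: <code>" are counted.
tally_codes() {
  awk 'NF >= 2 && $(NF-1) ~ /^[0-9]+:$/ && $NF ~ /^[0-9][0-9][0-9]$/ { n[$NF]++ }
       END { for (c in n) printf "%s %d\n", c, n[c] }' | sort
}

# Possible usage against the job from the manifest (needs cluster access):
#   kubectl -n default logs job/load-test-ok-service | tally_codes
```

Each output line is a status code followed by the number of requests that returned it, so a run that includes 502s is easy to spot.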

The (debug) logs of the linkerd sidecar:
https://gist.github.com/janwytze/2727424b908e73b778503973d69a422b

output of linkerd check -o short

linkerd-version
---------------
‼ cli is up-to-date
    is running version 25.10.7 but the latest edge version is 26.1.3
    see https://linkerd.io/2/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 25.12.3 but the latest edge version is 26.1.3
    see https://linkerd.io/2/checks/#l5d-version-control for hints
‼ control plane and cli versions match
    control plane running edge-25.12.3 but cli running edge-25.10.7
    see https://linkerd.io/2/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
	* linkerd-destination-787f865c76-xk5rr (edge-25.12.3)
	* linkerd-identity-6bfc9dd8dd-8dxs8 (edge-25.12.3)
	* linkerd-proxy-injector-856f7dcd94-92d8d (edge-25.12.3)
	* metrics-api-5c885cc459-8gtj7 (edge-25.12.3)
	* prometheus-64b67db8f6-7pv96 (edge-25.12.3)
	* tap-injector-5f5654cc4d-gprqg (edge-25.12.3)
	* web-9c7655494-g2bgw (edge-25.12.3)
    see https://linkerd.io/2/checks/#l5d-cp-proxy-version for hints
‼ control plane proxies and cli versions match
    linkerd-destination-787f865c76-xk5rr running edge-25.12.3 but cli running edge-25.10.7
    see https://linkerd.io/2/checks/#l5d-cp-proxy-cli-version for hints

linkerd-viz
-----------
‼ linkerd-viz pods are injected
    could not find proxy container for tap-79db9b55fc-f95ht pod
    see https://linkerd.io/2/checks/#l5d-viz-pods-injection for hints
‼ viz extension pods are running
    container "linkerd-proxy" in pod "tap-79db9b55fc-f95ht" is not ready
    see https://linkerd.io/2/checks/#l5d-viz-pods-running for hints
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
	* linkerd-destination-787f865c76-xk5rr (edge-25.12.3)
	* linkerd-identity-6bfc9dd8dd-8dxs8 (edge-25.12.3)
	* linkerd-proxy-injector-856f7dcd94-92d8d (edge-25.12.3)
	* metrics-api-5c885cc459-8gtj7 (edge-25.12.3)
	* prometheus-64b67db8f6-7pv96 (edge-25.12.3)
	* tap-injector-5f5654cc4d-gprqg (edge-25.12.3)
	* web-9c7655494-g2bgw (edge-25.12.3)
    see https://linkerd.io/2/checks/#l5d-viz-proxy-cp-version for hints
‼ viz extension proxies and cli versions match
    linkerd-destination-787f865c76-xk5rr running edge-25.12.3 but cli running edge-25.10.7
    see https://linkerd.io/2/checks/#l5d-viz-proxy-cli-version for hints

Status check results are √

Environment

  • Kubernetes 1.33.6
  • Azure Kubernetes
  • Issue occurs on both test (Linkerd edge-25.12.3) and prod (Linkerd edge-24.10.2)

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

None
