
Fix loadbalancing when using Traefik's ingress #185

@MichaelThamm

Bug Description

Note

We should investigate whether this also happens under continuous client-server traffic, e.g. telemetrygen to otelcol. Manually sent curl requests might be too infrequent to trigger load balancing in Traefik.

When we update the Traefik static yaml (/etc/traefik/traefik.yaml) with:

serversTransport:
  maxIdleConnsPerHost: -1

and then run pebble restart traefik, requests are correctly load balanced across the units.

Without it, Traefik is sticky and routes every request to only one of the two units behind the k8s service address, presumably because it keeps reusing an idle keep-alive connection.

Docs:

Tip

To fix this, we have options:

  1. Figure out how to set the Traefik global (static) config from the otelcol charm.
  2. Set the dynamic config and configure the transport per router (I tested this from the charm and it works).
  3. Ask Traefik to enable this in their TraefikRoute class.
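
Option 2 could look roughly like the sketch below. It builds a Traefik dynamic config as a Python dict, which a charm would then serialize and hand to Traefik. The function name, the naming scheme, the entrypoint name, and the router rule are all hypothetical. It also assumes that the transport is referenced from the service's loadBalancer (which the router points to) rather than from the router itself, as in Traefik's documented dynamic config layout.

```python
def build_dynamic_config(app: str, namespace: str, port: int, entrypoint: str) -> dict:
    """Return a Traefik dynamic config that disables idle connection reuse.

    All names derived here are illustrative assumptions, not the charm's real ones.
    """
    name = f"juju-{namespace}-{app}"        # hypothetical router/service name
    transport = f"{name}-no-keepalive"      # hypothetical transport name
    backend = f"http://{app}.{namespace}.svc.cluster.local:{port}"
    return {
        "http": {
            # Same setting as the static-config workaround, scoped to one transport.
            "serversTransports": {
                transport: {"maxIdleConnsPerHost": -1},
            },
            "routers": {
                name: {
                    "entryPoints": [entrypoint],
                    "rule": "PathPrefix(`/`)",  # assumed catch-all rule
                    "service": name,
                },
            },
            "services": {
                name: {
                    "loadBalancer": {
                        "servers": [{"url": backend}],
                        # The service, not the router, references the transport.
                        "serversTransport": transport,
                    },
                },
            },
        },
    }
```

Calling build_dynamic_config("otelcol", "otel", 4318, "otlp-http") yields a config whose only transport sets maxIdleConnsPerHost to -1; the entrypoint name "otlp-http" is a placeholder.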

Incorrect k8s service address usage

Technically, our config should use:

  • otelcol.otel.svc.cluster.local
    instead of the headless service (otelcol-endpoints), which has no ClusterIP, as kubectl get services -n otel shows:
NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)
otelcol             ClusterIP   10.152.183.240                 3500/TCP, 4317/TCP, 4318/TCP, 8888/TCP, 9411/TCP, 13133/TCP, 14250/TCP, 14268/TCP
otelcol-endpoints   ClusterIP   None

The charm code currently derives the headless service address with socket.getfqdn().split(".", 1)[-1]
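
To illustrate the difference, here is a minimal sketch of what that expression yields and how the ClusterIP service address could be built instead. The sample pod FQDN is an assumption about what socket.getfqdn() returns inside the otelcol/0 pod, and both helper names are hypothetical.

```python
def headless_domain(pod_fqdn: str) -> str:
    """Mirror the charm's expression: drop the pod hostname, keep the rest."""
    return pod_fqdn.split(".", 1)[-1]


def cluster_ip_service_fqdn(app: str, namespace: str) -> str:
    """Build the address of the regular (ClusterIP) service for the app."""
    return f"{app}.{namespace}.svc.cluster.local"


# Assumed value of socket.getfqdn() inside the otelcol/0 pod.
sample_pod_fqdn = "otelcol-0.otelcol-endpoints.otel.svc.cluster.local"

print(headless_domain(sample_pod_fqdn))            # otelcol-endpoints.otel.svc.cluster.local
print(cluster_ip_service_fqdn("otelcol", "otel"))  # otelcol.otel.svc.cluster.local
```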

To Reproduce

Replace TRAEFIK_IP with the Traefik ingress IP (192.168.88.17 in my setup).

curl -i -X POST "http://TRAEFIK_IP:4318/v1/logs" \
  -H "Content-Type: application/json" \
  -d '{
    "resourceLogs": [
        {
            "resource": {
                "attributes": [
                    {
                        "key": "service.name",
                        "value": { "stringValue": "test-service" }
                    }
                ]
            },
            "scopeLogs": [
                {
                    "logRecords": [
                        {
                            "timeUnixNano": "'$(date +%s%N)'",
                            "body": { "stringValue": "+++Testing OTLP ingress+++" },
                            "severityText": "INFO"
                        }
                    ]
                }
            ]
        }
    ]
}'

In 2 separate terminals:

jssh --container otelcol otelcol/0 "pebble logs -f" | grep +++
jssh --container otelcol otelcol/1 "pebble logs -f" | grep +++

Check which of the two units the OTLP logs arrive at.
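
To check whether a steadier stream of requests changes the behaviour, the curl call can be repeated from a script. The following is a rough sketch using only the Python standard library; the function names, request count, and interval are arbitrary assumptions. Note that urllib opens a new client connection for every request, so this is only a stand-in for a long-lived client such as telemetrygen.

```python
import json
import time
import urllib.request


def build_payload(message: str) -> dict:
    """Build the same OTLP/HTTP JSON log payload as the curl example."""
    return {
        "resourceLogs": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": "test-service"}},
                ],
            },
            "scopeLogs": [{
                "logRecords": [{
                    "timeUnixNano": str(time.time_ns()),
                    "body": {"stringValue": message},
                    "severityText": "INFO",
                }],
            }],
        }],
    }


def send_log(traefik_ip: str, message: str, timeout: float = 5.0) -> int:
    """POST one log record through the Traefik ingress; return the HTTP status."""
    request = urllib.request.Request(
        f"http://{traefik_ip}:4318/v1/logs",
        data=json.dumps(build_payload(message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def send_many(traefik_ip: str, count: int = 100, interval: float = 0.05) -> None:
    """Send `count` numbered log records, pausing `interval` seconds between them."""
    for i in range(count):
        send_log(traefik_ip, f"+++Testing OTLP ingress {i}+++")
        time.sleep(interval)
```

Running send_many("TRAEFIK_IP") with the real ingress IP while both pebble logs terminals are open should show whether the numbered messages land on one unit or on both.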
