Snuba doesn't drop connection when Kafka node dies while still creating the healthcheck file #7763
Open
Description
Environment
What version are you running? 25.9.0
Steps to Reproduce
- Run a multi-broker Kafka cluster
- Connect to it over SSL
- Take one node down to test fail-over (cut either network or power so the node is completely unreachable)
- Snuba doesn't drop the connection to the dead broker
- Snuba is stuck in an SSL handshake error loop until restarted
The following ENVs are set:
DEFAULT_BROKERS: kafka-broker-1,kafka-broker-2,...,kafka-broker-9
KAFKA_SECURITY_PROTOCOL: SSL
KAFKA_SSL_CA_PATH: /etc/ssl/certs/my-ca.pem
KAFKA_SSL_CERT_PATH: client.crt
KAFKA_SSL_KEY_PATH: client.key
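For anyone trying to reproduce this outside Snuba, here is a rough sketch of how these variables might translate into a librdkafka-style client configuration. The property names (`bootstrap.servers`, `security.protocol`, `ssl.ca.location`, `ssl.certificate.location`, `ssl.key.location`) are standard librdkafka settings, but the mapping itself and the `build_rdkafka_config` helper are my assumptions, not taken from Snuba's source. The default port 9093 is inferred from the log line in this report.

```python
# Assumed mapping from the environment variables in this report to
# librdkafka configuration properties (not taken from Snuba's code).
ENV_TO_RDKAFKA = {
    "KAFKA_SECURITY_PROTOCOL": "security.protocol",
    "KAFKA_SSL_CA_PATH": "ssl.ca.location",
    "KAFKA_SSL_CERT_PATH": "ssl.certificate.location",
    "KAFKA_SSL_KEY_PATH": "ssl.key.location",
}


def build_rdkafka_config(env, default_port=9093):
    """Build a librdkafka-style config dict from a mapping of env vars.

    `default_port` is appended to brokers listed without a port; 9093 is
    an assumption based on the broker address seen in the log output.
    """
    brokers = [b.strip() for b in env.get("DEFAULT_BROKERS", "").split(",")]
    brokers = [b for b in brokers if b]
    config = {
        "bootstrap.servers": ",".join(
            b if ":" in b else f"{b}:{default_port}" for b in brokers
        )
    }
    # Copy over only the SSL-related variables that are actually set.
    for env_name, prop in ENV_TO_RDKAFKA.items():
        if env_name in env:
            config[prop] = env[env_name]
    return config


if __name__ == "__main__":
    import os

    print(build_rdkafka_config(os.environ))
```

The resulting dict could be passed to a plain Kafka client to check whether the fail-over behaviour is specific to Snuba or also shows up with a bare librdkafka consumer.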
Expected Result
Snuba drops the broken connection and connects to another working broker.
The health-check file should not be created while the consumer is in a non-working state:
--health-check-file /tmp/health.txt
Actual Result
Snuba keeps retrying the SSL handshake with the dead broker.
%4|1771925946.628|FAIL|rdkafka#producer-1| [thrd:ssl://kafka-broker-1:9093/bootstra]: ssl://kafka-broker-1:9093/1: Connection setup timed out in state CONNECT (after 30027ms in state CONNECT, 1 identical error(s) suppressed
The health-check file is still being created, so the container cannot be restarted automatically.
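Since the health-check file can't be relied on here, one possible stopgap is to watch the client log for repeated `FAIL` lines against the same broker. Below is a minimal sketch that parses lines shaped like the one above and counts failures per broker. The regular expressions are derived only from that single log line, and `parse_rdkafka_line` and `count_broker_failures` are hypothetical helper names, so treat this as a starting point rather than a robust parser.

```python
import re
from collections import Counter

# Pattern inferred from the log line in this report:
# %<level>|<timestamp>|<facility>|<client>| [thrd:<thread>]: <message>
LOG_RE = re.compile(
    r"^%(?P<level>\d)\|(?P<ts>\d+\.\d+)\|(?P<fac>[A-Z0-9_]+)\|"
    r"(?P<client>[^|]+)\|\s*\[thrd:(?P<thread>[^\]]*)\]:\s*(?P<message>.*)$"
)
# The message starts with the broker address, e.g. ssl://host:9093/1: ...
BROKER_RE = re.compile(r"^(?P<broker>\w+://[^/\s]+)")


def parse_rdkafka_line(line):
    """Return a dict of fields for a matching log line, else None."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    broker = BROKER_RE.match(fields["message"])
    fields["broker"] = broker.group("broker") if broker else None
    return fields


def count_broker_failures(lines):
    """Count FAIL-facility lines per broker address."""
    counts = Counter()
    for line in lines:
        fields = parse_rdkafka_line(line)
        if fields and fields["fac"] == "FAIL" and fields["broker"]:
            counts[fields["broker"]] += 1
    return counts
```

An external probe could restart the container once one broker's count passes some threshold within a time window. Note that the client suppresses identical errors ("1 identical error(s) suppressed"), so the raw line count understates the real number of failed attempts.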
Additional information
Some Snuba consumers do drop the connection (I see roughly 2-5 errors in the log) and connect to a working broker, while others don't. I haven't found out why it sometimes works and sometimes doesn't.
Projects
Status
Waiting for: Product Owner