Description
Observed behavior
Very similar, if not identical, to #4480. We've since increased resources, and whilst that has improved the situation it hasn't eliminated it; given the current CPU/memory utilisation, I'd have expected this to go away.
We were running on GCP/GKE on a dedicated node pool of e2-standard-4 (4 CPU/16GB RAM) with 3000m CPU requests. Today we moved to a new node pool of c3-highcpu-8 (8 CPU/16GB RAM) with 6000m CPU requests. This has reduced the number of both stream and consumer elections, but they are still prevalent.
This is a 5-node NATS cluster in europe-west4: two nodes in zone a, two in zone b and one in zone c. We use placement tags to split stream replicas across zones, and all streams and consumers are configured with 3 replicas.
Over the last hour there have been leadership elections (around 10 of them) approximately every 5 minutes. CPU utilisation was ~40-60% on the old node pool; since changing to c3 machine types it's down to 6-16%.

Memory utilisation is 5GB/5GB/8.5GB/4.6GB/4.7GB for nats-0 through nats-4 respectively, out of 11GB requested in k8s.
Example election logs from nats-3:
nats-3 nats [18] 2025/07/24 22:00:01.164386 [INF] JetStream cluster new consumer leader for 'ABREVPTWGF566CK3B2ZG2XPQX434RBLUYYZSIN4YSZVFCSLTWGXPOCYN > KV_EXECUTION_01JPSNV00X6TTZVVEFJRWDCEW2 > oc_SFAW78BWADE62NNUYZZ5AX_1'
nats-3 nats [18] 2025/07/24 22:00:01.220019 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JWX99REGC3VW9JNWK17NC0ZY > oc_QRFUX2Y5SAZ4W0F4W1DEY7_1'
nats-3 nats [18] 2025/07/24 22:00:01.307951 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JWX99REGC3VW9JNWK17NC0ZY > oc_QRFUX2Y5SAZ4W0F4W1DFEV_1'
nats-3 nats [18] 2025/07/24 22:00:01.358362 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JWX99REGC3VW9JNWK17NC0ZY > oc_QRFUX2Y5SAZ4W0F4W1DFPZ_1'
nats-3 nats [18] 2025/07/24 22:00:01.382655 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JVA004CDDY4MMBMCKW7FBZVN > oc_MPIV84KXKV7AVQU1W0PHAY_1'
nats-3 nats [18] 2025/07/24 22:00:01.430928 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JWX99REGC3VW9JNWK17NC0ZY > oc_QRFUX2Y5SAZ4W0F4W1DG3V_1'
nats-3 nats [18] 2025/07/24 22:00:01.446886 [INF] JetStream cluster new consumer leader for 'AB6XZ6KY3JXEARIVZQOBMRV47H77Q2HW2N3PY3SLCMPNKQZY44LSBPXF > KV_EXECUTION_01JY41M1Y272B7CGKXA07SRRMT > oc_5AAWQZD8YJQ5OACSALRDSU_1'
nats-3 nats [18] 2025/07/24 22:00:01.467883 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JVA004CDDY4MMBMCKW7FBZVN > oc_MPIV84KXKV7AVQU1W0PHIM_1'
nats-3 nats [18] 2025/07/24 22:00:01.469948 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JWX99REGC3VW9JNWK17NC0ZY > oc_QRFUX2Y5SAZ4W0F4W1DGEZ_1'
nats-3 nats [18] 2025/07/24 22:00:01.472470 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JWX99REGC3VW9JNWK17NC0ZY > oc_QRFUX2Y5SAZ4W0F4W1DGNB_1'
nats-3 nats [18] 2025/07/24 22:00:02.013524 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JVA004CDDY4MMBMCKW7FBZVN > oc_MPIV84KXKV7AVQU1W0PHNQ_1'
nats-3 nats [18] 2025/07/24 22:00:02.015156 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JVA004CDDY4MMBMCKW7FBZVN > oc_MPIV84KXKV7AVQU1W0PHRK_1'
nats-3 nats [18] 2025/07/24 22:00:02.132658 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JVA004CDDY4MMBMCKW7FBZVN > oc_MPIV84KXKV7AVQU1W0PHXY_1'
nats-3 nats [18] 2025/07/24 22:00:02.440336 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JVA004CDDY4MMBMCKW7FBZVN > oc_MPIV84KXKV7AVQU1W0PI32_1'
nats-3 nats [18] 2025/07/24 22:05:00.770907 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JVA004CDDY4MMBMCKW7FBZVN > oc_MPIV84KXKV7AVQU1W0PIAQ_1'
nats-3 nats [18] 2025/07/24 22:05:00.942211 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JVA004CDDY4MMBMCKW7FBZVN > oc_MPIV84KXKV7AVQU1W0PIIE_1'
nats-3 nats [18] 2025/07/24 22:05:01.029642 [INF] JetStream cluster new consumer leader for 'ABREVPTWGF566CK3B2ZG2XPQX434RBLUYYZSIN4YSZVFCSLTWGXPOCYN > KV_EXECUTION_01JPSNV00X6TTZVVEFJRWDCEW2 > oc_SFAW78BWADE62NNUYZZ6BL_1'
nats-3 nats [18] 2025/07/24 22:05:01.032084 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JWX99REGC3VW9JNWK17NC0ZY > oc_QRFUX2Y5SAZ4W0F4W1DH9J_1'
nats-3 nats [18] 2025/07/24 22:05:01.830663 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JWX99REGC3VW9JNWK17NC0ZY > oc_QRFUX2Y5SAZ4W0F4W1DHNF_1'
nats-3 nats [18] 2025/07/24 22:10:01.041559 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JVA004CDDY4MMBMCKW7FBZVN > oc_MPIV84KXKV7AVQU1W0PIQ2_1'
nats-3 nats [18] 2025/07/24 22:10:01.103740 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JVA004CDDY4MMBMCKW7FBZVN > oc_MPIV84KXKV7AVQU1W0PIXQ_1'
nats-3 nats [18] 2025/07/24 22:10:01.189385 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JWX99REGC3VW9JNWK17NC0ZY > oc_QRFUX2Y5SAZ4W0F4W1DI43_1'
nats-3 nats [18] 2025/07/24 22:10:01.229764 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JVA004CDDY4MMBMCKW7FBZVN > oc_MPIV84KXKV7AVQU1W0PJ2U_1'
nats-3 nats [18] 2025/07/24 22:10:01.617779 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JWX99REGC3VW9JNWK17NC0ZY > oc_QRFUX2Y5SAZ4W0F4W1DIF7_1'
nats-3 nats [18] 2025/07/24 22:10:01.877911 [INF] JetStream cluster new consumer leader for 'ABREVPTWGF566CK3B2ZG2XPQX434RBLUYYZSIN4YSZVFCSLTWGXPOCYN > KV_EXECUTION_01JPSNV00X6TTZVVEFJRWDCEW2 > oc_SFAW78BWADE62NNUYZZ7C9_1'
nats-3 nats [18] 2025/07/24 22:10:03.000382 [INF] JetStream cluster new consumer leader for 'ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ > KV_EXECUTION_01JWX99REGC3VW9JNWK17NC0ZY > oc_QRFUX2Y5SAZ4W0F4W1DIVV_1'
These all appear to be consumer elections, but we were getting stream elections earlier today. I'm assuming that's down to business hours, since traffic is lower now. I can report back tomorrow on whether the stream elections pick back up. We increased resources around 12:00 today, and the total number of elections is lower than yesterday:
Yesterday:
- 30-1000 stream elections per hour
- 600-2500 consumer elections per hour
Today:
- 0-10 stream elections per hour
- 350-1000 consumer elections per hour
nats server report jetstream
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ JetStream Summary │
├─────────┬─────────┬─────────┬───────────┬────────────┬─────────┬────────┬─────────┬─────────┬──────────────┤
│ Server │ Cluster │ Streams │ Consumers │ Messages │ Bytes │ Memory │ File │ API Req │ API Err │
├─────────┼─────────┼─────────┼───────────┼────────────┼─────────┼────────┼─────────┼─────────┼──────────────┤
│ nats-0 │ nats │ 1,382 │ 3,954 │ 5,029,316 │ 13 GiB │ 0 B │ 13 GiB │ 649 │ 8 / 1.232% │
│ nats-1 │ nats │ 1,331 │ 4,162 │ 4,356,485 │ 9.7 GiB │ 0 B │ 9.7 GiB │ 9,883 │ 3 / 0.030% │
│ nats-2 │ nats │ 2,460 │ 7,685 │ 8,245,595 │ 20 GiB │ 0 B │ 20 GiB │ 31,356 │ 36 / 0.114% │
│ nats-3* │ nats │ 1,435 │ 3,569 │ 4,024,801 │ 10 GiB │ 0 B │ 10 GiB │ 17,820 │ 651 / 3.653% │
│ nats-4 │ nats │ 1,305 │ 3,781 │ 3,400,799 │ 7.0 GiB │ 0 B │ 7.0 GiB │ 26,684 │ 3 / 0.011% │
├─────────┼─────────┼─────────┼───────────┼────────────┼─────────┼────────┼─────────┼─────────┼──────────────┤
│ │ │ 7,913 │ 23,151 │ 25,056,996 │ 60 GIB │ 0 B │ 60 GIB │ 86,392 │ 701 │
╰─────────┴─────────┴─────────┴───────────┴────────────┴─────────┴────────┴─────────┴─────────┴──────────────╯
╭───────────────────────────────────────────────────────────────────────╮
│ RAFT Meta Group Information │
├─────────────────┬──────────┬────────┬─────────┬────────┬────────┬─────┤
│ Connection Name │ ID │ Leader │ Current │ Online │ Active │ Lag │
├─────────────────┼──────────┼────────┼─────────┼────────┼────────┼─────┤
│ nats-0 │ S1Nunr6R │ │ true │ true │ 834ms │ 0 │
│ nats-1 │ yrzKKRBu │ │ true │ true │ 834ms │ 0 │
│ nats-2 │ cnrtt3eg │ │ true │ true │ 834ms │ 0 │
│ nats-3 │ bkCGheKT │ yes │ true │ true │ 0s │ 0 │
│ nats-4 │ HuYMtjaW │ │ true │ true │ 834ms │ 0 │
╰─────────────────┴──────────┴────────┴─────────┴────────┴────────┴─────╯
We have 392 connections; the connection report is too large to paste here, so I'll attach it as a file.
nats server report accounts
Our top 10 accounts by number of subscriptions (we have 83 accounts in total):
│ ADDUDOTPOQKVURMOQFCD4MGLVNDDCVMZJRB3JV4XMXD67ZRVVZQ33OIY │ 3 │ 303,511 │ 1,520,156 │ 18 MiB │ 116 MiB │ 259 │
│ ABOTF5DVZHCSB6V4XDI4ABPN7RJOVKKTD2BOEECL7GUVDBPPCRB5ZC5K │ 9 │ 632,671 │ 1,635,513 │ 436 MiB │ 1.2 GiB │ 277 │
│ ADX5PADBLV73FO2Y3QEIZ6XZYSQ5KRINSJMHQOLF7QNKEC3DMQFFNFZJ │ 5 │ 405,909 │ 2,005,132 │ 25 MiB │ 161 MiB │ 343 │
│ ADKYAIWANVX57LKQMVE7MPNRCHYOHQGFD23VHP5PM6YKC5HZFMESMOXM │ 12 │ 464,828 │ 1,854,634 │ 98 MiB │ 394 MiB │ 406 │
│ AAYX64WAU5F7FA3OXSKX7ZG7DGXPPUQYF5ZFG6GLLTGSGFYEVKCEX4OS │ 3 │ 559,018 │ 2,148,689 │ 868 MiB │ 2.6 GiB │ 419 │
│ ABDT3RI45476FN7FRLUKB3RNHHTEBX5CBLOTKL4UR6JF3EM52GROUWLJ │ 17 │ 730,119 │ 2,717,152 │ 234 MiB │ 1.7 GiB │ 474 │
│ ACODSUSXBPXY7U7MGK6RFWDFFIWQMZXDOVK7SA7CXDWTXDBKRPFTB35X │ 62 │ 586,845 │ 2,718,686 │ 36 MiB │ 218 MiB │ 657 │
│ AB6XZ6KY3JXEARIVZQOBMRV47H77Q2HW2N3PY3SLCMPNKQZY44LSBPXF │ 18 │ 1,706,577 │ 4,975,720 │ 619 MiB │ 2.5 GiB │ 863 │
│ AAQK67EWHYITNEZC64XXATLHR2THCBI47Y2SKG6XZVBEHMWDIHJKGEPQ │ 24 │ 1,520,228 │ 5,570,094 │ 614 MiB │ 2.3 GiB │ 1,136 │
│ ADB5DVCJFHHWTB7XVXKMMMKKHI6HRVZTSYRYEYQV3ZP4ZKQHNDAEBMZP │ 22 │ 2,621,552 │ 13,177,403 │ 212 MiB │ 1.2 GiB │ 3,024 │
Looking into GCE monitoring:
- Sent/received packets between 7k-20k/second
- Received bytes 1.1-3.2 MiB/s
- Sent bytes 0.9-2.8 MiB/s
So I don't believe there's any issue with networking latency.
Our pods run the config-reloader and metrics-exporter containers alongside nats-server, with negligible utilisation. We have Istio in the cluster, but NATS is outside the mesh, so I don't believe Istio is a factor.
Any guidance on next steps for debugging?
Expected behavior
No stream/consumer elections unless a server goes down.
Server and client version
nats:2.10.26-alpine: 2.10.26
github.com/nats-io/nats.go: v1.43.0
I have searched the releases for any mention of "elections"; the only thing I found relates to leaf nodes, which we don't use. We haven't got round to upgrading yet.
Host environment
GKE: v1.31.8-gke.1113000
GCP machine type: c3-highcpu-8
Boot disk: Standard Persistent Disk
Data disk: SSD Persistent Disk 512Gi per node
Steps to reproduce
No response