Proposed change
I saw a scenario where available disk space decreased while the server was down (say, logs from other services were written to disk while NATS was stopped). It was not immediately obvious what was going on. Steps to reproduce:
- Publish 500M of messages:
nats bench js pub sync test --replicas=3 --msgs=500 --storage=file --create --clients 100 --purge --maxbytes="1GB" --size=1MB
- Stop the cluster
- Update the config to this:
jetstream {
max_file: 490M
}
- Start the cluster
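The failure mode in the steps above comes down to arithmetic: roughly 500 MB of stream data is already on disk, but after the config change the file-storage limit is 490 MB, so on restart the stored stream no longer fits under `max_file` (which matches the empty stream listing and the health errors below). A minimal sketch with the repro's numbers:

```python
# Sketch of why the restart goes wrong: the data already on disk
# exceeds the new JetStream file-storage limit (numbers mirror the
# repro above: ~500 x 1 MB messages vs. max_file: 490M).
stored_bytes = 500 * 1024 * 1024    # data published before the restart
max_file_bytes = 490 * 1024 * 1024  # new max_file after the config change

# The existing stream data no longer fits under the configured limit,
# so the server cannot restore the stream assets on startup.
fits = stored_bytes <= max_file_bytes
print(fits)  # False
```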
# nats server report jetstream
╭────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ JetStream Summary │
├────────┬──────────────┬─────────┬───────────┬──────────┬───────┬────────┬──────┬─────────┬─────────┤
│ Server │ Cluster │ Streams │ Consumers │ Messages │ Bytes │ Memory │ File │ API Req │ Pending │
├────────┼──────────────┼─────────┼───────────┼──────────┼───────┼────────┼──────┼─────────┼─────────┤
│ s1 │ proc_compose │ 0 │ 0 │ 0 │ 0 B │ 0 B │ 0 B │ 5 │ 0 │
│ s2 │ proc_compose │ 0 │ 0 │ 0 │ 0 B │ 0 B │ 0 B │ 1 │ 0 │
│ s3* │ proc_compose │ 0 │ 0 │ 0 │ 0 B │ 0 B │ 0 B │ 10 │ 0 │
├────────┼──────────────┼─────────┼───────────┼──────────┼───────┼────────┼──────┼─────────┼─────────┤
│ │ │ 0 │ 0 │ 0 │ 0 B │ 0 B │ 0 B │ 16 │ 0 │
╰────────┴──────────────┴─────────┴───────────┴──────────┴───────┴────────┴──────┴─────────┴─────────╯
╭───────────────────────────────────────────────────────────────────────╮
│ RAFT Meta Group Information │
├─────────────────┬──────────┬────────┬─────────┬────────┬────────┬─────┤
│ Connection Name │ ID │ Leader │ Current │ Online │ Active │ Lag │
├─────────────────┼──────────┼────────┼─────────┼────────┼────────┼─────┤
│ s1 │ k2My6qdB │ │ true │ true │ 962ms │ 0 │
│ s2 │ noC8kOtg │ │ true │ true │ 962ms │ 0 │
│ s3 │ 3ahZoO2Q │ yes │ true │ true │ 0s │ 0 │
╰─────────────────┴──────────┴────────┴─────────┴────────┴────────┴─────╯
The check that we use in the liveness probe:
https://github.com/nats-io/k8s/blob/0329e987476428736dc4dc9c024f086bd35a6604/helm/charts/nats/files/stateful-set/nats-container.yaml#L38-L64
will return "ok":
# curl http://localhost:8222/healthz?js-enabled-only=true
{"status":"ok"}
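The `js-enabled-only=true` check only confirms that JetStream is enabled, not that stream assets were restored. A stricter probe could call the full `/healthz` endpoint and treat anything other than `{"status":"ok"}` as a failure. A minimal sketch of that decision logic (the shape of the error payload is an assumption for illustration, not taken from the server's actual response):

```python
import json

def probe_ok(healthz_body: str) -> bool:
    """Return True only when a /healthz response reports status "ok".

    Assumes the endpoint returns a JSON object with a "status" field,
    as in the {"status":"ok"} response shown above.
    """
    try:
        return json.loads(healthz_body).get("status") == "ok"
    except json.JSONDecodeError:
        return False

# The shallow js-enabled-only probe passed even with the streams gone:
print(probe_ok('{"status":"ok"}'))  # True
# A full /healthz check would surface the stream error (payload shape assumed):
print(probe_ok('{"status":"unhealthy","error":"stream not found"}'))  # False
```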
The full health check does surface the issue, though:
# nats server report health
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Health Report │
├────────┬──────────────┬────────┬────────────────┬────────┬───────────────────────────────────────────────────────────────────────┤
│ Server │ Cluster │ Domain │ Status │ Type │ Error │
├────────┼──────────────┼────────┼────────────────┼────────┼───────────────────────────────────────────────────────────────────────┤
│ s1 │ proc_compose │ │ error (503) │ │ │
│ │ │ │ │ STREAM │ JetStream stream 'ACC > benchstream' is not current: stream not found │
│ s2 │ proc_compose │ │ error (503) │ │ │
│ │ │ │ │ STREAM │ JetStream stream 'ACC > benchstream' is not current: stream not found │
│ s3 │ proc_compose │ │ error (503) │ │ │
│ │ │ │ │ STREAM │ JetStream stream 'ACC > benchstream' is not current: stream not found │
├────────┼──────────────┼────────┼────────────────┼────────┼───────────────────────────────────────────────────────────────────────┤
│ 3 │ 1 │ │ ok: 0 / err: 3 │ │ 3 │
╰────────┴──────────────┴────────┴────────────────┴────────┴───────────────────────────────────────────────────────────────────────╯
Should nats server report jetstream report on such a condition somehow?
Do we want to update the probes in k8s to account for this condition?
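If the k8s probes were tightened, one option would be to drop the `js-enabled-only=true` parameter so the check exercises the full `/healthz` path. A hypothetical readiness-probe fragment (field names per the standard Kubernetes probe spec; the port name and timings are placeholders, and whether a full check is appropriate for liveness, given that failing streams would then restart pods, is exactly the open question above):

```yaml
# Hypothetical: probe the full /healthz instead of
# /healthz?js-enabled-only=true, so lost stream assets
# mark the pod as not ready.
readinessProbe:
  httpGet:
    path: /healthz
    port: monitor        # the 8222 monitoring port
  initialDelaySeconds: 10
  periodSeconds: 30
  failureThreshold: 3
```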
Use case
Make troubleshooting easier.
Contribution
No response