Make it more obvious that max_file is exceeded #7654

@alexbozhenko

Proposed change

I ran into a scenario where the available disk space decreased while the server was down (for example, other services wrote new logs to disk while NATS was stopped).
It was not immediately obvious what was going on. Steps to reproduce:

  1. Publish about 500 MB of messages (500 messages of 1 MB each):
     nats bench js pub sync test --replicas=3 --msgs=500 --storage=file --create --clients 100 --purge --maxbytes="1GB" --size=1MB
  2. Stop the cluster.
  3. Update the config to this:
     jetstream {
       max_file: 490M
     }
  4. Start the cluster.

After the restart, the JetStream report shows no streams or messages on any server:

# nats server report jetstream
╭────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                          JetStream Summary                                         │
├────────┬──────────────┬─────────┬───────────┬──────────┬───────┬────────┬──────┬─────────┬─────────┤
│ Server │ Cluster      │ Streams │ Consumers │ Messages │ Bytes │ Memory │ File │ API Req │ Pending │
├────────┼──────────────┼─────────┼───────────┼──────────┼───────┼────────┼──────┼─────────┼─────────┤
│ s1     │ proc_compose │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 5       │       0 │
│ s2     │ proc_compose │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 1       │       0 │
│ s3*    │ proc_compose │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 10      │       0 │
├────────┼──────────────┼─────────┼───────────┼──────────┼───────┼────────┼──────┼─────────┼─────────┤
│        │              │ 0       │ 0         │ 0        │ 0 B   │ 0 B    │ 0 B  │ 16      │       0 │
╰────────┴──────────────┴─────────┴───────────┴──────────┴───────┴────────┴──────┴─────────┴─────────╯

╭───────────────────────────────────────────────────────────────────────╮
│                      RAFT Meta Group Information                      │
├─────────────────┬──────────┬────────┬─────────┬────────┬────────┬─────┤
│ Connection Name │ ID       │ Leader │ Current │ Online │ Active │ Lag │
├─────────────────┼──────────┼────────┼─────────┼────────┼────────┼─────┤
│ s1              │ k2My6qdB │        │ true    │ true   │ 962ms  │ 0   │
│ s2              │ noC8kOtg │        │ true    │ true   │ 962ms  │ 0   │
│ s3              │ 3ahZoO2Q │ yes    │ true    │ true   │ 0s     │ 0   │
╰─────────────────┴──────────┴────────┴─────────┴────────┴────────┴─────╯

The check that we use in the liveness probe:
https://github.com/nats-io/k8s/blob/0329e987476428736dc4dc9c024f086bd35a6604/helm/charts/nats/files/stateful-set/nats-container.yaml#L38-L64
returns "ok":

# curl http://localhost:8222/healthz?js-enabled-only=true
{"status":"ok"}

The full health check does show the issue, though:

# nats server report health
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                           Health Report                                                          │
├────────┬──────────────┬────────┬────────────────┬────────┬───────────────────────────────────────────────────────────────────────┤
│ Server │ Cluster      │ Domain │ Status         │ Type   │ Error                                                                 │
├────────┼──────────────┼────────┼────────────────┼────────┼───────────────────────────────────────────────────────────────────────┤
│ s1     │ proc_compose │        │ error (503)    │        │                                                                       │
│        │              │        │                │ STREAM │ JetStream stream 'ACC > benchstream' is not current: stream not found │
│ s2     │ proc_compose │        │ error (503)    │        │                                                                       │
│        │              │        │                │ STREAM │ JetStream stream 'ACC > benchstream' is not current: stream not found │
│ s3     │ proc_compose │        │ error (503)    │        │                                                                       │
│        │              │        │                │ STREAM │ JetStream stream 'ACC > benchstream' is not current: stream not found │
├────────┼──────────────┼────────┼────────────────┼────────┼───────────────────────────────────────────────────────────────────────┤
│ 3      │ 1            │        │ ok: 0 / err: 3 │        │ 3                                                                     │
╰────────┴──────────────┴────────┴────────────────┴────────┴───────────────────────────────────────────────────────────────────────╯
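
The gap between the two checks above can be sketched as a small script that queries both forms of the monitoring endpoint and compares them. This is only an illustration, not the probe the Helm chart ships: the helper names (`healthz_url`, `fetch_health`, `is_healthy`, `compare_probes`) are hypothetical, and it assumes the full `/healthz` endpoint answers with a non-200 status and an `error` field in its JSON body when a stream is missing, as the 503 entries in the health report suggest.

```python
import json
import urllib.error
import urllib.request


def healthz_url(base_url, js_enabled_only=False):
    """Build the /healthz URL, optionally with the shallow js-enabled-only flag."""
    url = base_url.rstrip("/") + "/healthz"
    if js_enabled_only:
        url += "?js-enabled-only=true"
    return url


def fetch_health(url, timeout=5.0):
    """Return (http_status, parsed_json_body) for a health endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Non-2xx responses (such as 503) raise HTTPError but still carry a body.
        raw = err.read().decode("utf-8")
        try:
            body = json.loads(raw)
        except ValueError:
            # Assumption: fall back to the raw text if the body is not JSON.
            body = {"status": "unknown", "error": raw}
        return err.code, body


def is_healthy(http_status, body):
    """Treat only HTTP 200 with status "ok" as healthy."""
    return http_status == 200 and body.get("status") == "ok"


def compare_probes(base_url):
    """Query the shallow and the full health check and summarize both."""
    shallow = fetch_health(healthz_url(base_url, js_enabled_only=True))
    full = fetch_health(healthz_url(base_url))
    return {
        "shallow_ok": is_healthy(*shallow),
        "full_ok": is_healthy(*full),
        "full_error": full[1].get("error", ""),
    }
```

Calling `compare_probes("http://localhost:8222")` against a server in the state described here would presumably report `shallow_ok` as true and `full_ok` as false, which is the mismatch that makes the condition hard to spot from the probe alone.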

Should "nats server report jetstream" surface this condition somehow?

Do we want to update the probes in k8s to account for this condition?

Use case

Make troubleshooting easier.

Contribution

No response
