(2.12) Atomic batch: support R1 stream #7152

MauriceVanVeen · 2025-08-06T15:28:22Z

R1 streams currently aren't backed by Raft logs, which means custom handling and recovery are required for R1. The layout on disk looks like:

/nats/jetstream/$G/streams/test-stream
├── batches
│   └── teeTx1wn
│       ├── meta.inf
│       ├── meta.sum
│       ├── msgs
│       │   └── 1.blk
│       └── obs
├── meta.inf
├── meta.sum
├── msgs
│   ├── 1.blk
│   └── index.db
└── obs

In-flight batches for a stream are stored under a batches directory, and each batch's data is itself just another file-based stream. The stream name is a hash of the Nats-Batch-Id, so that arbitrary strings are not stored on disk and the names have a fixed length. These streams support encryption at rest and use AsyncFlush by default. This allows batches to be written very quickly, requiring a flush only before the batch is committed.
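As an illustration of why hashing the batch ID helps, here is a minimal sketch of mapping an arbitrary Nats-Batch-Id to a fixed-length, filesystem-safe directory name. The function name batchDirName, the choice of SHA-256 with URL-safe base64, and the 8-character length are assumptions made for this example; they are not taken from the PR and may differ from what the server actually does.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// batchDirLen is a hypothetical fixed length for the on-disk name.
const batchDirLen = 8

// batchDirName (hypothetical) hashes a client-supplied batch ID so the
// directory name is fixed-length and never contains path separators.
func batchDirName(batchID string) string {
	sum := sha256.Sum256([]byte(batchID))
	// RawURLEncoding only emits [A-Za-z0-9_-], so the result is path-safe.
	return base64.RawURLEncoding.EncodeToString(sum[:])[:batchDirLen]
}

func main() {
	for _, id := range []string{"order-42", "a/../../etc/passwd", ""} {
		fmt.Printf("%q -> %s\n", id, batchDirName(id))
	}
}
```

Truncating a hash trades collision resistance for shorter names, so a real implementation would need to pick a length that suits the expected number of concurrent batches.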

To avoid duplicating much of the batching-specific logic, it has been extracted into a reusable mset.processJetStreamBatchMsg that handles both clustered and standalone streams. Batch commits are proposed through Raft for replicated/clustered streams, while R1 streams simply loop over all messages and call mset.processJetStreamMsg directly once all consistency checks have passed.
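The split between the two commit paths can be sketched roughly as follows. This is a simplified model and not the server code: the stream and msg types and the check, propose, and store callbacks are hypothetical stand-ins (store plays the role of mset.processJetStreamMsg, propose the role of a Raft proposal), and running the checks before both paths is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

var errEmptyBatch = errors.New("empty batch")

// msg is a hypothetical stand-in for one staged batch message.
type msg struct {
	subject string
	data    []byte
}

// stream is a hypothetical model of the pieces involved in a commit.
type stream struct {
	replicas int
	check    func(batch []msg) error // consistency checks over the whole batch
	propose  func(batch []msg) error // stand-in for proposing through Raft
	store    func(m msg) error       // stand-in for mset.processJetStreamMsg
}

// commitBatch validates the batch, then either proposes it (replicated)
// or stores each message directly (R1).
func (s *stream) commitBatch(batch []msg) error {
	if len(batch) == 0 {
		return errEmptyBatch
	}
	if err := s.check(batch); err != nil {
		return fmt.Errorf("batch rejected: %w", err)
	}
	if s.replicas > 1 {
		return s.propose(batch)
	}
	for i, m := range batch {
		if err := s.store(m); err != nil {
			return fmt.Errorf("storing message %d of %d: %w", i+1, len(batch), err)
		}
	}
	return nil
}

func main() {
	stored := 0
	r1 := &stream{
		replicas: 1,
		check:    func([]msg) error { return nil },
		propose:  func([]msg) error { return errors.New("not clustered") },
		store:    func(msg) error { stored++; return nil },
	}
	err := r1.commitBatch([]msg{{subject: "orders.1"}, {subject: "orders.2"}})
	fmt.Println(stored, err)
}
```

Note that the R1 loop is exactly where a hard kill can leave a batch half-written, which is what the recovery logic described next has to deal with.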

That leaves recovery of a partially written batch after a hard kill. This PR handles that by checking whether batching is enabled and whether the last message in the stream is a batch message that is not the commit itself. If so, we look up the batch state on disk (it has not been removed yet) and then process and store the remaining messages from the batch. Afterward, all unused batches are cleaned up.
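The recovery decision described above can be sketched as a small pure function. Everything here is a hypothetical model: the batchMsg type, its seq and commit fields (a per-batch position and a commit marker are assumed, the description only names Nats-Batch-Id), and the staged map standing in for the on-disk batches directory.

```go
package main

import "fmt"

// batchMsg is a hypothetical view of a message that belongs to a batch.
type batchMsg struct {
	batchID string // Nats-Batch-Id value; empty if the message is not batched
	seq     int    // assumed 1-based position within the batch
	commit  bool   // assumed marker for the message that commits the batch
}

// remainingAfter returns the staged messages that still have to be stored,
// given the last message found in the stream after a restart. It returns
// nil when no recovery is needed or possible.
func remainingAfter(batchingEnabled bool, last *batchMsg, staged map[string][]batchMsg) []batchMsg {
	if !batchingEnabled || last == nil || last.batchID == "" || last.commit {
		return nil // nothing was interrupted mid-batch
	}
	msgs, ok := staged[last.batchID]
	if !ok {
		return nil // batch state is gone, so there is nothing to replay
	}
	var out []batchMsg
	for _, m := range msgs {
		if m.seq > last.seq {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	staged := map[string][]batchMsg{
		"b1": {{"b1", 1, false}, {"b1", 2, false}, {"b1", 3, false}, {"b1", 4, true}},
	}
	last := &batchMsg{batchID: "b1", seq: 2}
	for _, m := range remainingAfter(true, last, staged) {
		fmt.Printf("replay seq=%d commit=%v\n", m.seq, m.commit)
	}
}
```

The sketch makes the dependency explicit: without a surviving last message there is no position to resume from, which is why the settings listed below are a concern.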

Batch recovery after a hard kill requires that the stream's last message still exists, because it represents the last write in that batch. Certain stream settings can be problematic because they remove this last message:

Interest retention with no consumers doesn't store a message. A hard kill at that point would leave the remaining batch unrecoverable, and the rest of the batch would not be stored (if there are no consumers at all, this is no issue, but with distinct consumers some messages could be missed).
A short MaxAge or TTL on the last message could result in that message being removed. A hard kill would then also be unrecoverable, and the remaining messages in the batch would not be stored.
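To make the two cases above concrete, here is a small sketch of a check that flags them for an R1 stream. The streamSettings type, the batchRecoveryRisks function, and the shortMaxAge threshold are all hypothetical and exist only for this example; the PR does not define what counts as a "short" MaxAge.

```go
package main

import (
	"fmt"
	"time"
)

// shortMaxAge is a hypothetical threshold chosen only for illustration.
const shortMaxAge = time.Second

// streamSettings is a hypothetical subset of a stream's configuration.
type streamSettings struct {
	replicas          int
	interestRetention bool
	maxAge            time.Duration // 0 means no age limit
	perMsgTTL         bool          // messages may carry their own TTL
}

// batchRecoveryRisks lists settings that could remove the last batch
// message and so prevent recovery after a hard kill on an R1 stream.
func batchRecoveryRisks(s streamSettings) []string {
	if s.replicas > 1 {
		return nil // replicated streams are not affected
	}
	var risks []string
	if s.interestRetention {
		risks = append(risks, "interest retention may not store the last message")
	}
	if s.maxAge > 0 && s.maxAge <= shortMaxAge {
		risks = append(risks, "short MaxAge may expire the last message")
	}
	if s.perMsgTTL {
		risks = append(risks, "a per-message TTL may expire the last message")
	}
	return risks
}

func main() {
	s := streamSettings{replicas: 1, interestRetention: true, maxAge: 500 * time.Millisecond}
	for _, r := range batchRecoveryRisks(s) {
		fmt.Println("warning:", r)
	}
}
```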

There might be more problematic settings, but these are the most obvious ones. This will probably be fine in practice, but could be fixed in the future by reusing the Raft replication logic. Replicated streams don't have any issues during hard kills, and replicated streams would generally be recommended for higher consistency anyway.

Resolves #6974

Signed-off-by: Maurice van Veen [email protected]

neilalexander

Mostly just minor things.

server/filestore.go

server/jetstream_batching.go

server/raft.go

server/store.go

server/stream.go

Signed-off-by: Maurice van Veen <[email protected]>

neilalexander

LGTM

MauriceVanVeen force-pushed the maurice/batch-r1 branch 2 times, most recently from e6c2ab7 to 6ed7801 Compare August 6, 2025 15:55

MauriceVanVeen marked this pull request as ready for review August 6, 2025 20:54

MauriceVanVeen requested a review from a team as a code owner August 6, 2025 20:54

neilalexander reviewed Aug 7, 2025

MauriceVanVeen added 2 commits August 8, 2025 09:33

(2.12) Atomic batch: support R1 stream

940852c

Signed-off-by: Maurice van Veen <[email protected]>

(2.12) Don't delete filestore batch in goroutine

45da80e

Signed-off-by: Maurice van Veen <[email protected]>

MauriceVanVeen force-pushed the maurice/batch-r1 branch from 6ed7801 to 45da80e Compare August 8, 2025 07:33

MauriceVanVeen requested a review from neilalexander August 8, 2025 11:09

neilalexander approved these changes Aug 8, 2025

neilalexander merged commit 0d02332 into main Aug 8, 2025
110 of 114 checks passed

neilalexander deleted the maurice/batch-r1 branch August 8, 2025 11:36
