MauriceVanVeen

R1 streams currently aren't backed by Raft logs, which means R1 streams require custom handling and recovery. The on-disk layout looks like this:

/nats/jetstream/$G/streams/test-stream
├── batches
│   └── teeTx1wn
│       ├── meta.inf
│       ├── meta.sum
│       ├── msgs
│       │   └── 1.blk
│       └── obs
├── meta.inf
├── meta.sum
├── msgs
│   ├── 1.blk
│   └── index.db
└── obs

In-flight batches for a stream are stored under a batches directory, and the data for each batch is itself just another file-based stream. The stream name is a hash of the Nats-Batch-Id, which avoids storing arbitrary strings on disk and makes the names fixed-length. These streams support encryption at rest and use AsyncFlush by default, so batches can be written very quickly and only need a flush before the batch is committed.
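As a rough illustration of why the directory names are hashed, the Go sketch below maps a Nats-Batch-Id to a fixed-length, filesystem-safe name under batches. The helper batchDirName is hypothetical, and using SHA-256 with unpadded base64url encoding truncated to 8 characters is an assumption chosen only to match the length of teeTx1wn in the layout above; the server's actual hash may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"path/filepath"
)

// batchDirName is a hypothetical helper, not the server's API. It hashes a
// client-supplied Nats-Batch-Id so the raw string never becomes a path
// component, and truncates the encoding so every name has the same length.
// Assumption: SHA-256 + unpadded base64url, truncated to size characters.
func batchDirName(batchID string, size int) string {
	sum := sha256.Sum256([]byte(batchID))
	enc := base64.RawURLEncoding.EncodeToString(sum[:]) // 43 chars, alphabet A-Z a-z 0-9 - _
	if size > len(enc) {
		size = len(enc)
	}
	return enc[:size]
}

func main() {
	streamDir := "/nats/jetstream/$G/streams/test-stream"
	ids := []string{"short", "../../etc/passwd", "a-much-longer-batch-identifier-from-a-client"}
	for _, id := range ids {
		// Each in-flight batch gets its own file-based stream directory under "batches".
		fmt.Println(filepath.Join(streamDir, "batches", batchDirName(id, 8)))
	}
}
```

With a scheme like this, finding a batch on disk means hashing its ID again rather than decoding the directory name, and a hostile ID such as "../../etc/passwd" can never escape the batches directory.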

To avoid duplicating the batching-specific logic, it has been extracted into a reusable mset.processJetStreamBatchMsg that handles both clustered and standalone streams. For replicated/clustered streams, batch commits are proposed through Raft; for R1 streams, once all consistency checks have passed, the server simply loops over all messages in the batch and calls mset.processJetStreamMsg directly for each one.

That leaves recovering a partially-written batch after a hard kill. This PR handles that by checking whether batching is enabled and whether the last message in the stream is a batch message that is not itself the commit. If so, we look up the batch state on disk (it hasn't been removed yet), then process and store the remaining messages from the batch. Afterwards, all unused batches are cleaned up.
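The recovery check can be modeled roughly as below. This is a hypothetical sketch, not the server's code: storedMsg and recoverPartialBatch are invented names, the on-disk batch streams are modeled as an in-memory map keyed by batch ID, and it assumes each stored message exposes its batch ID, its position within the batch, and whether it is the commit.

```go
package main

import "fmt"

// storedMsg carries only the batch metadata needed for recovery (assumed fields).
type storedMsg struct {
	batchID  string // empty for a non-batch message
	batchSeq uint64 // 1-based position within the batch
	commit   bool   // true for the message that commits the batch
}

// recoverPartialBatch returns the batch messages that still need to be
// stored after a hard kill, then removes all leftover batch state.
func recoverPartialBatch(batchingEnabled bool, last *storedMsg, batches map[string][]storedMsg) []storedMsg {
	// Clean up every unused batch once recovery is done, whatever the outcome.
	defer func() {
		for id := range batches {
			delete(batches, id)
		}
	}()

	// Only a batch message that is not the commit indicates an interrupted commit.
	if !batchingEnabled || last == nil || last.batchID == "" || last.commit {
		return nil
	}
	pending, ok := batches[last.batchID]
	if !ok {
		return nil
	}
	// Everything after the last stored message still has to be written.
	var rest []storedMsg
	for _, m := range pending {
		if m.batchSeq > last.batchSeq {
			rest = append(rest, m)
		}
	}
	return rest
}

func main() {
	batches := map[string][]storedMsg{
		"b1": {{"b1", 1, false}, {"b1", 2, false}, {"b1", 3, false}, {"b1", 4, true}},
		"b2": {{"b2", 1, false}}, // never committed, so nothing of it is in the stream
	}
	// The stream's last message is the 2nd of 4 in batch b1: the commit was interrupted.
	last := &storedMsg{batchID: "b1", batchSeq: 2}
	rest := recoverPartialBatch(true, last, batches)
	fmt.Println(len(rest), len(batches))
}
```

The sketch also makes the limitation described next visible: if the stream no longer holds the batch's last write, last no longer identifies the batch, and nothing is recovered.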

Because batch recovery after a hard kill relies on the stream's last message representing the last write of that batch, stream settings that remove this last message can be problematic:

  • Interest retention doesn't store a message that no consumer is interested in. A hard kill at that point leaves the rest of the batch unrecoverable, so the remaining messages are never stored. If there are no consumers at all this is not an issue, but with distinct consumers some messages could be missed.
  • A short MaxAge, or a TTL on the last message, could cause that message to be removed. A hard kill would then also be unrecoverable, and the remaining messages in the batch would not be stored.

There may be other problematic settings, but these are the most obvious ones. This is likely fine in practice, and it could be fixed in the future by reusing the Raft replication logic. Replicated streams don't have these issues during hard kills, and replicated streams are generally recommended for higher consistency anyway.

Resolves #6974

Signed-off-by: Maurice van Veen [email protected]

MauriceVanVeen force-pushed the maurice/batch-r1 branch 2 times, most recently from e6c2ab7 to 6ed7801, August 6, 2025 15:55
MauriceVanVeen marked this pull request as ready for review, August 6, 2025 20:54
MauriceVanVeen requested a review from a team as a code owner, August 6, 2025 20:54

neilalexander left a comment

Mostly just minor things.

neilalexander left a comment

LGTM

neilalexander merged commit 0d02332 into main Aug 8, 2025
110 of 114 checks passed
neilalexander deleted the maurice/batch-r1 branch August 8, 2025 11:36
Development

Successfully merging this pull request may close these issues.

Batch publish - support R1
2 participants