(2.12) Atomic batch: support R1 stream #7152
Merged
+593
−288
R1 streams currently aren't backed by Raft logs, which means custom handling and recovery are required for R1. The layout on disk looks like:
Inflight batches for a stream are stored under a `batches` directory; the data itself is just another file-based stream. The stream name is a hashed `Nats-Batch-Id`, which avoids storing arbitrary strings and makes the names fixed-length. These streams support encryption at rest and use `AsyncFlush` by default. This allows batches to be written very quickly, only requiring a flush before committing the batch.

To avoid duplicating much of the batching-specific logic, it has been extracted into a reusable `mset.processJetStreamBatchMsg` that handles both clustered and standalone streams. Batch commits are proposed through Raft for replicated/clustered streams, and R1 streams simply loop over all messages and call `mset.processJetStreamMsg` directly after all consistency checks.

That leaves recovery of a partially-written batch after a hard kill. This PR resolves that by checking whether batching is enabled and whether the last message in the stream is a batch message that is not the commit itself. We then look up the batch state on disk (it has not been removed yet at that point), and process and store the remaining messages from the batch. Afterward, all unused batches are cleaned up.
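For reviewers, the naming and recovery steps described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the `batchMsg` type and the `batchStreamName`, `needsRecovery`, and `remaining` helpers are hypothetical names, the use of SHA-256 is an assumption (the description does not say which hash is used), and the per-batch sequence field is assumed for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// batchMsg is a hypothetical, simplified view of a stored batch message.
type batchMsg struct {
	BatchID string // value of the Nats-Batch-Id header, empty if not a batch message
	Seq     int    // assumed 1-based position of the message within its batch
	Commit  bool   // true if this message is the batch commit
}

// batchStreamName maps an arbitrary batch ID to a fixed-length name.
// SHA-256 with hex encoding is an assumption made for this sketch.
func batchStreamName(batchID string) string {
	sum := sha256.Sum256([]byte(batchID))
	return hex.EncodeToString(sum[:])
}

// needsRecovery reports whether the stream's last message indicates a
// partially applied batch: batching is enabled, and the last message is a
// batch message that is not the commit itself.
func needsRecovery(batchingEnabled bool, last *batchMsg) bool {
	return batchingEnabled && last != nil && last.BatchID != "" && !last.Commit
}

// remaining returns the staged messages of the same batch that come after
// the last message already stored in the stream.
func remaining(staged []batchMsg, last batchMsg) []batchMsg {
	var out []batchMsg
	for _, m := range staged {
		if m.BatchID == last.BatchID && m.Seq > last.Seq {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// Staged batch state as it might still exist on disk after a hard kill.
	staged := []batchMsg{
		{BatchID: "order-42", Seq: 1},
		{BatchID: "order-42", Seq: 2},
		{BatchID: "order-42", Seq: 3, Commit: true},
	}
	// Only the first message reached the stream before the kill.
	last := staged[0]

	fmt.Println("batch stream name:", batchStreamName(last.BatchID))
	if needsRecovery(true, &last) {
		for _, m := range remaining(staged, last) {
			fmt.Printf("replaying seq=%d commit=%v\n", m.Seq, m.Commit)
		}
	}
}
```

The sketch also shows why the last stream message matters: it is the only marker of how far the batch got, so anything that removes it leaves recovery without a starting point.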
Batch recovery after a hard kill requires that the last message in the stream still exists and represents the last write in that batch. Certain stream settings can be problematic when they remove this last message:
There might be more problematic settings, but these are the most obvious ones. This will probably be fine in practice, and could be fixed in the future by reusing the Raft replication logic. Replicated streams don't have any issues during hard kills, and replicated streams would generally be recommended for higher consistency anyway.
Resolves #6974
Signed-off-by: Maurice van Veen [email protected]