(2.12) NRG: Empty log protection #7038

MauriceVanVeen · 2025-07-04T20:18:49Z

When a follower comes up with an entirely empty disk, it can vote for any server to become leader which could result in data loss. For example with a R3 stream. Server A is leader and has stored all JetStream messages. Server B was part of quorum, but didn't apply all JetStream messages yet. Server C has stored some JetStream messages, but it didn't get the messages that had quorum on Server A and Server B.

If we'd shutdown Server B and reset the disk, we would violate stable storage which Raft relies on. If Server A goes down now, Server B can't become leader because it's empty (which is good), but Server B can vote for Server C to become leader. That would be bad, because we'd have lost some writes, and Server A would then have desynced.

By extension, scaling up of streams can be dangerous. Scaling up a R1 stream to R3 will work most of the time, but technically there's a very short time frame where the stream can be assigned on the two other servers, and the previous R1 leader immediately dies right after. Now the remaining servers that got assigned to be part of the R3 stream can vote for each other to become leader, which will lose the full stream contents.

This PR proposes to add protection and special handling when a server's log is empty.

When a server's disk or log is fully reset, while its log is empty, it will vote using an "empty vote". This is a special kind of vote that the candidate will not count against normal quorum. Instead, empty votes can only be used to become leader if the candidate received votes from ALL servers. This protects against data loss even when only a single server hosts the data, as long as all servers can be available at the same time to sort out this reset issue.
When an asset is scaling up, the Raft group will have ScaleUp: true set. This ensures servers that are new and part of the scale up go into observer mode. This ensures they can't become leader, but relaxes the above "empty vote" check. This ensures a candidate can still become leader based on normal quorum, and doesn't require votes from all servers.
Also, when an asset is initialized the first time the "empty vote" check is also relaxed. Which ensures if you create a R3 stream/consumer, you can still rely solely on quorum and don't need all servers to be available at the same time.

Like I mentioned before, technically resetting a disk is already a violation of Raft, so we could call it a day there. But I feel like it's still a good thing to at least try to attempt to preserve the data, even in such a nightmarish scenario.

TL;DR

meta/stream/consumer still require a quorum to store data
meta/stream/consumer still require a quorum to be voted as leader
only if a disk is fully reset, would meta/stream/consumer be put into a special mode that uses "empty votes", which ensures data can't be lost as long as at least 1 server has the data, and ALL servers are temporarily available at the same time to sort out this issue. After which we'd rely on quorum again, as usual.

Signed-off-by: Maurice van Veen [email protected]

MauriceVanVeen · 2025-07-10T09:09:07Z

Just realized, this also makes replicated in-memory streams a lot safer to use. (Since they don't have any stable storage guarantees at all)
Before this PR a 3-node cluster with a R3 memory-stream is only guaranteed to not lose data if not more than one server is down at the same time. After this change, as long as one server still holds the data and can replicate it.. two servers could be restarted and the data would still be safe.

neilalexander

Overall LGTM, just one thing.

neilalexander · 2025-07-10T12:40:00Z

server/raft.go

+	Log        WAL
+	Track      bool
+	Observer   bool
+	Recovering bool


Can you please add comments explaining the Recovering and ScaleUp side effects here?

Done

// Recovering must be set for a Raft group that's recovering after a restart, or if it's // first seen after a catchup from another server. If a server recovers with an empty log, // we know to protect against data loss. Recovering bool // ScaleUp identifies the Raft peer set is being scaled up. // We need to protect against losing state due to the new peers starting with an empty log. // Therefore, these empty servers can't try to become leader until they at least have _some_ state. ScaleUp bool

Signed-off-by: Maurice van Veen <[email protected]>

neilalexander

LGTM!

MauriceVanVeen force-pushed the maurice/scaleup-desync branch from a1aacee to be29939 Compare July 7, 2025 14:11

MauriceVanVeen changed the title ~~(2.12) NRG: Empty log & partial catchup protection~~ (2.12) NRG: Empty log protection Jul 7, 2025

MauriceVanVeen force-pushed the maurice/scaleup-desync branch 2 times, most recently from 3247572 to 1a91162 Compare July 8, 2025 08:24

MauriceVanVeen marked this pull request as ready for review July 8, 2025 10:59

MauriceVanVeen requested a review from a team as a code owner July 8, 2025 10:59

MauriceVanVeen force-pushed the maurice/scaleup-desync branch from 1a91162 to cc5f1ff Compare July 8, 2025 15:20

neilalexander reviewed Jul 10, 2025

View reviewed changes

MauriceVanVeen added 2 commits July 10, 2025 14:53

(2.12) NRG: Empty log protection

bb0f673

Signed-off-by: Maurice van Veen <[email protected]>

De-flake TestJetStreamClusterConsumerInfoAfterCreate

374d3cc

Signed-off-by: Maurice van Veen <[email protected]>

MauriceVanVeen force-pushed the maurice/scaleup-desync branch from cc5f1ff to 374d3cc Compare July 10, 2025 12:53

neilalexander approved these changes Jul 10, 2025

View reviewed changes

neilalexander merged commit d72665c into main Jul 10, 2025
65 of 70 checks passed

neilalexander deleted the maurice/scaleup-desync branch July 10, 2025 13:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

(2.12) NRG: Empty log protection #7038

(2.12) NRG: Empty log protection #7038

Uh oh!

MauriceVanVeen commented Jul 4, 2025 •

edited

Loading

Uh oh!

MauriceVanVeen commented Jul 10, 2025

Uh oh!

neilalexander left a comment

Uh oh!

neilalexander Jul 10, 2025

Uh oh!

MauriceVanVeen Jul 10, 2025

Uh oh!

neilalexander left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

(2.12) NRG: Empty log protection #7038

(2.12) NRG: Empty log protection #7038

Uh oh!

Conversation

MauriceVanVeen commented Jul 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MauriceVanVeen commented Jul 10, 2025

Uh oh!

neilalexander left a comment

Choose a reason for hiding this comment

Uh oh!

neilalexander Jul 10, 2025

Choose a reason for hiding this comment

Uh oh!

MauriceVanVeen Jul 10, 2025

Choose a reason for hiding this comment

Uh oh!

neilalexander left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

MauriceVanVeen commented Jul 4, 2025 •

edited

Loading