-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
(2.12) NRG: Empty log protection #7038
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a1aacee
to
be29939
Compare
3247572
to
1a91162
Compare
1a91162
to
cc5f1ff
Compare
Just realized, this also makes replicated in-memory streams a lot safer to use. (Since they don't have any stable storage guarantees at all) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, just one thing.
Log WAL | ||
Track bool | ||
Observer bool | ||
Recovering bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add comments explaining the Recovering
and ScaleUp
side effects here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// Recovering must be set for a Raft group that's recovering after a restart, or if it's
// first seen after a catchup from another server. If a server recovers with an empty log,
// we know to protect against data loss.
Recovering bool
// ScaleUp identifies the Raft peer set is being scaled up.
// We need to protect against losing state due to the new peers starting with an empty log.
// Therefore, these empty servers can't try to become leader until they at least have _some_ state.
ScaleUp bool
Signed-off-by: Maurice van Veen <[email protected]>
Signed-off-by: Maurice van Veen <[email protected]>
cc5f1ff
to
374d3cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
When a follower comes up with an entirely empty disk, it can vote for any server to become leader which could result in data loss. For example with a R3 stream. Server A is leader and has stored all JetStream messages. Server B was part of quorum, but didn't apply all JetStream messages yet. Server C has stored some JetStream messages, but it didn't get the messages that had quorum on Server A and Server B.
If we'd shutdown Server B and reset the disk, we would violate stable storage which Raft relies on. If Server A goes down now, Server B can't become leader because it's empty (which is good), but Server B can vote for Server C to become leader. That would be bad, because we'd have lost some writes, and Server A would then have desynced.
By extension, scaling up of streams can be dangerous. Scaling up a R1 stream to R3 will work most of the time, but technically there's a very short time frame where the stream can be assigned on the two other servers, and the previous R1 leader immediately dies right after. Now the remaining servers that got assigned to be part of the R3 stream can vote for each other to become leader, which will lose the full stream contents.
This PR proposes to add protection and special handling when a server's log is empty.
ScaleUp: true
set. This ensures servers that are new and part of the scale up go into observer mode. This ensures they can't become leader, but relaxes the above "empty vote" check. This ensures a candidate can still become leader based on normal quorum, and doesn't require votes from all servers.Like I mentioned before, technically resetting a disk is already a violation of Raft, so we could call it a day there. But I feel like it's still a good thing to at least try to attempt to preserve the data, even in such a nightmarish scenario.
TL;DR
Signed-off-by: Maurice van Veen [email protected]