-
Notifications
You must be signed in to change notification settings - Fork 14.6k
KAFKA-16540: enforce min.insync.replicas config invariants for ELR #17952
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KAFKA-16540: enforce min.insync.replicas config invariants for ELR #17952
Conversation
Thanks for the PR. Thoughts: I think feature control should not have a reference to cluster control. in general many things reference feature control to know the MV and I don’t want them to have to pull in cluster control for that. It would probably be better to have ConfigurationControl handle the updateFeatures call, and have it call into FeatureControl. |
Also, I think we should generate the configuration record for the cluster config as part of the activation event rather than right afterwards. That will ensure that it happens before anything else. |
Update Summary:
|
Can you fix the checkstyle errors? |
metadata/src/main/java/org/apache/kafka/controller/ActivationRecordsGenerator.java
Outdated
Show resolved
Hide resolved
metadata/src/main/java/org/apache/kafka/controller/ConfigurationControlManager.java
Outdated
Show resolved
Hide resolved
metadata/src/main/java/org/apache/kafka/controller/ConfigurationControlManager.java
Outdated
Show resolved
Hide resolved
metadata/src/main/java/org/apache/kafka/controller/ConfigurationControlManager.java
Outdated
Show resolved
Hide resolved
// Also, it will remove all the broker level min ISR config records. | ||
void maybeResetMinIsrConfig(List<ApiMessageAndVersion> outputRecords) { | ||
if (!clusterConfig().containsKey(MIN_IN_SYNC_REPLICAS_CONFIG)) { | ||
String minIsrDefaultConfigValue = configSchema.getStaticOrDefaultConfig( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should log a message if we are doing this. Also, it seems easier just to create the configuration record directly than call a function here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, updated.
metadata/src/main/java/org/apache/kafka/controller/ConfigurationControlManager.java
Outdated
Show resolved
Hide resolved
@@ -303,6 +339,14 @@ private ApiError validateAlterConfig(ConfigResource configResource, | |||
if (alterConfigPolicy.isPresent()) { | |||
alterConfigPolicy.get().validate(new RequestMetadata(configResource, alteredConfigsForAlterConfigPolicyCheck)); | |||
} | |||
if (featureControl.isElrFeatureEnabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove this part as we have done the filtering above.
return isElrFeatureEnabled(latestFinalizedFeatures().versionOrDefault(EligibleLeaderReplicasVersion.FEATURE_NAME, (short) 0)); | ||
} | ||
|
||
public static boolean isElrFeatureEnabled(short elrFeatureLevel) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this static method, no one uses it any more.
…pache#17952) If ELR is enabled, we need to set a cluster-level min.insync.replicas, and remove all broker-level overrides. The reason for this is that if brokers disagree about which partitions are under min ISR, it breaks the KIP-966 replication invariants. In order to enforce this, when the eligible.leader.replicas.version feature is turned on, we automatically remove all broker-level min.insync.replicas overrides, and create the required cluster-level override if needed. Similarly, if the cluster was created with eligible.leader.replicas.version enabled, we create a similar cluster-level record. In both cases, we don't allow setting overrides for individual brokers afterwards, or removing the cluster-level override. Split ActivationRecordsGeneratorTest up into multiple test cases rather than having it be one giant test case. Fix a bug in QuorumControllerTestEnv where we would replay records manually on objects, racing with the active controller thread. Instead, we should simply ensure that the initial bootstrap records contains what we want. Reviewers: Colin P. McCabe <[email protected]>
…pache#17952) If ELR is enabled, we need to set a cluster-level min.insync.replicas, and remove all broker-level overrides. The reason for this is that if brokers disagree about which partitions are under min ISR, it breaks the KIP-966 replication invariants. In order to enforce this, when the eligible.leader.replicas.version feature is turned on, we automatically remove all broker-level min.insync.replicas overrides, and create the required cluster-level override if needed. Similarly, if the cluster was created with eligible.leader.replicas.version enabled, we create a similar cluster-level record. In both cases, we don't allow setting overrides for individual brokers afterwards, or removing the cluster-level override. Split ActivationRecordsGeneratorTest up into multiple test cases rather than having it be one giant test case. Fix a bug in QuorumControllerTestEnv where we would replay records manually on objects, racing with the active controller thread. Instead, we should simply ensure that the initial bootstrap records contains what we want. Reviewers: Colin P. McCabe <[email protected]>
…17952) If ELR is enabled, we need to set a cluster-level min.insync.replicas, and remove all broker-level overrides. The reason for this is that if brokers disagree about which partitions are under min ISR, it breaks the KIP-966 replication invariants. In order to enforce this, when the eligible.leader.replicas.version feature is turned on, we automatically remove all broker-level min.insync.replicas overrides, and create the required cluster-level override if needed. Similarly, if the cluster was created with eligible.leader.replicas.version enabled, we create a similar cluster-level record. In both cases, we don't allow setting overrides for individual brokers afterwards, or removing the cluster-level override. Split ActivationRecordsGeneratorTest up into multiple test cases rather than having it be one giant test case. Fix a bug in QuorumControllerTestEnv where we would replay records manually on objects, racing with the active controller thread. Instead, we should simply ensure that the initial bootstrap records contains what we want. Reviewers: Colin P. McCabe <[email protected]>
…pache#17952) If ELR is enabled, we need to set a cluster-level min.insync.replicas, and remove all broker-level overrides. The reason for this is that if brokers disagree about which partitions are under min ISR, it breaks the KIP-966 replication invariants. In order to enforce this, when the eligible.leader.replicas.version feature is turned on, we automatically remove all broker-level min.insync.replicas overrides, and create the required cluster-level override if needed. Similarly, if the cluster was created with eligible.leader.replicas.version enabled, we create a similar cluster-level record. In both cases, we don't allow setting overrides for individual brokers afterwards, or removing the cluster-level override. Split ActivationRecordsGeneratorTest up into multiple test cases rather than having it be one giant test case. Fix a bug in QuorumControllerTestEnv where we would replay records manually on objects, racing with the active controller thread. Instead, we should simply ensure that the initial bootstrap records contains what we want. Reviewers: Colin P. McCabe <[email protected]>
Along with the change: #17952 ([KIP-966](https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas)), the semantics of `min.insync.replicas` config has small change, and add some constraints. We should document them clearly. Reviewers: Jun Rao <[email protected]>, Calvin Liu <[email protected]>, Mickael Maison <[email protected]>, Paolo Patierno <[email protected]>, Federico Valeri <[email protected]>, Chia-Ping Tsai <[email protected]>
Along with the change: #17952 ([KIP-966](https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas)), the semantics of `min.insync.replicas` config has small change, and add some constraints. We should document them clearly. Reviewers: Jun Rao <[email protected]>, Calvin Liu <[email protected]>, Mickael Maison <[email protected]>, Paolo Patierno <[email protected]>, Federico Valeri <[email protected]>, Chia-Ping Tsai <[email protected]>
Along with the change: apache#17952 ([KIP-966](https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas)), the semantics of `min.insync.replicas` config has small change, and add some constraints. We should document them clearly. Reviewers: Jun Rao <[email protected]>, Calvin Liu <[email protected]>, Mickael Maison <[email protected]>, Paolo Patierno <[email protected]>, Federico Valeri <[email protected]>, Chia-Ping Tsai <[email protected]>
If ELR is enabled, we need to set a cluster-level min.insync.replicas, and remove all broker-level overrides. The reason for this is that if brokers disagree about which partitions are under min ISR, it breaks the KIP-966 replication invariants. In order to enforce this, when the eligible.leader.replicas.version feature is turned on, we automatically remove all broker-level min.insync.replicas overrides, and create the required cluster-level override if needed. Similarly, if the cluster was created with eligible.leader.replicas.version enabled, we create a similar cluster-level record. In both cases, we don't allow setting overrides for individual brokers afterwards, or removing the cluster-level override.
Split ActivationRecordsGeneratorTest up into multiple test cases rather than having it be one giant test case.
Fix a bug in QuorumControllerTestEnv where we would replay records manually on objects, racing with the active controller thread. Instead, we should simply ensure that the initial bootstrap records contains what we want.