CalvinConfluent (Contributor) commented Nov 26, 2024

If ELR is enabled, we need to set a cluster-level min.insync.replicas, and remove all broker-level overrides. The reason for this is that if brokers disagree about which partitions are under min ISR, it breaks the KIP-966 replication invariants. In order to enforce this, when the eligible.leader.replicas.version feature is turned on, we automatically remove all broker-level min.insync.replicas overrides, and create the required cluster-level override if needed. Similarly, if the cluster was created with eligible.leader.replicas.version enabled, we create a similar cluster-level record. In both cases, we don't allow setting overrides for individual brokers afterwards, or removing the cluster-level override.
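
As a rough illustration of the reset described above, here is a minimal sketch. It is not the code in this PR: `MinIsrResetSketch`, `ConfigChange`, and `onElrEnabled` are hypothetical names, configs are modeled as plain maps, and an empty broker name stands in for the cluster-level default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of the reset step; none of these types are Kafka classes.
final class MinIsrResetSketch {
    static final String MIN_ISR = "min.insync.replicas";

    // Stand-in for a config record. An empty broker name denotes the cluster-level
    // default (an assumption of this sketch); a null value denotes removal of the override.
    static final class ConfigChange {
        final String brokerName;
        final String value;

        ConfigChange(String brokerName, String value) {
            this.brokerName = brokerName;
            this.value = value;
        }

        @Override
        public String toString() {
            String target = brokerName.isEmpty() ? "cluster" : "broker " + brokerName;
            return value == null
                ? "delete " + MIN_ISR + " on " + target
                : "set " + MIN_ISR + "=" + value + " on " + target;
        }
    }

    // Computes the changes to emit when the ELR feature is turned on.
    static List<ConfigChange> onElrEnabled(Map<String, String> clusterConfig,
                                           Map<String, Map<String, String>> brokerConfigs,
                                           String defaultMinIsr) {
        List<ConfigChange> changes = new ArrayList<>();
        // Create the cluster-level override only if one does not exist yet.
        if (!clusterConfig.containsKey(MIN_ISR)) {
            changes.add(new ConfigChange("", defaultMinIsr));
        }
        // Remove every broker-level override; iterate in sorted order for determinism.
        for (Map.Entry<String, Map<String, String>> e : new TreeMap<>(brokerConfigs).entrySet()) {
            if (e.getValue().containsKey(MIN_ISR)) {
                changes.add(new ConfigChange(e.getKey(), null));
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> brokers = new TreeMap<>();
        brokers.put("1", Map.of(MIN_ISR, "1"));
        brokers.put("2", Map.of("log.retention.hours", "24"));
        for (ConfigChange c : onElrEnabled(Map.of(), brokers, "1")) {
            System.out.println(c);
        }
    }
}
```

The point of the sketch is that after the reset every broker resolves `min.insync.replicas` from the same cluster-level value, so brokers cannot disagree about which partitions are under min ISR.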

Split ActivationRecordsGeneratorTest up into multiple test cases rather than having it be one giant test case.

Fix a bug in QuorumControllerTestEnv where we would replay records manually on objects, racing with the active controller thread. Instead, we should simply ensure that the initial bootstrap records contain what we want.

@github-actions github-actions bot added the kraft label Nov 26, 2024
@CalvinConfluent CalvinConfluent marked this pull request as ready for review December 3, 2024 17:41
cmccabe (Contributor) commented Dec 6, 2024

Thanks for the PR. Thoughts:

I think feature control should not have a reference to cluster control. In general, many things reference feature control to know the MV, and I don't want them to have to pull in cluster control for that.

It would probably be better to have ConfigurationControl handle the updateFeatures call, and have it call into FeatureControl.
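
To make the suggested dependency direction concrete, here is a minimal sketch under stated assumptions. `FeatureControlSketch` and `ConfigurationControlSketch` are hypothetical stand-ins, records are modeled as strings, and treating a feature level above 0 as "enabled" is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: tracks finalized feature levels and knows nothing
// about configuration or cluster state, so other components can depend on it cheaply.
final class FeatureControlSketch {
    private final Map<String, Short> finalizedLevels = new HashMap<>();

    void setLevel(String feature, short level) {
        finalizedLevels.put(feature, level);
    }

    short levelOrDefault(String feature, short defaultLevel) {
        return finalizedLevels.getOrDefault(feature, defaultLevel);
    }
}

// Hypothetical stand-in: owns the feature-update entry point and calls into
// feature control, rather than the other way around.
final class ConfigurationControlSketch {
    static final String ELR_FEATURE = "eligible.leader.replicas.version";
    private final FeatureControlSketch featureControl;

    ConfigurationControlSketch(FeatureControlSketch featureControl) {
        this.featureControl = featureControl;
    }

    private boolean elrEnabled() {
        // Assumption: any level above 0 means the feature is on.
        return featureControl.levelOrDefault(ELR_FEATURE, (short) 0) > 0;
    }

    // Delegates the feature update, then appends the config side effect
    // if this update switched ELR from off to on.
    List<String> updateFeatures(Map<String, Short> updates) {
        List<String> records = new ArrayList<>();
        boolean wasEnabled = elrEnabled();
        for (Map.Entry<String, Short> e : updates.entrySet()) {
            featureControl.setLevel(e.getKey(), e.getValue());
            records.add("feature " + e.getKey() + "=" + e.getValue());
        }
        if (!wasEnabled && elrEnabled()) {
            records.add("config: reset min.insync.replicas overrides");
        }
        return records;
    }
}
```

With this shape, anything that only needs feature levels depends on the feature component alone, and the config side effects stay in the component that owns configs.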

cmccabe (Contributor) commented Dec 6, 2024

Also, I think we should generate the configuration record for the cluster config as part of the activation event rather than right afterwards. That will ensure that it happens before anything else.

CalvinConfluent (Contributor, Author)

Update Summary:

  1. FeatureControl is now referenced by ConfigurationControl.
  2. ConfigurationControl now has updateFeatures, which is used in the controller.
  3. During activation, if the log is empty, a cluster-level config record is generated.
  4. Two types of config updates are rejected when ELR is enabled:
    1. Removing the cluster-level min ISR.
    2. Adding a broker-level min ISR.
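
The two rejected update types in item 4 can be sketched as a small validation function. This is an illustration only, not the validation code in this PR: `ElrConfigValidationSketch`, `Op`, and `validate` are hypothetical names, and an empty broker name is assumed to denote the cluster-level config.

```java
import java.util.Optional;

// Hypothetical sketch of the two rejection rules; not the Kafka validation code.
final class ElrConfigValidationSketch {
    static final String MIN_ISR = "min.insync.replicas";

    enum Op { SET, DELETE }

    // Returns an error message if the broker-resource update must be rejected,
    // or an empty Optional if it is allowed. An empty brokerName stands for the
    // cluster-level config (an assumption of this sketch).
    static Optional<String> validate(boolean elrEnabled, String brokerName, String key, Op op) {
        if (!elrEnabled || !MIN_ISR.equals(key)) {
            return Optional.empty();
        }
        boolean clusterLevel = brokerName.isEmpty();
        if (clusterLevel && op == Op.DELETE) {
            return Optional.of("Cannot remove the cluster-level " + MIN_ISR + " while ELR is enabled");
        }
        if (!clusterLevel && op == Op.SET) {
            return Optional.of("Cannot set a broker-level " + MIN_ISR + " while ELR is enabled");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validate(true, "", MIN_ISR, Op.DELETE).orElse("allowed"));
        System.out.println(validate(true, "1", MIN_ISR, Op.SET).orElse("allowed"));
        System.out.println(validate(true, "", MIN_ISR, Op.SET).orElse("allowed"));
    }
}
```

Changing the value of the cluster-level config stays allowed in this sketch; only its removal and per-broker overrides are refused.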

cmccabe (Contributor) commented Dec 16, 2024

Can you fix the checkstyle errors?

// Also, it will remove all the broker level min ISR config records.
void maybeResetMinIsrConfig(List<ApiMessageAndVersion> outputRecords) {
    if (!clusterConfig().containsKey(MIN_IN_SYNC_REPLICAS_CONFIG)) {
        String minIsrDefaultConfigValue = configSchema.getStaticOrDefaultConfig(
Contributor:

We should log a message if we are doing this. Also, it seems easier just to create the configuration record directly than call a function here.

Contributor Author:

Sounds good, updated.

@cmccabe cmccabe changed the title from "Kafka-16540 set up the min ISR configs if ELR is enabled." to "KAFKA-16540: enforce min.insync.replicas config invariants for ELR" Jan 7, 2025
@@ -303,6 +339,14 @@ private ApiError validateAlterConfig(ConfigResource configResource,
        if (alterConfigPolicy.isPresent()) {
            alterConfigPolicy.get().validate(new RequestMetadata(configResource, alteredConfigsForAlterConfigPolicyCheck));
        }
        if (featureControl.isElrFeatureEnabled()) {
Contributor Author:

We should remove this part as we have done the filtering above.

        return isElrFeatureEnabled(latestFinalizedFeatures().versionOrDefault(EligibleLeaderReplicasVersion.FEATURE_NAME, (short) 0));
    }

    public static boolean isElrFeatureEnabled(short elrFeatureLevel) {
Contributor Author:

We can remove this static method; no one uses it any more.

@cmccabe cmccabe merged commit ec49a60 into apache:trunk Jan 8, 2025
9 checks passed
pranavt84 pushed a commit to pranavt84/kafka that referenced this pull request Jan 27, 2025
…pache#17952)

Reviewers: Colin P. McCabe
CalvinConfluent added a commit to CalvinConfluent/kafka that referenced this pull request Jan 29, 2025
…pache#17952)

cmccabe pushed a commit that referenced this pull request Feb 4, 2025
…17952)

manoj-mathivanan pushed a commit to manoj-mathivanan/kafka that referenced this pull request Feb 19, 2025
…pache#17952)

chia7712 pushed a commit that referenced this pull request Aug 4, 2025
Along with the change in #17952
([KIP-966](https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas)),
the semantics of the `min.insync.replicas` config changed slightly and some
constraints were added. We should document them clearly.

Reviewers: Jun Rao, Calvin Liu, Mickael Maison, Paolo Patierno, Federico Valeri, Chia-Ping Tsai
chia7712 pushed a commit that referenced this pull request Aug 4, 2025
airlock-confluentinc bot pushed a commit to confluentinc/kafka that referenced this pull request Aug 6, 2025