
KAFKA-18583 Fix getPartitionReplicaEndpoints for KRaft #18635

Merged
merged 2 commits into apache:trunk on Jan 21, 2025

Conversation

dimitarndimitrov (Contributor) commented:

Although `MetadataCache`'s `getPartitionReplicaEndpoints` takes a single topic-partition, the `KRaftMetadataCache` implementation iterates over all partitions of the matching topic. This is not necessary and can cause significant performance degradation when the topic has a relatively high number of partitions.

Note that this is not a recent regression - it has been a part of `KRaftMetadataCache` since its creation.
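
As a rough illustration of the cost difference, here is a self-contained toy sketch (the types are stand-ins, not Kafka's real metadata-image classes; only the lookup pattern mirrors the actual change):

object PartitionLookupSketch {
  // Stand-in for the per-partition registration held in the metadata image.
  final case class PartitionStub(replicas: Array[Int])

  def main(args: Array[String]): Unit = {
    val partitions = new java.util.HashMap[Integer, PartitionStub]()
    (0 until 1000).foreach(i => partitions.put(i, PartitionStub(Array(1, 2, 3))))

    // Old pattern: visits all 1000 entries to serve a single-partition query.
    var visited = 0
    partitions.values().forEach(_ => visited += 1)
    println(s"scan visited $visited entries") // prints 1000

    // Fixed pattern: one hash-map lookup for the requested partition.
    val hit = Option(partitions.get(42))
    println(s"direct lookup found partition: ${hit.isDefined}") // prints true
  }
}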

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

github-actions bot added the `triage` (PRs from the community), `core` (Kafka Broker), and `small` (Small PRs) labels on Jan 20, 2025
@dajac (Member) left a comment


Thanks for the patch. I left one comment for consideration.

Comment on lines 569 to 572
// Verify that for each partition we have exactly $replicationFactor endpoints
assertEquals(replicationFactor,
  replicaSet.size,
  s"Unexpected replica set $replicaSet for partition $partitionId")
@dajac (Member):

Would it be possible to also verify that the method returns the replicas that we expect (replica ids + nodes)?
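
One hypothetical shape for such an assertion (`expectedReplicaIds` and `expectedNodesByReplicaId` are assumed test-side fixtures, not names from this PR):

val endpoints = cache.getPartitionReplicaEndpoints(
  new TopicPartition(topic, partitionId), listenerName)
// Check the exact replica ids first, then the node each id resolves to.
assertEquals(expectedReplicaIds(partitionId).toSet, endpoints.keySet)
endpoints.foreach { case (replicaId, node) =>
  assertEquals(expectedNodesByReplicaId(replicaId), node)
}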

@dimitarndimitrov (Contributor, Author):

Fixed, let me know if there's anything you'd like changed in the chosen approach.

github-actions bot removed the `small` (Small PRs) label on Jan 20, 2025
@@ -433,7 +433,7 @@ class KRaftMetadataCache(
     val image = _currentImage
     val result = new mutable.HashMap[Int, Node]()
     Option(image.topics().getTopic(tp.topic())).foreach { topic =>
-      topic.partitions().values().forEach { partition =>
+      Option(topic.partitions().get(tp.partition())).foreach { partition =>
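
For orientation, a hedged reconstruction of the full method after the change (only the quoted diff lines are confirmed by this PR; the signature shape and the replica-to-node resolution via a hypothetical `nodeFor` helper are assumptions):

override def getPartitionReplicaEndpoints(
  tp: TopicPartition,
  listenerName: ListenerName): Map[Int, Node] = {
  val image = _currentImage
  val result = new mutable.HashMap[Int, Node]()
  Option(image.topics().getTopic(tp.topic())).foreach { topic =>
    // The fixed line: a direct lookup of the one requested partition.
    Option(topic.partitions().get(tp.partition())).foreach { partition =>
      partition.replicas.foreach { replicaId =>
        // nodeFor is a hypothetical helper resolving a broker id to its
        // endpoint for the given listener from the current metadata image.
        nodeFor(image, replicaId, listenerName).foreach(result.put(replicaId, _))
      }
    }
  }
  result.toMap
}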
@dajac (Member):

Ouch - this is a painful oversight.

@@ -511,6 +512,101 @@ class MetadataCacheTest {
     assertEquals(initialBrokerIds.toSet, aliveBrokersFromCache.map(_.id).toSet)
   }
 
+  @ParameterizedTest
+  @MethodSource(Array("cacheProvider"))
+  def testGetPartitionReplicaEndpoints(cache: MetadataCache): Unit = {
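
A rough outline of what the body of such a parameterized test might do (the setup step and `listenerName` are assumptions; the assertion matches the one quoted earlier in this review):

val topic = "test-topic"
val numPartitions = 3
val replicationFactor = 2
// Populate `cache` with brokers and the topic via the test's existing
// metadata-update helper for the given MetadataCache implementation.
(0 until numPartitions).foreach { partitionId =>
  val replicaSet = cache.getPartitionReplicaEndpoints(
    new TopicPartition(topic, partitionId), listenerName)
  // Verify that for each partition we have exactly $replicationFactor endpoints
  assertEquals(replicationFactor, replicaSet.size,
    s"Unexpected replica set $replicaSet for partition $partitionId")
}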
@dajac (Member):

Does this mean that we didn't have a test for this method?

@dimitarndimitrov (Contributor, Author):

Mostly - there is an FFF (fetch-from-follower) integration test verifying that the method doesn't return a shutdown node (eventually).

github-actions bot removed the `triage` (PRs from the community) label on Jan 21, 2025
@dajac (Member) left a comment

LGTM, thanks for the patch.

dajac merged commit 31d8e68 into apache:trunk on Jan 21, 2025
2 checks passed
dajac pushed a commit that referenced this pull request Jan 21, 2025
Although `MetadataCache`'s `getPartitionReplicaEndpoints` takes a single topic-partition, the `KRaftMetadataCache` implementation iterates over all partitions of the matching topic. This is not necessary and can cause significant performance degradation when the topic has a relatively high number of partitions.

Note that this is not a recent regression - it has been a part of `KRaftMetadataCache` since its creation.

Reviewers: Ismael Juma <[email protected]>, David Jacot <[email protected]>
dajac (Member) commented on Jan 21, 2025

Merged it to trunk and to 4.0. @dimitarndimitrov I was not able to cherry-pick it to 3.9 and to 3.8 due to conflicts. Could you please open PRs for those branches?

dimitarndimitrov added a commit to dimitarndimitrov/kafka that referenced this pull request Jan 21, 2025
The cherry-pick required reimplementing the accompanying test to work
with `UpdateMetadataRequest` (removed in 4.0 and trunk) in order to also
apply to `ZkMetadataCache`. If the removal of `UpdateMetadataRequest` is
backported here as well, the test can be changed to match the trunk
version.

Conflicts: core/src/test/scala/unit/kafka/server/MetadataCacheTest.scala
dimitarndimitrov (Contributor, Author) commented:

> Merged it to trunk and to 4.0. @dimitarndimitrov I was not able to cherry-pick it to 3.9 and to 3.8 due to conflicts. Could you please open PRs for those branches?

Hey @dajac check out #18657 when you have a moment. The sole commit in that PR also applies cleanly to 3.8 (tested locally).

pranavt84 pushed a commit to pranavt84/kafka that referenced this pull request Jan 27, 2025

airlock-confluentinc bot pushed a commit to confluentinc/kafka that referenced this pull request Jan 27, 2025

manoj-mathivanan pushed a commit to manoj-mathivanan/kafka that referenced this pull request Feb 19, 2025