KAFKA-18301; Make coordinator records first class citizen #18261


Merged
merged 16 commits into apache:trunk from the coordinator-record-gen branch on Dec 20, 2024

Conversation

dajac
Member

@dajac dajac commented Dec 18, 2024

This patch is the first one in a series to improve how coordinator records are managed. It focuses on making coordinator records first-class citizens in the generator.

  • Introduce coordinator-key and coordinator-value in the schema;
  • Introduce apiKey for those. This is done to avoid relying on the version to determine the type.
  • It also allows the generator to enforce some rules: the key cannot use flexible versions, the key must have a single version 0, there must be a key and a value for a given api key, etc.
  • It generates an enum with all the coordinator record types. This is pretty handy in the code.

The patch also updates the group coordinators to use those.
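For illustration, here is a rough sketch of what such a generated enum could look like. The constant names, ids, and helper methods below are assumptions for illustration only, not the actual generated code:

```java
// Hypothetical shape of the generated enum; the ids shown are illustrative.
public enum CoordinatorRecordType {
    OFFSET_COMMIT((short) 1),
    GROUP_METADATA((short) 2);

    private final short id;

    CoordinatorRecordType(short id) {
        this.id = id;
    }

    // The api key that prefixes the serialized record key.
    public short id() {
        return id;
    }

    // Resolves the record type for a given api key; unknown ids are rejected.
    public static CoordinatorRecordType fromId(short id) {
        for (CoordinatorRecordType type : values()) {
            if (type.id == id) {
                return type;
            }
        }
        throw new IllegalArgumentException("Unknown coordinator record type id " + id);
    }
}
```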

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@dajac dajac added the KIP-848 (The Next Generation of the Consumer Rebalance Protocol) label Dec 18, 2024
@github-actions github-actions bot added the build (Gradle build or GitHub Actions) and generator (RPC and Record code generator) labels Dec 18, 2024
@github-actions github-actions bot added the core (Kafka Broker) label Dec 18, 2024
@github-actions github-actions bot added the tools label Dec 18, 2024
Member

@AndrewJSchofield AndrewJSchofield left a comment

I like the premise of this PR. The hard-coded "version" numbers for the record types have always been a bit inelegant. This is going to be much nicer.

I'm not convinced by the OffsetCommitV0 thing. I think you're relying on and benefiting from the fact that OffsetCommitV0 v0 is exactly the same as OffsetCommit v0. Sometimes, you have to convert from the v0 class to the current one. It just seems a bit of a faff. Can the definition of the coordinator-key schemas be flexible enough to allow just OffsetCommit to exist without the separate V0 schema?

@@ -14,7 +14,8 @@
 // limitations under the License.

 {
-  "type": "data",
+  "apiKey": 1,
+  "type": "coordinator-value",
Member

This is clearly a slightly compromised area because of the way that OffsetCommit has already used two key version numbers. I would say that OffsetCommitValueV0 would have valid version 0, and OffsetCommitValue would have valid versions 1-4. In practice, you'll never find an instance of v0 for OffsetCommitValue because it would actually be OffsetCommitValueV0.

Member Author

OffsetCommitValueV0 definitely has only version 0. For OffsetCommitValue, I think that keeping 0+ does not hurt, just in case.


if (type == MessageSpecType.COORDINATOR_KEY) {
    if (this.apiKey.isEmpty()) {
        throw new RuntimeException("The ApiKey must be set for messages " + name + " with type `record-key`");
Member

record-key should be coordinator-key.

throw new RuntimeException("The ApiKey must be set for messages " + name + " with type `record-key`");
}
if (!this.validVersions().equals(new Versions((short) 0, ((short) 0)))) {
throw new RuntimeException("The Versions must be set to `0` for messages " + name + " with type `record-key`");
Member

ditto

throw new RuntimeException("The Versions must be set to `0` for messages " + name + " with type `record-key`");
}
if (!this.flexibleVersions.empty()) {
throw new RuntimeException("The FlexibleVersions are not supported for messages " + name + " with type `record-key`");
Member

ditto


if (type == MessageSpecType.COORDINATOR_VALUE) {
    if (this.apiKey.isEmpty()) {
        throw new RuntimeException("The ApiKey must be set for messages with type `record-key`");
Member

record-key should be coordinator-value.
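For reference, a small sketch of how these checks might read once the wording nits above are addressed. This is a simplified illustration, not the actual validation code in the generator's MessageSpec:

```java
// Sketch only: same structure as the excerpts above, with the error messages
// naming the right types (`coordinator-key` / `coordinator-value`).
if (type == MessageSpecType.COORDINATOR_KEY) {
    if (this.apiKey.isEmpty()) {
        throw new RuntimeException("The ApiKey must be set for messages " + name + " with type `coordinator-key`");
    }
    if (!this.validVersions().equals(new Versions((short) 0, (short) 0))) {
        throw new RuntimeException("The Versions must be set to `0` for messages " + name + " with type `coordinator-key`");
    }
    if (!this.flexibleVersions.empty()) {
        throw new RuntimeException("The FlexibleVersions are not supported for messages " + name + " with type `coordinator-key`");
    }
}
if (type == MessageSpecType.COORDINATOR_VALUE) {
    if (this.apiKey.isEmpty()) {
        throw new RuntimeException("The ApiKey must be set for messages with type `coordinator-value`");
    }
}
```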

@@ -1069,7 +1070,7 @@ object GroupMetadataManager {
   * @return key for offset commit message
   */
  def offsetCommitKey(groupId: String, topicPartition: TopicPartition): Array[Byte] = {
-    MessageUtil.toVersionPrefixedBytes(OffsetCommitKey.HIGHEST_SUPPORTED_VERSION,
+    MessageUtil.toVersionPrefixedBytes(CoordinatorRecordType.OFFSET_COMMIT.id(),
Contributor

it doesn't seem like this change alters the schema written/read to/from the log, but i want to confirm that this doesn't break compatibility. should we have a test for this? my question is if there is a bug in one of the record format changes, would that be caught somewhere?

also, internal coordinator topics (txn state) can use this but the difference would be to use a unique api key instead now, right?

Member Author

> it doesn't seem like this change alters the schema written/read to/from the log, but i want to confirm that this doesn't break compatibility. should we have a test for this? my question is if there is a bug in one of the record format changes, would that be caught somewhere?

Let me see if I can add some unit tests for this. Otherwise, any regressions should also be caught by upgrade system tests.

> also, internal coordinator topics (txn state) can use this but the difference would be to use a unique api key instead now, right?

Right.
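To illustrate the compatibility argument, here is a minimal, self-contained sketch. It is not the actual serde or test code, and the record type id value is an assumption; it only shows why the key bytes on the log stay identical as long as the new record type id equals the version number that used to be hard coded:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class OffsetCommitKeyCompatSketch {
    // Assumed to mirror the version-prefixed layout of MessageUtil.toVersionPrefixedBytes:
    // a two-byte big-endian prefix followed by the serialized message.
    static byte[] versionPrefixed(short prefix, byte[] payload) {
        ByteBuffer buffer = ByteBuffer.allocate(2 + payload.length);
        buffer.putShort(prefix);
        buffer.put(payload);
        return buffer.array();
    }

    public static void main(String[] args) {
        // Stand-in for the serialized OffsetCommitKey fields (group, topic, partition).
        byte[] keyFields = new byte[] {0x00, 0x03, 'f', 'o', 'o'};

        short legacyVersion = 1; // previously OffsetCommitKey.HIGHEST_SUPPORTED_VERSION
        short recordTypeId = 1;  // assumed value of CoordinatorRecordType.OFFSET_COMMIT.id()

        byte[] oldKey = versionPrefixed(legacyVersion, keyFields);
        byte[] newKey = versionPrefixed(recordTypeId, keyFields);

        // Same prefix, same payload => identical bytes in __consumer_offsets.
        System.out.println(Arrays.equals(oldKey, newKey)); // true
    }
}
```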

Member Author

btw, I did not change the coordinator tests, so they still use hard-coded record types. This also ensures that we don't introduce wrong ones.

Member Author

Added unit tests for the old group coordinator too. The new one already has good coverage (e.g. GroupCoordinatorRecordSerdeTest).

Contributor

> Added unit tests for the old group coordinator too.

could you point me to where it was added?

Member Author

core/src/test/scala/unit/kafka/coordinator/group/GroupMetadataManagerTest.scala

@dajac
Member Author

dajac commented Dec 19, 2024

> I'm not convinced by the OffsetCommitV0 thing. I think you're relying on and benefiting from the fact that OffsetCommitV0 v0 is exactly the same as OffsetCommit v0. Sometimes, you have to convert from the v0 class to the current one. It just seems a bit of a faff. Can the definition of the coordinator-key schemas be flexible enough to allow just OffsetCommit to exist without the separate V0 schema?

I am not really happy with the OffsetCommit version 0. I considered extending the schema to support a legacy api key, but it brings even more complexity. The goal of this refactoring is to introduce the enum with all the types and their versions, and to have strong semantics verified for the records. Having alternate api keys makes this part harder. We could make an exception as you suggest, but then it does not fit nicely in the enum either. Hence, I went with the separate record.

I would argue that the separate record is not wrong because we actually changed the record type but we kept the same name. I think that we introduced this awkwardness when we transitioned to using the auto-generated records. We could perhaps rename that record to LegacyOffsetCommit to make the separation clearer.

By the way, that legacy record was only used in Apache Kafka 0.8. It is very unlikely that we will ever see it in 4.0 clusters. I was also considering just dropping it, but we would need a small KIP for it. It is too bad that the KIP freeze for 4.0 has passed, because I suppose we will need to wait for 5.0 until we can remove it now.

@dajac dajac mentioned this pull request Dec 19, 2024
dajac added a commit that referenced this pull request Dec 19, 2024
While looking at the message formatters in #18261, I have noticed a few incorrect test cases.
* We should not log anything when the record type is unknown because the formatters have clear goals.
* We should not parse the value when the key is null or when the key cannot be parsed. While it works in the tests, in practice, this is wrong because we cannot assume the type of the value if the type of the key is not defined. The key drives the type of the entire record.

Reviewers: Chia-Ping Tsai <[email protected]>
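As a rough illustration of the second point, the key has to be parsed and recognized before the value is even looked at. This is a hypothetical formatter guard, not the actual tools-module classes; the type names and the known-id range are assumptions:

```java
import java.util.Optional;

// Hypothetical formatter guard; type names and the known-id range are assumptions.
public class CoordinatorRecordFormatterSketch {
    public Optional<String> format(byte[] keyBytes, byte[] valueBytes) {
        if (keyBytes == null) {
            // No key => no record type, so nothing is printed.
            return Optional.empty();
        }
        Optional<Short> recordType = parseType(keyBytes);
        if (!recordType.isPresent() || !isKnownType(recordType.get())) {
            // Unknown key => the value's schema is undefined here, so do not parse it.
            return Optional.empty();
        }
        // Only now is it safe to decode the value using the schema implied by the key.
        int valueLength = valueBytes == null ? 0 : valueBytes.length;
        return Optional.of("type=" + recordType.get() + " valueLength=" + valueLength);
    }

    private Optional<Short> parseType(byte[] keyBytes) {
        if (keyBytes.length < 2) {
            return Optional.empty();
        }
        short type = (short) (((keyBytes[0] & 0xFF) << 8) | (keyBytes[1] & 0xFF));
        return Optional.of(type);
    }

    private boolean isKnownType(short type) {
        return type >= 0 && type <= 2; // placeholder range for coordinator record api keys
    }
}
```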
dajac added a commit that referenced this pull request Dec 19, 2024
While looking at the message formatters in #18261, I have noticed a few incorrect test cases.
* We should not log anything when the record type is unknown because the formatters have clear goals.
* We should not parse the value when the key is null or when the key cannot be parsed. While it works in the tests, in practice, this is wrong because we cannot assume the type of the value if the type of the key is not defined. The key drives the type of the entire record.

Reviewers: Chia-Ping Tsai <[email protected]>
Contributor

@jeffkbkim jeffkbkim left a comment

Thanks. LGTM

Member

@AndrewJSchofield AndrewJSchofield left a comment

Just a couple of instances of V0 remain which I suggest should also be renamed to Legacy. Apart from that nit, looks good to me.

@dajac dajac merged commit d67379c into apache:trunk Dec 20, 2024
9 checks passed
@dajac dajac deleted the coordinator-record-gen branch December 20, 2024 11:16
ijuma added a commit to ijuma/kafka that referenced this pull request Dec 20, 2024
…e-old-protocol-versions

* apache-github/trunk:
  KAFKA-18312: Added entityType: topicName to SubscribedTopicNames in ShareGroupHeartbeatRequest.json (apache#18285)
  HOTFIX: fix incompatible types: Optional<TimestampAndOffset> cannot be converted to Option<TimestampAndOffset> (apache#18284)
  MINOR Fix some test-catalog issues (apache#18272)
  KAFKA-18180: Move OffsetResultHolder to storage module (apache#18100)
  KAFKA-18301; Make coordinator records first class citizen (apache#18261)
  KAFKA-18262 Remove DefaultPartitioner and UniformStickyPartitioner (apache#18204)
  KAFKA-18296 Remove deprecated KafkaBasedLog constructor (apache#18257)
  KAFKA-12829: Remove old Processor and ProcessorSupplier interfaces (apache#18238)
  KAFKA-18292 Remove deprecated methods of UpdateFeaturesOptions (apache#18245)
  KAFKA-12829: Remove deprecated Topology#addProcessor of old Processor API (apache#18154)
  KAFKA-18035, KAFKA-18306, KAFKA-18092: Address TransactionsTest flaky tests (apache#18264)
  MINOR: change the default linger time in the new coordinator (apache#18274)
  KAFKA-18305: validate controller.listener.names is not in inter.broker.listener.name for kcontrollers (apache#18222)
  KAFKA-18207: Serde for handling transaction records (apache#18136)
  KAFKA-13722: Refactor Kafka Streams store interfaces (apache#18243)
  KAFKA-17131: Refactor TimeDefinitions (apache#18241)
  MINOR: Fix MessageFormatters (apache#18266)
  Mark flaky tests for Dec 18, 2024 (apache#18263)
tedyu pushed a commit to tedyu/kafka that referenced this pull request Jan 6, 2025
While looking at the message formatters in apache#18261, I have noticed a few incorrect test cases.
* We should not log anything when the record type is unknown because the formatters have clear goals.
* We should not parse the value when the key is null or when the key cannot be parsed. While it works in the tests, in practice, this is wrong because we cannot assume the type of the value if the type of the key is not defined. The key drives the type of the entire record.

Reviewers: Chia-Ping Tsai <[email protected]>
tedyu pushed a commit to tedyu/kafka that referenced this pull request Jan 6, 2025
This patch is the first one in a series to improve how coordinator records are managed. It focuses on making coordinator records first-class citizens in the generator.
* Introduce `coordinator-key` and `coordinator-value` in the schema;
* Introduce `apiKey` for those. This is done to avoid relying on the version to determine the type.
* It also allows the generator to enforce some rules: the key cannot use flexible versions, the key must have a single version `0`, there must be a key and a value for a given api key, etc.
* It generates an enum with all the coordinator record types. This is pretty handy in the code.

The patch also updates the group coordinators to use those.

Reviewers: Jeff Kim <[email protected]>, Andrew Schofield <[email protected]>
dajac added a commit that referenced this pull request Jan 7, 2025
Following #18261, this patch updates the Share Coordinator to use the new record format.

Reviewers: Chia-Ping Tsai <[email protected]>, Andrew Schofield <[email protected]>
manoj-mathivanan pushed a commit to manoj-mathivanan/kafka that referenced this pull request Feb 19, 2025
…#18396)

Following apache#18261, this patch updates the Share Coordinator to use the new record format.

Reviewers: Chia-Ping Tsai <[email protected]>, Andrew Schofield <[email protected]>
jeqo pushed a commit to aiven/inkless that referenced this pull request Feb 27, 2025
While looking at the message formatters in apache/kafka#18261, I have noticed a few incorrect test cases.
* We should not log anything when the record type is unknown because the formatters have clear goals.
* We should not parse the value when the key is null or when the key cannot be parsed. While it works in the tests, in practice, this is wrong because we cannot assume the type of the value if the type of the key is not defined. The key drives the type of the entire record.

Reviewers: Chia-Ping Tsai <[email protected]>
jeqo pushed a commit to aiven/inkless that referenced this pull request Feb 27, 2025
Following apache/kafka#18261, this patch updates the Share Coordinator to use the new record format.

Reviewers: Chia-Ping Tsai <[email protected]>, Andrew Schofield <[email protected]>
jeqo pushed a commit to aiven/inkless that referenced this pull request Feb 28, 2025
While looking at the message formatters in apache/kafka#18261, I have noticed a few incorrect test cases.
* We should not log anything when the record type is unknown because the formatters have clear goals.
* We should not parse the value when the key is null or when the key cannot be parsed. While it works in the tests, in practice, this is wrong because we cannot assume the type of the value if the type of the key is not defined. The key drives the type of the entire record.

Reviewers: Chia-Ping Tsai <[email protected]>
jeqo pushed a commit to aiven/inkless that referenced this pull request Feb 28, 2025
Following apache/kafka#18261, this patch updates the Share Coordinator to use the new record format.

Reviewers: Chia-Ping Tsai <[email protected]>, Andrew Schofield <[email protected]>