[FLINK-38453] Add full splits to KafkaSourceEnumState by AHeise · Pull Request #192 · apache/flink-connector-kafka

AHeise · 2025-09-30T13:50:49Z

KafkaEnumerator's state contains the TopicPartitions only but not the offsets, so it doesn't contain the full split state contrary to the design intent.

There are a couple of issues with that approach. It implicitly assumes that splits are fully assigned to readers before the first checkpoint. Otherwise, the enumerator invokes the offset initializer again on recovery from such a checkpoint, which leads to inconsistencies (LATEST may be resolved during the first attempt for some partitions and during the second attempt for others).

Through the addSplitsBack callback, the same scenarios can also occur later for BATCH, where they lead to duplicate rows (in case of EARLIEST or SPECIFIC-OFFSETS) or data loss (in case of LATEST). Finally, it's not possible to safely use KafkaSource as part of a HybridSource because the offset initializer cannot even be recreated on recovery.

All cases are solved by also retaining the offset in the enumerator state. To that end, this commit merges the async discovery phases to immediately initialize the splits from the partitions. Any subsequent checkpoint will contain the proper start offset.
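To illustrate the problem described above, here is a minimal sketch in plain Java. It does not use the connector's classes: `Split`, `recoverFromPartitionOnly`, and `recoverFromFullSplit` are hypothetical names, and the "initializer" is just a function that returns the current log end offset. The sketch only shows why a checkpoint that stores partitions alone can resolve LATEST differently on recovery, while a checkpoint that stores the full split cannot.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

final class EnumStateSketch {
    /** Hypothetical stand-in for a split: partition name plus resolved start offset. */
    record Split(String partition, long startOffset) {}

    /** Partition-only state: recovery has to run the offset initializer again. */
    static Split recoverFromPartitionOnly(String partition, ToLongFunction<String> initializer) {
        return new Split(partition, initializer.applyAsLong(partition));
    }

    /** Full-split state: recovery reuses the offset stored in the checkpoint. */
    static Split recoverFromFullSplit(Split checkpointed) {
        return checkpointed;
    }

    public static void main(String[] args) {
        // Simulated log end offsets; a LATEST-style initializer reads from this map.
        Map<String, Long> logEnd = new HashMap<>();
        logEnd.put("topic-0", 100L);
        ToLongFunction<String> latest = logEnd::get;

        // First attempt: the split is discovered and its offset is resolved.
        Split discovered = new Split("topic-0", latest.applyAsLong("topic-0"));

        // New records arrive before the split is assigned, then the job fails.
        logEnd.put("topic-0", 150L);

        // Partition-only recovery resolves LATEST again and skips the new records.
        System.out.println(recoverFromPartitionOnly("topic-0", latest));
        // Full-split recovery keeps the originally resolved offset.
        System.out.println(recoverFromFullSplit(discovered));
    }
}
```

In this toy model, the records between the two offsets are the ones that would be lost with LATEST; with EARLIEST or specific offsets, the analogous re-resolution is what can produce duplicates.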

Savonitar · 2025-09-30T18:13:26Z

+                topicPartitions.add(
+                        new KafkaPartitionSplit(
+                                new TopicPartition(TOPIC_PREFIX + readerId, partition),
+                                STARTING_OFFSET));


Thanks for the PR. This is a very good improvement for the connector.
I noticed that the current test creates splits using the constant KafkaPartitionSplit.EARLIEST_OFFSET. Would it make sense to add a test case that uses a real-world offset (e.g., 123)?

I had to change the logic a bit and introduced a new special value MIGRATED against which all unit tests now go. However, I also added a test with specific offsets to KafkaSourceEnumeratorTest.

Savonitar · 2025-09-30T18:24:00Z

-    public void testAddSplitsBack() throws Throwable {
+    @ParameterizedTest
+    @EnumSource(StandardOffsetsInitializer.class)
+    public void testAddSplitsBack(StandardOffsetsInitializer offsetsInitializer) throws Throwable {


Is my understanding correct that the test verifies that the offset is correctly recalculated on recovery, but doesn't verify that the original offset (before the failure) was preserved and restored?

Good catch. I expanded the test to cover snapshotting.

fapaul

Looks mostly good, left some inline comments.

fapaul · 2025-10-02T06:57:17Z

+                        new SplitAndAssignmentStatus(
+                                new KafkaPartitionSplit(
+                                        new TopicPartition(topic, partition),
+                                        DEFAULT_STARTING_OFFSET),


Isn't this a behavioral change? Previously the unassigned split would get the starting offset configured by the user on reassignment.

Yes, added a new MIGRATED offset to indicate that this needs to be initialized on recovery.
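A rough sketch of the idea behind such a sentinel, in plain Java without the connector's classes. Everything here is an assumption for illustration: the `Split` record, the `restoreOldState` helper, and in particular the value `Long.MIN_VALUE`, since the actual value of the MIGRATED constant is not shown in this thread.

```java
import java.util.List;
import java.util.stream.Collectors;

final class MigrationSketch {
    /** Hypothetical sentinel value; the real constant may be defined differently. */
    static final long MIGRATED = Long.MIN_VALUE;

    /** Hypothetical stand-in for a split: partition name plus start offset. */
    record Split(String partition, long startOffset) {
        /** True if the offset still has to be resolved by the offset initializer. */
        boolean needsInitialization() {
            return startOffset == MIGRATED;
        }
    }

    /**
     * Old-format state carried partitions only, so each partition is restored
     * as a split whose offset is marked as "to be initialized on recovery".
     */
    static List<Split> restoreOldState(List<String> partitions) {
        return partitions.stream()
                .map(partition -> new Split(partition, MIGRATED))
                .collect(Collectors.toList());
    }
}
```

With a marker like this, splits restored from old state still get the user-configured starting offset, while splits restored from new state keep the offset that was checkpointed.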

fapaul

Thanks for addressing the comments 👍 I am only missing a higher-level test for the newly added offset migration in the enumerator.

fapaul · 2025-10-07T07:13:40Z

+                        migratedPartitions, getOffsetsRetriever());
+        return splitByAssignmentStatus(
+                splits.stream()
+                        .map(splitStatus -> resolveMigratedSplit(splitStatus, startOffsets)));


Nit: The flow of extracting the migratedPartitions is overly complex because we extract the migrated partitions twice in line 179 and line 161.

It's unfortunately necessary by design:

Line 161 extracts the partitions, which are then used to look up the partition offsets jointly.

This lookup is expensive because it uses the admin client to contact the Kafka cluster.

The offset initializer is designed to look up all partitions jointly so that only one request goes to the Kafka brokers.

Once all offsets have been received, line 179 applies them to the splits. It could be a simple map lookup, but I decided to add an assertion, so it went into a separate method.
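The two passes described in this reply can be sketched as follows. This is not the code from the PR: it is a plain-Java approximation in which `Split`, `resolve`, `resolveOne`, the `MIGRATED` value, and the `batchLookup` function (standing in for the single admin-client request) are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ResolveMigratedSketch {
    /** Hypothetical sentinel marking a split whose offset is not yet resolved. */
    static final long MIGRATED = Long.MIN_VALUE;

    /** Hypothetical stand-in for a split: partition name plus start offset. */
    record Split(String partition, long startOffset) {}

    static List<Split> resolve(
            List<Split> splits, Function<Set<String>, Map<String, Long>> batchLookup) {
        // Pass 1: collect every migrated partition so they can be looked up together.
        Set<String> migrated = splits.stream()
                .filter(split -> split.startOffset() == MIGRATED)
                .map(Split::partition)
                .collect(Collectors.toSet());
        if (migrated.isEmpty()) {
            return splits; // nothing to resolve, so skip the expensive lookup
        }
        // One joint lookup for all partitions instead of one request per split.
        Map<String, Long> startOffsets = batchLookup.apply(migrated);
        // Pass 2: apply the resolved offsets to the individual splits.
        return splits.stream()
                .map(split -> resolveOne(split, startOffsets))
                .collect(Collectors.toList());
    }

    static Split resolveOne(Split split, Map<String, Long> startOffsets) {
        if (split.startOffset() != MIGRATED) {
            return split; // already carries a concrete offset
        }
        Long offset = startOffsets.get(split.partition());
        if (offset == null) {
            // Sanity check: the joint lookup must cover every migrated partition.
            throw new IllegalStateException("No start offset for " + split.partition());
        }
        return new Split(split.partition(), offset);
    }
}
```

The partitions are touched twice, but the costly part (the lookup) happens once; the second pass is only a map access plus a sanity check.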

fapaul · 2025-10-07T07:18:17Z

-    public void testAddSplitsBack() throws Throwable {
+    @ParameterizedTest
+    @EnumSource(StandardOffsetsInitializer.class)
+    public void testAddSplitsBack(StandardOffsetsInitializer offsetsInitializer) throws Throwable {


Can you also add a test to cover the newly added migration story?

fapaul

LGTM


Savonitar · 2025-10-07T15:30:48Z

LGTM

boring-cyborg Bot added the component=Connectors/Kafka label Sep 30, 2025

Savonitar reviewed Sep 30, 2025


fapaul self-requested a review October 2, 2025 06:41

fapaul reviewed Oct 2, 2025


AHeise force-pushed the FLINK-38453-enum-state branch from 52382f2 to e2ede23 Compare October 7, 2025 07:02

fapaul reviewed Oct 7, 2025


fapaul approved these changes Oct 7, 2025


AHeise force-pushed the FLINK-38453-enum-state branch from 52cccfe to 3b1dcee Compare October 7, 2025 14:15

AHeise merged commit cb5c5c0 into apache:main Oct 10, 2025
7 checks passed

3 participants