[FLINK-37294][state] Support state migration between disabling and enabling ttl in HeapKeyedStateBackend #26651

hejufang · 2025-06-08T09:17:24Z

[FLINK-37294][state] Support state migration between disabling and enabling ttl in HeapKeyedStateBackend

What is the purpose of the change

Support state migration between disabling and enabling ttl in HeapKeyedStateBackend

Brief change log

Add migrateTtlValue in AbstractHeapState. When the state TTL switch changes, trigger the migration of state data.
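
To illustrate the idea behind the change log entry above, here is a minimal sketch of what migrating a single value across a TTL switch could look like. It is not the code from this PR: TtlWrapped, TtlMigrationSketch and the migrate signature are hypothetical names, and the sketch assumes a TTL-enabled value is simply the user value paired with a last-access timestamp.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a TTL-wrapped value: the user value plus a last-access timestamp. */
final class TtlWrapped<T> {
    final T userValue;
    final long lastAccessTimestamp;

    TtlWrapped(T userValue, long lastAccessTimestamp) {
        this.userValue = userValue;
        this.lastAccessTimestamp = lastAccessTimestamp;
    }
}

/** Hypothetical helper; the names and signature are assumptions, not Flink API. */
final class TtlMigrationSketch {

    /** Migrates one stored value when the TTL switch changes; returns it unchanged otherwise. */
    static Object migrate(
            Object stored, boolean ttlWasEnabled, boolean ttlIsEnabled, LongSupplier clock) {
        if (ttlWasEnabled == ttlIsEnabled) {
            // The TTL switch did not change, so there is nothing to migrate.
            return stored;
        }
        if (ttlIsEnabled) {
            // disabled -> enabled: wrap the raw value with the current timestamp.
            return new TtlWrapped<>(stored, clock.getAsLong());
        }
        // enabled -> disabled: drop the timestamp and keep only the user value.
        return ((TtlWrapped<?>) stored).userValue;
    }
}
```

In this sketch the clock is passed in as a LongSupplier so that the timestamp used for wrapping can be controlled in tests.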

Verifying this change

This change is already covered by existing tests, such as StateBackendMigrationTestBase#testStateMigrationAfterChangingTTLFromEnablingToDisabling and StateBackendMigrationTestBase#testStateMigrationAfterChangingTTLFromDisablingToEnabling.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): no
The public API, i.e., is any changed class annotated with @Public(Evolving): no
The serializers: no
The runtime per-record code paths (performance sensitive): no
Anything that affects deployment or recovery: no
The S3 file system connector: no

Documentation

Does this pull request introduce a new feature? yes
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

flinkbot · 2025-06-08T09:24:14Z

CI report:

82ea4f0 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure: re-run the last Azure build

davidradl · 2025-06-10T11:10:11Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/AbstractHeapState.java

+            TypeSerializer<SV> newSerializer,
+            TtlTimeProvider ttlTimeProvider) {
+
+        Preconditions.checkArgument(priorSerializer instanceof TtlAwareSerializer);


nit: the variable names mix prior/new with previous/current. I suggest using one pair consistently, for example:
previousSerializer, previousTtlAwareSerializer
currentSerializer, currentTtlAwareSerializer

davidradl · 2025-06-10T11:13:13Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapKeyedStateBackend.java

-                                + previousStateSerializer
-                                + ").");
+            } else if (stateCompatibility.isCompatibleAfterMigration()) {
+                migrateStateValues(stateDesc, previousStateSerializer, newStateSerializer);


What happens if the state is not compatible? I suggest we at least log that case. Or should it be an error?

davidradl · 2025-06-10T11:14:08Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapKeyedStateBackend.java

@@ -299,6 +292,56 @@ private <N, V> StateTable<K, N, V> tryRegisterStateTable(
        return stateTable;
    }

+    /** Only triggering state migration when the state TTL is turned on or off is supported. */


Do we need the words "is supported"? I am not sure what they mean here.

davidradl · 2025-06-10T11:16:40Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapListState.java

+            TypeSerializer<List<V>> newSerializer,
+            TtlTimeProvider ttlTimeProvider) {
+
+        Preconditions.checkArgument(


It looks like there is duplication of code. Can we rearrange the code so that the following is not repeated for the different cases?

                priorSerializer instanceof TtlAwareSerializer.TtlAwareListSerializer);
        Preconditions.checkArgument(
                newSerializer instanceof TtlAwareSerializer.TtlAwareListSerializer);
        TtlAwareSerializer<V, ?> priorTtlAwareElementSerializer =
                ((TtlAwareSerializer.TtlAwareListSerializer<V>) priorSerializer)
                        .getElementSerializer();
        TtlAwareSerializer<V, ?> newTtlAwareElementSerializer =
                ((TtlAwareSerializer.TtlAwareListSerializer<V>) newSerializer)
                        .getElementSerializer();

xiangyuf · 2025-06-12T16:24:58Z

@hejufang Hi, I made some improvements to your implementation in PR #26674. I've also made the commit there a co-authored commit. PTAL.

hejufang · 2025-06-13T03:36:37Z

@xiangyuf Thank you for the improvements. I have pushed the new commit to the current branch, so we can continue to track the work in this PR.

xiangyuf · 2025-06-13T03:42:58Z

Cool, I've closed another PR.

xiangyuf · 2025-06-13T06:28:38Z

@Zakelly A gentle reminder to review this PR.

xiangyuf · 2025-06-18T06:54:42Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapKeyedStateBackend.java

+                    "State should be an AbstractRocksDBState but is " + state);
+        }
+        AbstractHeapState<K, N, V> heapState = (AbstractHeapState<K, N, V>) state;
+        TtlAwareSerializer<V, ?> previousTtlAwareSerializer =


Variable previousTtlAwareSerializer is not used. You can safely delete this.

comment resolved

Zakelly

Thanks for the PR, overall looks good

Zakelly · 2025-06-18T16:09:45Z

flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapKeyedStateBackend.java

+
+        while (iterator.hasNext()) {
+            final StateEntry<K, N, V> entry = iterator.next();
+            stateTable.put(


Is there any concurrency issue when performing stateTable.put() during the iteration?
If so, I'd suggest a new internal transform method that updates the values of all entries in place.

I haven't found any concurrency issues in unit tests or in my test jobs so far. I also debugged the implementations of StateTable.put and CopyOnWriteStateMap.put: the call locates the existing StateMapEntry and modifies its state data without creating a new StateMapEntry or updating modCount.
Although I haven't identified any concurrency issue, I still switched to the transform method for updating entries to avoid potential problems. WDYT?

@hejufang Thanks for the update!

I checked your change, but I was wondering whether we could remove the iteration. There is no need to iterate the keys out and then perform the transform one by one. We could introduce a single internal method that iterates over all key-value pairs and transforms them, just like applyToAllKeys. WDYT?

@Zakelly Thanks for your suggestion. I've added a transformAll method to traverse all data and perform transformations. Please take a look.

@hejufang I'm afraid the current implementation does not avoid the potential concurrency issue and may also perform poorly.

To be more specific, CopyOnWriteStateMap$StateEntryIterator checks expectedModCount in each next() call, but modCount is updated by putEntry() during transform(). So will a ConcurrentModificationException be thrown? Correct me if I'm wrong.

In the current transformAll implementation, we iterate over the entries and then invoke transform. getMapForKeyGroup then calculates the key group again to get the state map, which is redundant because the entry was iterated from that very map. I'm worried about the performance.

@Zakelly Thank you for your reply. I have reviewed the implementations of StateTable and StateMap and conducted some tests. Here are my answers to your questions.

Will iterating over StateMap using an iterator and performing transform throw a ConcurrentModificationException?
No, it won't. This is because the putEntry method locates an already existing StateMapEntry instead of creating a new one. It returns before updating modCount (see CopyOnWriteStateMap#putEntry at line 399), so when transforming existing data, modCount does not change, and therefore no ConcurrentModificationException is thrown.
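
The same principle can be observed with java.util.HashMap, used here only as an analogy and not as the CopyOnWriteStateMap code: replacing the value of a key that already exists is not a structural modification, so a running iterator does not throw ConcurrentModificationException. The class and method names below are made up for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Analogy only: uses java.util.HashMap, not Flink's CopyOnWriteStateMap. */
final class OverwriteDuringIteration {

    /** Overwrites the value of every existing key while iterating; returns the visit count. */
    static int overwriteAll(Map<Integer, String> map) {
        int visited = 0;
        for (Integer key : map.keySet()) {
            // put() on an existing key replaces the value in place. It is not a
            // structural modification, so the iterator's modCount check still passes.
            map.put(key, map.get(key) + "-migrated");
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(i, "v" + i);
        }
        System.out.println("visited " + overwriteAll(map) + " entries");
    }
}
```

Inserting a new key inside the same loop would be a structural modification and would normally fail on the next iterator step, which is why the distinction between overwriting and inserting matters here.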

Are there other concurrency issues? Yes.
I verified that the following code will iterate over duplicate data:

for (int i = 0; i < keyGroupedStateMaps.length; i++) {
    Iterator<StateEntry<K, N, S>> iterator = keyGroupedStateMaps[i].iterator();
    while (iterator.hasNext()) {
        StateEntry<K, N, S> entry = iterator.next();
        keyGroupedStateMaps[i].transform(
                entry.getKey(), entry.getNamespace(), value, transformation);
    }
}

This happens because when iterating directly over CopyOnWriteStateMap.iterator(), calling transform internally invokes putEntry, which triggers computeHashForOperationAndDoIncrementalRehash. This causes data to be moved between two tables (primaryTable and incrementalRehashTable). If the iterator is traversing primaryTable when transform triggers rehashing, some data will move to incrementalRehashTable. The iterator may later visit these entries in incrementalRehashTable, causing duplicate processing.
This issue is easy to reproduce by adding more data to the StateMap in a unit test, for instance by inserting more entries in StateBackendMigrationTestBase.testKeyedValueStateUpgrade with the following code:

for (int i = 0; i < 10000; i++) {
    backend.setCurrentKey(i);
    valueState.update(new TestType("test" + i, i * 1000));
}

Running HashMapStateBackendMigrationTest.testStateMigrationAfterChangingTTLFromDisablingToEnabling then results in errors.

Why does using StateTable.iterator() not iterate duplicate entries?
Because StateTable.iterator() returns a Spliterator. Before traversal, the Spliterator reads the original iterator (StateMap.iterator()) and preloads the data into a cache (java.util.stream.StreamSpliterators.AbstractWrappingSpliterator.buffer). This effectively creates a snapshot of all entries. Therefore, even if subsequent transforms cause data to move, the iterator will not visit duplicate entries.

Since recalculating the key group adds some performance overhead, I optimized the code along the lines of the StateTable.iterator() logic: first read all entries of a state map before executing any transform, then update them:

for (StateMap<K, N, S> stateMap : keyGroupedStateMaps) {
    List<StateEntry<K, N, S>> entries =
            StreamSupport.stream(
                            Spliterators.spliteratorUnknownSize(stateMap.iterator(), 0), false)
                    .collect(Collectors.toList());
    for (StateEntry<K, N, S> entry : entries) {
        stateMap.transform(entry.getKey(), entry.getNamespace(), value, transformation);
    }
}
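
The snapshot-then-transform pattern can be sketched independently of Flink. The example below uses a plain java.util.Map in place of a StateMap; SnapshotThenTransform and its transformAll are hypothetical names for illustration and are not the method added in this PR. The point is only that the keys are materialized before any update runs, so each entry is transformed exactly once no matter how the map reorganizes itself during the updates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Illustration only: a plain Map stands in for a state map. */
final class SnapshotThenTransform {

    /** Applies the transformation to every value exactly once; returns the entry count. */
    static <K, V> int transformAll(Map<K, V> map, UnaryOperator<V> transformation) {
        // Phase 1: take a snapshot of the keys before touching any value.
        List<K> snapshot = new ArrayList<>(map.keySet());
        // Phase 2: update through the snapshot, never through a live iterator.
        for (K key : snapshot) {
            map.put(key, transformation.apply(map.get(key)));
        }
        return snapshot.size();
    }

    public static void main(String[] args) {
        Map<Integer, Long> state = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            state.put(i, (long) i);
        }
        int transformed = transformAll(state, v -> v + 1);
        System.out.println(transformed + " entries transformed");
    }
}
```

The trade-off is the temporary list, which holds one reference per entry for the duration of the migration.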

Is there any other possible approach? 
I think another possibility is to add a transformAll interface to StateMap, which directly iterates over both primaryTable and incrementalRehashTable inside CopyOnWriteStateMap. However, I'm unsure if this introduces other potential issues, and it would also require adding implementations to CopyOnWriteSkipListStateMap, which could be complex. So I prefer the approach above which I have verified in my test jobs. What do you think?

Thanks for the investigation and update! Current implementation LGTM as it avoids the recalculation of key group as well as the concurrency issue.

hejufang · 2025-07-22T03:10:24Z

@Zakelly A gentle reminder.

Zakelly · 2025-07-23T10:39:24Z

Thanks for the update! I'll take a look this week.

hejufang force-pushed the FLINK-37294 branch from 7d80aba to d18e0cb Compare June 8, 2025 09:47

hejufang changed the title ~~[FLINK-37294][state] Support state migration between disabling and enabling ttl in RocksDBKeyedStateBackend~~ [FLINK-37294][state] Support state migration between disabling and enabling ttl in HeapKeyedStateBackend Jun 8, 2025

davidradl reviewed Jun 10, 2025

davidradl reviewed Jun 10, 2025

hejufang force-pushed the FLINK-37294 branch from d18e0cb to c6ccab7 Compare June 13, 2025 03:25

xiangyuf reviewed Jun 18, 2025

Zakelly reviewed Jun 18, 2025

hejufang force-pushed the FLINK-37294 branch 3 times, most recently from 8a983a7 to 819a153 Compare June 29, 2025 06:45

github-actions bot added and removed the community-reviewed label Jun 30, 2025

github-actions bot added and removed the community-reviewed label Jul 18, 2025

hejufang and others added 5 commits July 21, 2025 21:18

[FLINK-37294][state] Support state migration between disabling and en…

6e288be

…abling ttl in HeapKeyedStateBackend Co-authored-by: hejufang <[email protected]> Co-authored-by: Xiangyu Feng <[email protected]>

fix comment

d886bc9

fix comment

bc711cd

fix comment

3850e35

fix comment

82ea4f0

hejufang force-pushed the FLINK-37294 branch from 30e4b93 to 82ea4f0 Compare July 21, 2025 13:19

github-actions bot added and removed the community-reviewed label Jul 22, 2025

github-actions bot added and removed the community-reviewed label Jul 23, 2025

Zakelly approved these changes Jul 28, 2025

Zakelly merged commit 076b8bd into apache:master Jul 28, 2025
