[CASSANDRA-20804][trunk] Optimize DataPlacement lookup by ReplicationParams #4282

netudima · 2025-07-30T18:46:25Z

Avoid double lookup of the same DataPlacement in the forNonLocalStrategyTokenRead and forNonLocalStrategyTokenWrite methods
Memoize the hashCode value
Deduplicate ReplicationParams so that DataPlacements and KeyspaceMetadata use the same objects and equals can take the fast == path

Patch by Dmitry Konstantinov; reviewed by TBD for CASSANDRA-20804

smiklosovic · 2025-08-04T07:54:17Z

src/java/org/apache/cassandra/cql3/statements/schema/CreateKeyspaceStatement.java

+        // as we have as keys in metadata.placements to have a fast map lookup
+        // ReplicationParams are immutable, so it is a safe optimization
+        KeyspaceParams keyspaceParams = keyspaceMetadata.params;
+        ReplicationParams replicationParams = metadata.placements.deduplicateReplicationParams(keyspaceParams.replication);


Is this logic really necessary? What we have started to do is call withSwapped, which creates new objects: a new KeyspaceParams, a new KeyspaceMetadata, etc.

So one step forward but also some step back as we allocate ...

What if the deduplication were done directly at KeyspaceMetadata.create? We call attrs.asNewKeyspaceParams only to apply further transformations / deduplication to the result. Why would not we put KeyspaceParams into KeyspaceMetadata.create deduplicated already?

> Why would not we put KeyspaceParams into KeyspaceMetadata.create deduplicated already?

Yes, agreed: it will be clearer and will reduce the amount of juggling with the objects, so the logic will be more readable.

> So one step forward but also some step back as we allocate ...

Just to clarify: this method is invoked when we apply schema changes or when we load the schema on startup (from the TCM log), so it is not on a hot path and is not an optimization target itself. I would not worry much about allocating several extra objects here.
The goal of this deduplication is not to reduce memory usage but to make the lookup in metadata.placements (metadata.placements.get(ks.params.replication)) more efficient on the hot path of a plain write.
ReplicationParams is the key of the metadata.placements map, so a lookup has to compare the ReplicationParams object passed as the key to the get operation with the ReplicationParams object stored in the map. Deduplication lets this equals take the fast == path instead of a full comparison of the inner structures (which is much more expensive).
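
To illustrate the fast path described above, here is a minimal sketch, not the code from the patch. ParamsKey and PlacementsSketch are hypothetical stand-ins for ReplicationParams and the placements map; only the == shortcut in equals, the cached hash code, and the idea of returning the instance already used as a map key are taken from this discussion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for an immutable map key such as ReplicationParams.
final class ParamsKey
{
    private final String strategy;
    private final Map<String, String> options;
    private int hash; // 0 means "not computed yet"

    ParamsKey(String strategy, Map<String, String> options)
    {
        this.strategy = strategy;
        this.options = Map.copyOf(options); // immutable copy, so the hash can be cached
    }

    @Override
    public boolean equals(Object o)
    {
        if (this == o) return true; // fast path: taken when the instance is shared
        if (!(o instanceof ParamsKey)) return false;
        ParamsKey that = (ParamsKey) o;
        // slow path: full comparison of the inner structures
        return strategy.equals(that.strategy) && options.equals(that.options);
    }

    @Override
    public int hashCode()
    {
        int h = hash;
        if (h == 0)
        {
            h = Objects.hash(strategy, options);
            hash = h; // safe to cache: the fields never change
        }
        return h;
    }
}

// Hypothetical stand-in for the placements map plus a deduplication helper.
final class PlacementsSketch
{
    private final Map<ParamsKey, ParamsKey> canonical = new HashMap<>();
    private final Map<ParamsKey, String> placements = new HashMap<>();

    void put(ParamsKey key, String placement)
    {
        canonical.putIfAbsent(key, key); // remember the first instance seen for this value
        placements.put(key, placement);  // HashMap keeps the original key object on overwrite
    }

    // Returns the instance already used as a map key if an equal one exists.
    ParamsKey deduplicate(ParamsKey candidate)
    {
        ParamsKey existing = canonical.get(candidate);
        return existing != null ? existing : candidate;
    }

    String get(ParamsKey key)
    {
        return placements.get(key);
    }
}
```

A caller would run deduplicate once when the schema changes and keep the returned instance, so that later lookups on the write path present the very object stored in the map.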

One thing is that this logic is not applied when we deserialize a ClusterMetadata snapshot, which can happen during replay at startup or when catching up from a peer.
In that case no deduplication is done, so there will still be multiple equivalent ReplicationParams instances in schema. Similarly, the map keys in DataPlacements will be distinct from those instances.
I don't believe this affects the intent of this patch, as the deduplication of the read/write ReplicaGroups in the DataPlacement construction is still done (i.e. if reads.equals(writes)), so the new reference equality check in ReplicaLayout is still valid.
Of course, the memoized hashcode in ReplicationParams still works as expected, so despite the KeyspaceParams and DataPlacements having pointers to different instances, the map lookup is still fine.

So I think the issue is that this deduplication may lead to some confusion if it isn't applied consistently. Deduping the replication params instances during deserialization may be more trouble than it's worth, especially as it's the hashcode memoization and this ReplicaGroups deduplication that actually provide the benefit.
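
As a rough illustration of the reads.equals(writes) deduplication mentioned above (a sketch only, not the actual DataPlacement or ReplicaLayout code; PlacementSketch and its List<String> fields are hypothetical simplifications):

```java
import java.util.List;

// Hypothetical, simplified stand-in for a placement holding read and write replica groups.
final class PlacementSketch
{
    final List<String> reads;
    final List<String> writes;

    PlacementSketch(List<String> reads, List<String> writes)
    {
        this.reads = reads;
        // Pay for equals once at construction: if both sides are equal,
        // keep a single instance so that later code can compare with ==.
        this.writes = reads.equals(writes) ? reads : writes;
    }

    // Cheap check for the hot path: the same instance means reads and writes are equal.
    boolean readsAndWritesShared()
    {
        return reads == writes;
    }
}
```

Because the constructor always collapses equal groups into one reference, the == check cannot miss an equal pair, which is why it stays valid regardless of how the map keys were created.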

Hi @beobal, thank you for checking the change. Yes, I see. I had checked that the deduplication is applied when a keyspace is initially created or altered, as well as on startup (but only a recent one, so it was the log replay path, not the snapshot path).
Let me re-measure to see the remaining cost of the equals after the other changes are applied.
Regarding the snapshot logic, am I right that this path goes through org.apache.cassandra.tcm.ClusterMetadata.Serializer#deserialize?

Thanks @netudima. Yes, that's right about the deserialize code path.

I've added the deduplication for the snapshot loading case as well.
The cost of the remaining equals logic is about 0.3% of total CPU (the test load is CPU-bound, so this corresponds to a real 0.3% of CPU).
While that may look like a small amount, the problem is that the majority of our hot write path is composed of such small fragments. Each one takes a little, but in sum they consume a lot of CPU, so if we want to improve overall performance we have to deal with such small things.
So my suggestion is to add the deduplication: the logic is straightforward, and I do not see any particular risk of it making things less stable.


netudima force-pushed the CASSANDRA-20804-trunk branch from 00bcffd to ab95a68 Compare July 30, 2025 18:51

smiklosovic reviewed Aug 4, 2025


smiklosovic self-requested a review August 5, 2025 14:44

smiklosovic approved these changes Aug 5, 2025


netudima added 4 commits August 8, 2025 14:29

do not search endpoints for a token in a typical write case twice (to identify pending endpoints)

6542438

Patch by Dmitry Konstantinov; reviewed by TBD for CASSANDRA-20804

Simplify replicationParams deduplication logic: keyspaceMetadata swapping is not needed

53157a7

Patch by Dmitry Konstantinov; reviewed by Štefan Miklošovič for CASSANDRA-20804

Add deduplication on TCM snapshot loading

30b6d7d

Patch by Dmitry Konstantinov; reviewed by Štefan Miklošovič, Sam Tunnicliffe for CASSANDRA-20804

netudima force-pushed the CASSANDRA-20804-trunk branch from f7a354c to 30b6d7d Compare August 11, 2025 17:05


netudima commented Jul 30, 2025

smiklosovic Aug 4, 2025 (edited)

netudima Aug 4, 2025 (edited)

beobal Aug 8, 2025

netudima Aug 8, 2025

beobal Aug 8, 2025

netudima Aug 11, 2025
