Skip to content

Conversation

@LindaSummer
Copy link
Member

Issue

TDigest only cares about weight when merging, so it is possible to have centroids with same mean.
This will lead the centroids overwriting during serialization.

Proposed Changes

Add a seq id to make centroids mean encoding keep same order as the original centroids list without confliction.

Related Issue

#2849

@LindaSummer LindaSummer changed the title bugfix(tdigest): fix tdigest centroid encoding overwrite for same mean fix(tdigest): fix tdigest centroid encoding overwrite for same mean Apr 13, 2025
@PragmaTwice PragmaTwice requested a review from mapleFU April 14, 2025 02:04
@mapleFU
Copy link
Member

mapleFU commented Apr 15, 2025

Sorry for delaying, I'm quite busy this week, will take a round tonight

PutDouble(&sub_key, centroid.mean); // It uses EncodeDoubleToUInt64 and keeps original order of double
// The tdigest centroids only cares about the weight rather than the mean, so different centroids may have same mean,
// we should keep them with same original order, this seq id could be discarded in decode
PutFixed32(&sub_key, seq);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Fixed16 enough for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mapleFU ,

Thanks very much for your kind review!😊

I thought about the size when adding this seq-id.
The compression is uint32 now, so the limitation of centroids is uint32 maximum.

So I decided to use uint32 for safety.

Best Regards,
Edward

@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
2 Security Hotspots

See analysis details on SonarQube Cloud

@PragmaTwice PragmaTwice changed the title fix(tdigest): fix tdigest centroid encoding overwrite for same mean fix(tdigest): fix centroid encoding overwrite for same mean Apr 16, 2025
@PragmaTwice PragmaTwice merged commit 18e6674 into apache:unstable Apr 16, 2025
35 of 36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants