@SharonIV0x86
Contributor

Closes #2794

This PR adds support for the TDIGEST.QUANTILE command.

@SharonIV0x86 SharonIV0x86 marked this pull request as ready for review March 30, 2025 17:51
@SharonIV0x86
Contributor Author

SharonIV0x86 commented Mar 30, 2025

@LindaSummer I have added the basic structure for the QUANTILE command but as you mentioned in the issue, there seems to be a problem with the locking mechanism or something else.

E20250330 23:50:42.491079 130596879718080 redis_tdigest.cc:524] metadata has 24 merged nodes, but got 4

I tried modifying the quantile function as mentioned in the issue and the previous PRs but the issue persists. Any help is appreciated.

@LindaSummer
Member

LindaSummer commented Apr 1, 2025

Hi @SharonIV0x86 ,

Thanks very much for your effort! 😊

I'll try to test it locally in the next few days and will sync if there are any updates.

Best Regards,
Edward

@SharonIV0x86
Contributor Author

Works for me. Until then, I'll try to fiddle around. Cheers.

@LindaSummer
Member

Hi @SharonIV0x86 ,

Sorry for the delay in updating. 😊

I've had some personal matters occupying my schedule recently, so my responses may be a little delayed.

I'm now working on this ticket and have found that there may be some bugs in it. I will double-check and create a PR to fix it if a bug exists.

Best Regards,
Edward

@SharonIV0x86
Contributor Author

No issues. Meanwhile I tried looking into it, and no matter what I did there seems to be a mismatch between the actual data and the tdigest metadata. One thing I can deduce is that the problem is not in the Quantile function itself; it seems to be in the mergeCurrentBuffer function or some other centroid function called from Quantile.

Or this is related to locks, as you suggested.

There is no hurry from my side in fixing this issue, but I am also planning to implement TDIGEST.CDF and TDIGEST.RANK in the upcoming weeks, and they will most likely depend on this issue.

@LindaSummer
Member

Hi @SharonIV0x86 ,

I have tested the command successfully by removing the lock and declaring the command as "write" for quick validation, after fixing #2878.

I will try to find a way to improve performance with a smaller critical section.

Best Regards,
Edward

@SharonIV0x86
Contributor Author

SharonIV0x86 commented Apr 13, 2025

Okay, thanks for looking into it. So how should I proceed? Should I wait for #2878 to be merged?

@LindaSummer
Member

Hi @SharonIV0x86 ,

Maybe we could wait for it to be merged before this PR.
It will affect our Go integration test cases.

If you don't mind, I will try to solve the lock issue, since I have found that it may not be so easy to just add a lock key in the current connection lock management.
Once this PR is resolved, we could follow it for other commands like CDF.

Best Regards,
Edward

@SharonIV0x86
Contributor Author

Absolutely no issues whatsoever. There is no hurry, I am happy to wait 😄
Let me know if there's anything I can help with in the meantime.

@SharonIV0x86
Contributor Author

@LindaSummer Hi, I have tested the quantile command after #2878 was merged, and it works as expected.

However, the quantile values differ slightly from those of the original Redis tdigest.quantile implementation. I believe the kvrocks quantile is more accurate because it uses linear interpolation, so it should be fine.

I have also added a Go test case, so if any changes are required please let me know. 👍🏼

Member

@LindaSummer LindaSummer left a comment

Hi @SharonIV0x86 ,

Thanks very much for your effort. 😊
Left some comments.

Best Regards,
Edward

return {Status::RedisExecErr, s.ToString()};
}
if (values_.empty()) {
return {Status::RedisExecErr, "invalid quantile or empty tdigest"};
Member

Hi @SharonIV0x86 ,

Sorry for my unclear description of this behavior. 😄

Currently, Redis Stack returns nan as the response when there are no centroids.
We'd better follow this behavior.

Adding the logic to tdigest rather than to the command may be better, to keep the command module clean.

After modifying the tdigest, the cpp unit test should also be updated.

Best Regards,
Edward

infoAfterEmptyReset := toTdigestInfo(t, rsp.Val())
require.EqualValues(t, 100, infoAfterEmptyReset.Compression)
})
t.Run("tdigest.quantile with different arguments", func(t *testing.T) {
Member

Hi @SharonIV0x86 ,

We could also add an unordered sequence and an empty tdigest as cases. 😊

Best Regards,
Edward

@SharonIV0x86 SharonIV0x86 requested a review from LindaSummer May 7, 2025 16:38
@PragmaTwice
Member

@LindaSummer Would you like to have a look?

@LindaSummer
Member

Hi @PragmaTwice and @SharonIV0x86 ,

Of course!😊

I will go through this PR later today.

Best Regards,
Edward

Member

@LindaSummer LindaSummer left a comment

Hi @SharonIV0x86 ,

Thanks very much for your effort! 😊

Left some comments.

Best Regards,
Edward

quantile_strings.push_back(std::to_string(q));
if (!result.has_centroids) {
for (size_t i = 0; i < values_.size(); ++i) {
quantile_strings.emplace_back("nan");
Member

Hi @SharonIV0x86 ,

Maybe we could make "nan" a constexpr literal string in this file. 😊
It would also be used in other commands.

Best Regards,
Edward

Contributor Author

Good idea, will do this.
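
One way the suggestion could look, as a rough sketch only: a single file-level constant reused wherever an empty digest has to be reported. The names `kNanString` and `NanReply` are made up for illustration and are not taken from the kvrocks source; the sketch assumes C++17 for `inline constexpr`.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical shared literal; other tdigest commands could reuse it.
inline constexpr std::string_view kNanString = "nan";

// Builds one "nan" entry per requested quantile, for the empty-digest case.
std::vector<std::string> NanReply(std::size_t requested) {
  return std::vector<std::string>(requested, std::string(kNanString));
}
```

Keeping the literal in one place means a later change to the spelling of the reply only touches a single line.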

if (auto status = dumpCentroids(ctx, ns_key, metadata, &centroids); !status.ok()) {
return status;
}
if (centroids.empty()) {
Member

@LindaSummer LindaSummer May 8, 2025

Hi @SharonIV0x86 ,

When we first load the metadata, we should already know whether this tdigest has any data in it.
This could be checked before retrieving the centroids.

And it would be better to add a cpp unit test for this behavior.

Best Regards,
Edward

Contributor Author

@SharonIV0x86 SharonIV0x86 May 8, 2025

So just to be clear, after we load the metadata here, I add a check that does something like this.

if (metadata.merged_nodes == 0 && metadata.unmerged_nodes == 0) {
  // tdigest does not contain any data, so set the flag
}

Correct me if I'm wrong.

And it would be better to add cpp unit test for this behavior.

I'll try working on this as well.

Member

Hi @SharonIV0x86 ,

We could directly use total_observations to check whether the tdigest has any elements. 😊

Best Regards,
Edward
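
A rough sketch of the early check discussed in this thread, under stated assumptions: `DigestMetadata`, `QuantileOutcome` and `LoadForQuantile` are hypothetical stand-ins rather than kvrocks types, and the centroid loader is passed in as a callback so the example stays self-contained. Only the `total_observations` field name comes from the comment above.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the real metadata and centroid types.
struct DigestMetadata {
  uint64_t total_observations = 0;
};

struct Centroid {
  double mean;
  double weight;
};

struct QuantileOutcome {
  bool has_data = false;
  std::vector<Centroid> centroids;
};

// Decides from the metadata alone whether the digest is empty, and only
// invokes the (potentially expensive) centroid loader when it is not.
QuantileOutcome LoadForQuantile(
    const DigestMetadata& metadata,
    const std::function<std::vector<Centroid>()>& load_centroids) {
  QuantileOutcome out;
  if (metadata.total_observations == 0) {
    return out;  // empty digest: skip the centroid read entirely
  }
  out.has_data = true;
  out.centroids = load_centroids();
  return out;
}
```

Compared with testing `centroids.empty()` after the load, this avoids the storage read for an empty digest.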

@SharonIV0x86 SharonIV0x86 requested a review from LindaSummer May 8, 2025 16:45
@SharonIV0x86
Contributor Author

SharonIV0x86 commented May 9, 2025

@LindaSummer I have added the cpp test case. I believe it needs some modifications, because in the cpp test case it's currently not possible to directly compare the values returned by the quantile function with the "nan" string: the result.quantile vector is returned empty if the tdigest is empty. The behavior differs in the command implementation.

The closest I could do was to add an assert on the value of result.has_centroids, which should be true; if it is, it can be assumed that the Execute function of the command implementation will construct the vector of "nan" strings and return it as the response.

@LindaSummer
Member

LindaSummer commented May 9, 2025

Hi @SharonIV0x86 ,

You could directly test the flag you have added. 😊
If you want clearer semantics for the result, maybe std::optional<std::vector<double>> is a choice.

And the nan string test could go in the integration test.

Best Regards,
Edward
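
To illustrate the `std::optional<std::vector<double>>` idea, here is a small sketch. It is an assumption-laden stand-in, not the merged code: `QuantilesOrNothing` is a made-up name, and it uses a simple index lookup on sorted raw samples in place of real t-digest math, with q assumed to lie in [0, 1].

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Returns std::nullopt for an empty input, so callers can tell "no data"
// apart from "computed values" without a separate boolean flag.
// The lookup below is a placeholder for real digest math.
std::optional<std::vector<double>> QuantilesOrNothing(
    std::vector<double> samples, const std::vector<double>& qs) {
  if (samples.empty()) {
    return std::nullopt;
  }
  std::sort(samples.begin(), samples.end());  // input may arrive unordered
  std::vector<double> out;
  out.reserve(qs.size());
  for (double q : qs) {
    const auto idx =
        static_cast<std::size_t>(q * static_cast<double>(samples.size() - 1));
    out.push_back(samples[idx]);
  }
  return out;
}
```

A unit test can then assert `!result.has_value()` for the empty case, while the "nan" string formatting stays in the command layer and is covered by the integration test.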

Member

@LindaSummer LindaSummer left a comment

Hi @SharonIV0x86 ,

Thanks very much for your contribution and huge effort! 😄
I have made some small changes to the current code.

LGTM.

Best Regards,
Edward

@PragmaTwice PragmaTwice merged commit 86df0a6 into apache:unstable May 16, 2025
34 checks passed
@SharonIV0x86
Contributor Author

Thanks for the changes you made. The code is much better now with ranges and views.
I have opened a PR for the quantile command in kvrocks-website: apache/kvrocks-website#300

@SharonIV0x86 SharonIV0x86 deleted the feat/quantile-command branch August 31, 2025 15:56
Development

Successfully merging this pull request may close these issues.

TDigest: Implement QUANTILE command for TDigest Algorithm

6 participants