
[SPARK-53076][CORE][TESTS] Disable SparkBloomFilterSuite #51788


Closed · wants to merge 1 commit

Conversation

@dongjoon-hyun (Member) commented Aug 2, 2025

What changes were proposed in this pull request?

This PR aims to disable SparkBloomFilterSuite due to its excessive running time.

  • SPARK-53077 is filed to re-enable it with a reasonable running time.

Why are the changes needed?

Previously, the common/sketch module took less than 10s.

$ mvn package --pl common/sketch
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  7.177 s
[INFO] Finished at: 2025-08-02T08:25:43-07:00
[INFO] ------------------------------------------------------------------------

After SparkBloomFilterSuite was added, the suite alone took over 12 minutes, which is too long for a unit test.

[info] Test org.apache.spark.util.sketch.SparkBloomFilterSuite#testAccuracyRandomDistribution(long, double, int, org.junit.jupiter.api.TestInfo):#1 started
[info] Test org.apache.spark.util.sketch.SparkBloomFilterSuite#testAccuracyEvenOdd(long, double, int, org.junit.jupiter.api.TestInfo):#1 started
[info] Test run finished: 0 failed, 0 ignored, 2 total, 721.939s

Does this PR introduce any user-facing change?

No, this is a test change.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label Aug 2, 2025
@dongjoon-hyun (Member, Author)

cc @ishnagy, @peter-toth, @LuciferYang

@dongjoon-hyun (Member, Author)

Could you review this PR if you have some time, @beliefer ?

@LuciferYang (Contributor)

I hope this test can be re-enabled within a short period of time.

@LuciferYang (Contributor)

Merged into master. Thanks, @dongjoon-hyun.

@dongjoon-hyun (Member, Author)

Thank you so much, @LuciferYang. Yes, I hope so too.

@dongjoon-hyun deleted the SPARK-53076 branch August 3, 2025 06:30
@peter-toth (Contributor)

Thanks @dongjoon-hyun.

@ishnagy do you think we could validate the V2 filter implementation with fewer than 1G items added?

@beliefer (Contributor) commented Aug 3, 2025

> Could you review this PR if you have some time, @beliefer?

Thank you for the ping. Disabling the test case looks good to me in the short term.

@ishnagy (Contributor) commented Aug 3, 2025

> @ishnagy do you think we could validate the V2 filter implementation with fewer than 1G items added?

The short answer is a (qualified) yes.

The longer answer is a bit more nuanced.

This test was never meant to be run regularly (e.g. as part of a generic pre-merge test suite). I originally submitted the PR with an @Disabled annotation on these tests, precisely so that these long-running test cases would not hog any general-purpose test runs. Since the sketch module has virtually no external dependencies, it would be perfectly sufficient to run any heavy-duty Bloom filter tests only when the actual BF logic changes (which I assume happens quite rarely). There was an attempt to start working out such test classifications in d9d69801, but it was quickly cut short.

On the necessity of these tests: I'm a bit worried that by forcing this test code into the "generic bucket", constraining it by the time/resource limits we have there, and significantly cutting down the test cases just to meet those "artificial" constraints, we will introduce blind spots similar to those of the original Scala suite, which tested with at most 100k insertions. This particular int/long truncation problem may be detectable with only 100M elements, but I think the implementations should be proven safe not just in the range we can afford to test regularly (which is somewhat pointless), but at 1-2 orders of magnitude higher. We may not be able to run that on every change, but we can still afford it once in a while, when the implementation changes.
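To make the scale concrete: with the textbook Bloom filter sizing formula m = -n * ln(p) / (ln 2)^2, a 3% FPP filter needs more than Integer.MAX_VALUE bits somewhere near 300M items, which is roughly where an int-truncated bit index would start to bite. Here is a minimal Java sketch of that arithmetic (the formula is the standard one, used here as an assumption; it is not a quote of Spark's implementation):

// Minimal sizing sketch, assuming the textbook formula
// m = -n * ln(p) / (ln 2)^2; not Spark's actual code.
public class BloomSizingSketch {
    static long optimalNumOfBits(long n, double p) {
        return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    public static void main(String[] args) {
        double fpp = 0.03;
        for (long n : new long[] {100_000_000L, 200_000_000L, 300_000_000L, 350_000_000L}) {
            long m = optimalNumOfBits(n, fpp);
            // Beyond Integer.MAX_VALUE bits, an int-typed bit index silently truncates.
            System.out.printf("n=%,d -> m=%,d bits, exceeds Integer.MAX_VALUE: %b%n",
                    n, m, m > Integer.MAX_VALUE);
        }
    }
}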

  • Short-term action item: I'll check whether 100M insertions are enough to demonstrate the problem with the V1 implementation, and prep a PR with a lowered item count (ETA: tomorrow).
  • Mid-term: I would be perfectly fine with leaving these tests disabled (without removing them), even putting back all the test cases we cut out while chasing a lower running time. I'd be happy to work on some tagging facilities with which we can keep everyday testing fast without completely disabling the slower tests; see the sketch after this list.
  • I'm already working on some performance improvement patches (introducing a non-compatible V3 version), which should make the insert/query operations a lot quicker and thus the high-item-count tests a lot less painful.
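As a sketch of the tagging idea from the mid-term item above (the class and test names are hypothetical; JUnit 5's @Tag combined with Surefire's group filtering is one way to do it):

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

// Hypothetical example: tag the heavy accuracy tests so that everyday runs
// can exclude them (mvn test -DexcludedGroups=slow) while dedicated runs
// include them (mvn test -Dgroups=slow).
class SparkBloomFilterHeavySuite {

    @Tag("slow")
    @Test
    void accuracyWithVeryHighInsertionCount() {
        // long-running insert/query accuracy checks would go here
    }
}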

@LuciferYang (Contributor)

> Since the sketch module has virtually no external dependencies, it would be perfectly sufficient to run any heavy-duty Bloom filter tests only when the actual BF logic changes (which I assume happens quite rarely).

I've always been curious about this point you mentioned. If we routinely mark it as @Disabled, do we have a way to enforce that the submitter or code reviewer manually verifies it when changes occur? If not, potentially erroneous changes introduced into a release would result in a poor user experience.

@LuciferYang (Contributor)

@dongjoon-hyun @peter-toth @ishnagy If it indeed requires a large amount of data to reproduce this issue, I have an idea, and I'm not sure if everyone agrees:

  1. Add an environment variable, such as VERIFY_SPARK_BF_SUITE.

  2. Replace @Disabled with @EnabledIfEnvironmentVariable (a fuller sketch follows this list):

     -@Disabled("TODO(SPARK-53077): Re-enable with a resonable test time.")
     +@EnabledIfEnvironmentVariable(named = "VERIFY_SPARK_BF_SUITE", matches = "true")

    This way, we can ensure the test is executed by running VERIFY_SPARK_BF_SUITE=true build/mvn package --pl common/sketch, while it will be skipped when running build/mvn package --pl common/sketch or VERIFY_SPARK_BF_SUITE=false build/mvn package --pl common/sketch.

  3. Modify the YAML file for daily tests to ensure that at least one daily test will verify this test.

By doing so, the change pipeline will no longer execute this test, but it will still be verified during the daily test process. This way, even if there are change-related errors, we can detect and correct them within at most one day.
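
A self-contained sketch of step 2 above (the class skeleton and test name are illustrative, not the suite's real shape):

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.condition.EnabledIfEnvironmentVariable;

// Illustrative sketch: the test runs only when VERIFY_SPARK_BF_SUITE=true is
// present in the environment, and is reported as skipped otherwise.
class SparkBloomFilterSuite {

    @EnabledIfEnvironmentVariable(named = "VERIFY_SPARK_BF_SUITE", matches = "true")
    @Test
    void heavyAccuracyCheck() {
        // heavy accuracy test body
    }
}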

@dongjoon-hyun (Member, Author)

Thank you all.

I support all approaches to improve this situation, including:

  • the fundamental performance improvements,
  • reducing test data size,
  • introducing the environment variable, because it's better than a hard-coded value, although we would need to rename it to have a SPARK_ prefix like SPARK_GENERATE_BENCHMARK_FILES.

@dongjoon-hyun (Member, Author) commented Aug 3, 2025

For the record, the following is the ASF Infra policy and why this is important.

ASF INFRA POLICY

MONITORING

[Screenshot of the ASF Infra policy's monitoring section, taken 2025-08-03]

SPARK EFFORT

  • SPARK-48094 Reduce GitHub Action usage according to ASF project allowance

@ishnagy (Contributor) commented Aug 4, 2025

I made some measurements with the V1 implementation:

100M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (3.050257 %) [00m18s] T:  ~9.6%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (3.053887 %) [00m09s] T:  ~9.3%

150M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (3.080157 %) [00m28s] T: ~15.0%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (3.079987 %) [00m15s] T: ~15.4%

200M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (3.861257 %) [00m37s] T: ~19.8%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (3.860424 %) [00m20s] T: ~20.6%

250M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (3.676172 %) [00m47s] T: ~25.1%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (3.675387 %) [00m25s] T: ~25.8%

300M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (3.210548 %) [00m57s] T: ~30.5%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (3.209847 %) [00m30s] T: ~30.1%

350M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (5.377388 %) [01m07s] T: ~35.8%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (5.377483 %) [00m36s] T: ~37.1%

400M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (8.170380 %) [01m17s] T: ~41.2%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (8.170716 %) [00m40s] T: ~41.2%

500M
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (15.392861 %) [01m36s] T: ~51.3%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (15.391692 %) [00m50s] T: ~51.5%

1G
testAccuracyRandomDistribution: acceptableFpp(3.000000 %) < actualFpp (59.890330 %) [03m07s] T: 100.0%
testAccuracyEvenOdd:            acceptableFpp(3.000000 %) < actualFpp (59.888499 %) [01m37s] T: 100.0%

I'd say that the truncation bug (at this FPP level) does not cause reliably detectable discrepancies under 350M insertions.

The running time scales approximately linearly with the insertion count, so with the bare minimum of insertions we could cut the running time by about a factor of three. That's not nothing (12 min -> 4 min on GitHub), but 4 minutes is still 1-2 orders of magnitude longer than the old running time.

I saw #51806 went in recently; I can create a patch on top of it. Let me know if you think it still makes sense.

@dongjoon-hyun @LuciferYang @peter-toth

@dongjoon-hyun (Member, Author) commented Aug 4, 2025

Thank you so much, @ishnagy .

It sounds good to me. The 4-minute run will be covered in the non-ANSI build test pipeline.

@ishnagy (Contributor) commented Aug 5, 2025

Just for recordkeeping, linking the related PRs together: I have submitted #51845 to reduce the insertion count.
