Skip to content

Conversation

LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented Aug 4, 2025

What changes were proposed in this pull request?

This pr adds an environment variable named SPARK_TEST_SPARK_BLOOM_FILTER_SUITE_ENABLED to control whether the test case SparkBloomFilterSuite is executed. It also ensures that this test case is only run for validation in the daily tests specified in build_non_ansi.yml.

Why are the changes needed?

The SparkBloomFilterSuite requires periodic validation, but due to its excessively long execution time (over 10 minutes), it is not suitable for execution in the Change Pipeline.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manual verification:

  • maven
build/mvn package --pl common/sketch
[INFO] Running org.apache.spark.util.sketch.SparkBloomFilterSuite
[WARNING] Tests run: 2, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 0.001 s -- in org.apache.spark.util.sketch.SparkBloomFilterSuite
SPARK_TEST_SPARK_BLOOM_FILTER_SUITE_ENABLED=true build/mvn package --pl common/sketch
[INFO] Running org.apache.spark.util.sketch.SparkBloomFilterSuite
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 401.9 s -- in org.apache.spark.util.sketch.SparkBloomFilterSuite
  • sbt
build/sbt clean "sketch/test"
[info] Test run started (JUnit Jupiter)
[info] Test org.apache.spark.util.sketch.SparkBloomFilterSuite ignored: Environment variable [SPARK_TEST_SPARK_BLOOM_FILTER_SUITE_ENABLED] does not exist
[info] Test run finished: 0 failed, 0 ignored, 0 total, 0.016s
SPARK_TEST_SPARK_BLOOM_FILTER_SUITE_ENABLED=true build/sbt clean "sketch/test"
[info] Test run started (JUnit Jupiter)
[info] Test org.apache.spark.util.sketch.SparkBloomFilterSuite#testAccuracyRandomDistribution(long, double, int, org.junit.jupiter.api.TestInfo):#1 started
[info] Test org.apache.spark.util.sketch.SparkBloomFilterSuite#testAccuracyEvenOdd(long, double, int, org.junit.jupiter.api.TestInfo):#1 started
[info] Test run finished: 0 failed, 0 ignored, 2 total, 359.776s

Was this patch authored or co-authored using generative AI tooling?

No

@@ -33,7 +34,7 @@
import java.util.stream.LongStream;
import java.util.stream.Stream;

@Disabled("TODO(SPARK-53077): Re-enable with a resonable test time.")
@EnabledIfEnvironmentVariable(named = "SPARK_TEST_SPARK_BF_SUITE_ENABLED", matches = "true")
Copy link
Member

@dongjoon-hyun dongjoon-hyun Aug 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this can be a role model for this kind of environment variable group, can we have a rule?

For example, if we have a rule SPARK_TEST_${CLASSNAME}_ENABLED, this should be SPARK_TEST_SPARK_BLOOM_FILTER_SUITE_ENABLED.

Although BF is reasonable, it looks like not a rule-based name from the suite name.

@@ -41,7 +41,8 @@ jobs:
"PYTHON_TO_TEST": "python3.11",
"SKIP_MIMA": "true",
"SKIP_UNIDOC": "true",
"DEDICATED_JVM_SBT_TESTS": "org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormatV1Suite,org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormatV2Suite,org.apache.spark.sql.execution.datasources.orc.OrcSourceV1Suite,org.apache.spark.sql.execution.datasources.orc.OrcSourceV2Suite"
"DEDICATED_JVM_SBT_TESTS": "org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormatV1Suite,org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormatV2Suite,org.apache.spark.sql.execution.datasources.orc.OrcSourceV1Suite,org.apache.spark.sql.execution.datasources.orc.OrcSourceV2Suite",
"SPARK_TEST_SPARK_BF_SUITE_ENABLED": "true"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we remove this? Since this is a sketch module, I guess build_non_ansi.yml is enough because it's irrelevant to ANSI/non-ANSI.

@dongjoon-hyun
Copy link
Member

Thank you for making an improvement, @LuciferYang .

@LuciferYang
Copy link
Contributor Author

@dongjoon-hyun Thank you for your review. I'll make the revisions later.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@LuciferYang LuciferYang changed the title [SPARK-53077] Add an env to control the execution of the `SparkBloomFilterSuite [SPARK-53077] Add an env to control the execution of the SparkBloomFilterSuite Aug 4, 2025
@LuciferYang LuciferYang changed the title [SPARK-53077] Add an env to control the execution of the SparkBloomFilterSuite [SPARK-53077][CORE][TESTS][INFRA] Add an env to control the execution of the SparkBloomFilterSuite Aug 4, 2025
@LuciferYang LuciferYang changed the title [SPARK-53077][CORE][TESTS][INFRA] Add an env to control the execution of the SparkBloomFilterSuite [SPARK-53077][CORE][TESTS] Add an env to control the execution of the SparkBloomFilterSuite Aug 4, 2025
@LuciferYang
Copy link
Contributor Author

Merged into master. Thanks @dongjoon-hyun and @peter-toth

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants