
[SPARK-47383][CORE] Support spark.shutdown.timeout config #45504


Closed
wants to merge 5 commits

Conversation

robreeves
Contributor

@robreeves robreeves commented Mar 13, 2024

What changes were proposed in this pull request?

Make the shutdown hook timeout configurable. If it is not defined, Spark falls back to the existing behavior: a default timeout of 30 seconds, or whatever is set for the hadoop.service.shutdown.timeout property in core-site.xml.
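
For illustration, here is a minimal sketch of how an optional, internal time config along these lines could be declared (a sketch only: ConfigBuilder is Spark-internal API, and the exact entry, doc text, and field name in the merged change may differ).

```scala
// Sketch only: ConfigBuilder is Spark-internal, so an entry like this would live
// in the org.apache.spark.internal.config package object inside the Spark source tree.
package org.apache.spark.internal.config

import java.util.concurrent.TimeUnit

object ShutdownTimeoutSketch {
  val SHUTDOWN_TIMEOUT =
    ConfigBuilder("spark.shutdown.timeout")
      .internal()
      .doc("Timeout to wait for Spark's shutdown hooks to finish. If unset, Hadoop's " +
        "hadoop.service.shutdown.timeout (default 30 seconds) applies.")
      .version("4.0.0")
      .timeConf(TimeUnit.MILLISECONDS)
      .createOptional
}
```

Declaring the entry with createOptional keeps "the user set a value" distinguishable from "keep Hadoop's default", which is exactly the fallback behavior described above.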

Why are the changes needed?

Spark sometimes times out during the shutdown process. When that happens, data still sitting in queues can be dropped, causing metadata loss (e.g. event logs, anything written by custom listeners).

Before this change, the timeout is not easily configurable. The underlying org.apache.hadoop.util.ShutdownHookManager has a default timeout of 30 seconds. It can be configured by setting hadoop.service.shutdown.timeout, but only in core-site.xml/core-default.xml, because the hook manager creates a new Hadoop conf object and there is no opportunity to modify it.
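
To make the mechanism concrete, here is a rough sketch of how the registration in org.apache.spark.util.ShutdownHookManager could choose between Hadoop's two addShutdownHook overloads depending on whether the new config is set (assumed wiring, not the exact merged diff; runAllHooks and the object name are stand-ins):

```scala
import java.util.concurrent.TimeUnit

import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.util.{ShutdownHookManager => HadoopShutdownHookManager}
import org.apache.spark.SparkConf

object ShutdownHookInstallSketch {
  // Stand-in for Spark's existing logic that runs all registered shutdown hooks.
  def runAllHooks(): Unit = ()

  def install(): Unit = {
    val hookTask = new Runnable {
      override def run(): Unit = runAllHooks()
    }
    val priority = FileSystem.SHUTDOWN_HOOK_PRIORITY + 30
    val conf = new SparkConf()
    if (conf.contains("spark.shutdown.timeout")) {
      // Explicit Spark-side timeout: use the Hadoop overload that accepts one.
      val timeoutMs = conf.getTimeAsMs("spark.shutdown.timeout")
      HadoopShutdownHookManager.get()
        .addShutdownHook(hookTask, priority, timeoutMs, TimeUnit.MILLISECONDS)
    } else {
      // No Spark-side setting: keep Hadoop's default of 30 seconds, or whatever
      // hadoop.service.shutdown.timeout says in core-site.xml.
      HadoopShutdownHookManager.get().addShutdownHook(hookTask, priority)
    }
  }
}
```

Passing the timeout through the overload sidesteps the Hadoop Configuration entirely, which is why the value can come from a Spark config rather than core-site.xml.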

Does this PR introduce any user-facing change?

Yes, a new config spark.shutdown.timeout is added.

How was this patch tested?

Manual testing in spark-shell. This behavior is not practical to cover with a unit test.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the CORE label Mar 13, 2024
@robreeves
Contributor Author

@HyukjinKwon can you take a look please? I'm tagging you since you reviewed the last change in this file.

@dongjoon-hyun
Member

cc @mridulm

Contributor

@mridulm mridulm left a comment


Looks good to me.

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM with one comment to make this configuration an internal one.

@robreeves
Contributor Author

> +1, LGTM with one comment to make this configuration an internal one.

I made the change. Thanks for the reviews!

Member

@dongjoon-hyun dongjoon-hyun left a comment


Thank you, @robreeves and @mridulm.
Merged to master for Apache Spark 4.0.0.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-47383][CORE] Make the shutdown hook timeout configurable [SPARK-47383][CORE] Support spark.shutdown.timeout config Mar 18, 2024
sweisdb pushed a commit to sweisdb/spark that referenced this pull request Apr 1, 2024

Closes apache#45504 from robreeves/sc_shutdown_timeout.

Authored-by: Rob Reeves <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@pan3793
Member

pan3793 commented Feb 18, 2025

@dongjoon-hyun @mridulm @robreeves, the added configuration is tricky. For example:

$ bin/spark-shell --conf spark.executor.extraJavaOptions="-Dspark.shutdown.timeout=60s"
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-preview2
      /_/

Using Scala version 2.13.14 (OpenJDK 64-Bit Server VM, Java 17.0.13)
Type in expressions to have them evaluated.
Type :help for more information.
25/02/18 22:54:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/02/18 22:54:43 ERROR SparkContext: Error initializing SparkContext.
java.lang.Exception: spark.executor.extraJavaOptions is not allowed to set Spark options (was '-Dspark.shutdown.timeout=60s'). Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit.
	at org.apache.spark.SparkConf.$anonfun$validateSettings$5(SparkConf.scala:527)
	at org.apache.spark.SparkConf.$anonfun$validateSettings$5$adapted(SparkConf.scala:522)
        ...

Instead, we can use --conf spark.hadoop.hadoop.service.shutdown.timeout=60 to override the Hadoop configuration through Spark conf, without any code changes.
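
For context, a simplified sketch of the spark.hadoop.* passthrough this suggestion relies on (the real logic lives in org.apache.spark.deploy.SparkHadoopUtil; this is an approximation of the prefix mapping, not that class's actual code):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf

object SparkHadoopPassthroughSketch {
  // Keys prefixed with "spark.hadoop." are copied into the Hadoop Configuration
  // with the prefix stripped, so
  //   --conf spark.hadoop.hadoop.service.shutdown.timeout=60
  // ends up as hadoop.service.shutdown.timeout=60 in the resulting Configuration.
  def appendSparkHadoopConfigs(sparkConf: SparkConf, hadoopConf: Configuration): Unit = {
    for ((key, value) <- sparkConf.getAll if key.startsWith("spark.hadoop.")) {
      hadoopConf.set(key.stripPrefix("spark.hadoop."), value)
    }
  }
}
```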

@pan3793
Member

pan3793 commented Feb 18, 2025

cc @cloud-fan, since you are the RM of Spark 4.0.0, this might be a potential issue.

@pan3793
Member

pan3793 commented Feb 18, 2025

Some additional context: in SPARK-51243 (#49986), I noticed the restriction on spark.executor.extraJavaOptions when trying to add a new Java system property, spark.ml.allowNativeBlas.

I think we do have some special cases like these two that must use a Java system property instead of the Spark configuration system. We can either allow exceptions for those configurations when validating spark.executor.extraJavaOptions, or choose a different prefix for those Java properties.
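
To make the first option concrete, here is a hypothetical sketch of an allowlist check applied while validating spark.executor.extraJavaOptions (allowedSystemProps, the method name, and the regex are illustrative only, not existing Spark APIs):

```scala
object ExtraJavaOptionsAllowlistSketch {
  // Hypothetical allowlist of Java system properties that may start with "spark."
  // even though they are not regular Spark configs.
  val allowedSystemProps: Set[String] =
    Set("spark.shutdown.timeout", "spark.ml.allowNativeBlas")

  def validateExecutorJavaOptions(javaOpts: String): Unit = {
    // Collect every -Dspark.* property name in the option string.
    val sparkProps = "-D(spark\\.[^=\\s]+)".r.findAllMatchIn(javaOpts).map(_.group(1)).toSeq
    val disallowed = sparkProps.filterNot(allowedSystemProps.contains)
    if (disallowed.nonEmpty) {
      throw new IllegalArgumentException(
        "spark.executor.extraJavaOptions is not allowed to set Spark options: " +
          disallowed.mkString(", "))
    }
  }
}
```

With something like this, -Dspark.shutdown.timeout=60s from the example above would be accepted while other -Dspark.* settings would still be rejected.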
