
[SPARK-47383][CORE] Support spark.shutdown.timeout config #45504


Closed
wants to merge 5 commits

Conversation

robreeves
Contributor

@robreeves robreeves commented Mar 13, 2024

What changes were proposed in this pull request?

Make the shutdown hook timeout configurable. If it is not defined, Spark falls back to the existing behavior: a default timeout of 30 seconds, or whatever is set for the hadoop.service.shutdown.timeout property in core-site.xml.
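
For illustration, here is a minimal sketch of how an optional, internal time config along these lines could be declared (a sketch only: ConfigBuilder is Spark-internal API, and the exact entry, doc text, and field name in the merged change may differ).

```scala
// Sketch only: ConfigBuilder is Spark-internal, so an entry like this would live
// in the org.apache.spark.internal.config package object inside the Spark source tree.
package org.apache.spark.internal.config

import java.util.concurrent.TimeUnit

object ShutdownTimeoutSketch {
  val SHUTDOWN_TIMEOUT =
    ConfigBuilder("spark.shutdown.timeout")
      .internal()
      .doc("Timeout to wait for Spark's shutdown hooks to finish. If unset, Hadoop's " +
        "hadoop.service.shutdown.timeout (default 30 seconds) applies.")
      .version("4.0.0")
      .timeConf(TimeUnit.MILLISECONDS)
      .createOptional
}
```

Declaring the entry with createOptional keeps "the user set a value" distinguishable from "keep Hadoop's default", which is exactly the fallback behavior described above.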

Why are the changes needed?

Spark sometimes times out during the shutdown process. When that happens, data still sitting in queues can be dropped, causing metadata loss (e.g. event logs, anything written by custom listeners).

Before this change, the timeout is not easily configurable. The underlying org.apache.hadoop.util.ShutdownHookManager has a default timeout of 30 seconds. It can be configured by setting hadoop.service.shutdown.timeout, but only in core-site.xml/core-default.xml, because the hook manager creates a new Hadoop conf object and there is no opportunity to modify it.
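
To make the mechanism concrete, here is a rough sketch of how the registration in org.apache.spark.util.ShutdownHookManager could choose between Hadoop's two addShutdownHook overloads depending on whether the new config is set (assumed wiring, not the exact merged diff; runAllHooks and the object name are stand-ins):

```scala
import java.util.concurrent.TimeUnit

import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.util.{ShutdownHookManager => HadoopShutdownHookManager}
import org.apache.spark.SparkConf

object ShutdownHookInstallSketch {
  // Stand-in for Spark's existing logic that runs all registered shutdown hooks.
  def runAllHooks(): Unit = ()

  def install(): Unit = {
    val hookTask = new Runnable {
      override def run(): Unit = runAllHooks()
    }
    val priority = FileSystem.SHUTDOWN_HOOK_PRIORITY + 30
    val conf = new SparkConf()
    if (conf.contains("spark.shutdown.timeout")) {
      // Explicit Spark-side timeout: use the Hadoop overload that accepts one.
      val timeoutMs = conf.getTimeAsMs("spark.shutdown.timeout")
      HadoopShutdownHookManager.get()
        .addShutdownHook(hookTask, priority, timeoutMs, TimeUnit.MILLISECONDS)
    } else {
      // No Spark-side setting: keep Hadoop's default of 30 seconds, or whatever
      // hadoop.service.shutdown.timeout says in core-site.xml.
      HadoopShutdownHookManager.get().addShutdownHook(hookTask, priority)
    }
  }
}
```

Passing the timeout through the overload sidesteps the Hadoop Configuration entirely, which is why the value can come from a Spark config rather than core-site.xml.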

Does this PR introduce any user-facing change?

Yes, a new config spark.shutdown.timeout is added.

How was this patch tested?

Manual testing in spark-shell. This behavior is not practical to cover with a unit test.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the CORE label Mar 13, 2024
@robreeves
Contributor Author

@HyukjinKwon can you take a look please? I'm tagging you since you reviewed the last change in this file.

@dongjoon-hyun
Member

cc @mridulm

Contributor

@mridulm mridulm left a comment


Looks good to me.

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM with one comment to make this configuration an internal one.

@robreeves
Contributor Author

> +1, LGTM with one comment to make this configuration an internal one.

I made the change. Thanks for the reviews!

Member

@dongjoon-hyun dongjoon-hyun left a comment


Thank you, @robreeves and @mridulm.
Merged to master for Apache Spark 4.0.0.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-47383][CORE] Make the shutdown hook timeout configurable [SPARK-47383][CORE] Support spark.shutdown.timeout config Mar 18, 2024
sweisdb pushed a commit to sweisdb/spark that referenced this pull request Apr 1, 2024

Closes apache#45504 from robreeves/sc_shutdown_timeout.

Authored-by: Rob Reeves <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@pan3793
Member

pan3793 commented Feb 18, 2025

@dongjoon-hyun @mridulm @robreeves, the added configuration is tricky. For example:

$ bin/spark-shell --conf spark.executor.extraJavaOptions="-Dspark.shutdown.timeout=60s"
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-preview2
      /_/

Using Scala version 2.13.14 (OpenJDK 64-Bit Server VM, Java 17.0.13)
Type in expressions to have them evaluated.
Type :help for more information.
25/02/18 22:54:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/02/18 22:54:43 ERROR SparkContext: Error initializing SparkContext.
java.lang.Exception: spark.executor.extraJavaOptions is not allowed to set Spark options (was '-Dspark.shutdown.timeout=60s'). Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit.
	at org.apache.spark.SparkConf.$anonfun$validateSettings$5(SparkConf.scala:527)
	at org.apache.spark.SparkConf.$anonfun$validateSettings$5$adapted(SparkConf.scala:522)
        ...

Instead, we can use --conf spark.hadoop.hadoop.service.shutdown.timeout=60 to override the Hadoop configuration through Spark conf, without any code changes.
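
For context, a simplified sketch of the spark.hadoop.* passthrough this suggestion relies on (the real logic lives in org.apache.spark.deploy.SparkHadoopUtil; this is an approximation of the prefix mapping, not that class's actual code):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf

object SparkHadoopPassthroughSketch {
  // Keys prefixed with "spark.hadoop." are copied into the Hadoop Configuration
  // with the prefix stripped, so
  //   --conf spark.hadoop.hadoop.service.shutdown.timeout=60
  // ends up as hadoop.service.shutdown.timeout=60 in the resulting Configuration.
  def appendSparkHadoopConfigs(sparkConf: SparkConf, hadoopConf: Configuration): Unit = {
    for ((key, value) <- sparkConf.getAll if key.startsWith("spark.hadoop.")) {
      hadoopConf.set(key.stripPrefix("spark.hadoop."), value)
    }
  }
}
```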

@pan3793
Member

pan3793 commented Feb 18, 2025

cc @cloud-fan, since you are the RM of Spark 4.0.0, this might be a potential issue.

@pan3793
Member

pan3793 commented Feb 18, 2025

Some additional context: in SPARK-51243 (#49986), I noticed the restriction on spark.executor.extraJavaOptions when trying to add a new Java system property, spark.ml.allowNativeBlas.

I think we do have some special cases like these two that must use a Java system property instead of the Spark configuration system. We can either allow exceptions for those configurations when validating spark.executor.extraJavaOptions, or choose a different prefix for those Java properties.
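
To make the first option concrete, here is a hypothetical sketch of an allowlist check applied while validating spark.executor.extraJavaOptions (allowedSystemProps, the method name, and the regex are illustrative only, not existing Spark APIs):

```scala
object ExtraJavaOptionsAllowlistSketch {
  // Hypothetical allowlist of Java system properties that may start with "spark."
  // even though they are not regular Spark configs.
  val allowedSystemProps: Set[String] =
    Set("spark.shutdown.timeout", "spark.ml.allowNativeBlas")

  def validateExecutorJavaOptions(javaOpts: String): Unit = {
    // Collect every -Dspark.* property name in the option string.
    val sparkProps = "-D(spark\\.[^=\\s]+)".r.findAllMatchIn(javaOpts).map(_.group(1)).toSeq
    val disallowed = sparkProps.filterNot(allowedSystemProps.contains)
    if (disallowed.nonEmpty) {
      throw new IllegalArgumentException(
        "spark.executor.extraJavaOptions is not allowed to set Spark options: " +
          disallowed.mkString(", "))
    }
  }
}
```

With something like this, -Dspark.shutdown.timeout=60s from the example above would be accepted while other -Dspark.* settings would still be rejected.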
