[SPARK-47383][CORE] Support spark.shutdown.timeout config #45504
Conversation
@HyukjinKwon can you take a look please? I'm tagging you since you reviewed the last change in this file.
cc @mridulm
Looks good to me.
+1, LGTM with one comment to make this configuration an internal one.
I made the change. Thanks for the reviews!
Thank you, @robreeves and @mridulm.
Merged to master for Apache Spark 4.0.0.
Closes apache#45504 from robreeves/sc_shutdown_timeout.
Authored-by: Rob Reeves <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun @mridulm @robreeves, the added configuration is tricky, for example,
cc @cloud-fan as you are the RM of Spark 4.0.0, this might be a potential issue.
Some additional context: in SPARK-51243 (#49986), I noticed the restriction of . I think we do have some special cases like these two that must use a Java system property instead of the Spark configuration system; we can either give exceptions to those configurations when checking .
What changes were proposed in this pull request?
Make the shutdown hook timeout configurable. If this is not defined it falls back to the existing behavior, which uses a default timeout of 30 seconds, or whatever is defined in core-site.xml for the hadoop.service.shutdown.timeout property.
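For context, config entries like this are declared through Spark's `ConfigBuilder` DSL in the `org.apache.spark.internal.config` package object. Below is a minimal sketch of what such an entry could look like; the val name, doc text, and version string are assumptions for illustration, not necessarily the exact code merged in this PR:

```scala
// Sketch only: this would live in Spark's org.apache.spark.internal.config
// package object (ConfigBuilder is internal API and not usable outside it).
import java.util.concurrent.TimeUnit

private[spark] val SPARK_SHUTDOWN_TIMEOUT_MS =
  ConfigBuilder("spark.shutdown.timeout")
    .internal()  // kept internal, per the review feedback above
    .doc("Timeout to wait for all shutdown hooks to finish. If unset, falls back " +
      "to the Hadoop default of 30 seconds or hadoop.service.shutdown.timeout.")
    .version("4.0.0")
    .timeConf(TimeUnit.MILLISECONDS)
    .createOptional  // optional: absence means "keep the existing behavior"
```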
Why are the changes needed?
Spark sometimes times out during the shutdown process. This can result in data left in the queues being dropped, causing metadata loss (e.g. event logs, anything written by custom listeners).
This is not easily configurable before this change. The underlying `org.apache.hadoop.util.ShutdownHookManager` has a default timeout of 30 seconds. It can be configured by setting hadoop.service.shutdown.timeout, but this must be done in core-site.xml/core-default.xml because a new Hadoop conf object is created and there is no opportunity to modify it.
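For orientation, here is a sketch (not the PR's exact code; the helper name `installHook` and the way the value is read from `SparkConf` are assumptions) of how an optional Spark-side timeout could be threaded into Hadoop's `ShutdownHookManager`, which exposes an `addShutdownHook` overload that takes an explicit timeout:

```scala
import java.util.concurrent.TimeUnit

import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.util.{ShutdownHookManager => HadoopShutdownHookManager}
import org.apache.spark.SparkConf

object ShutdownHookInstallSketch {
  // Install a composite shutdown hook, optionally with an explicit timeout.
  // When no Spark-side timeout is configured, the two-argument overload is
  // used and Hadoop falls back to its own default (30 seconds, or whatever
  // hadoop.service.shutdown.timeout says in core-site.xml).
  def installHook(runAll: Runnable, conf: SparkConf): Unit = {
    // Priority above FileSystem.SHUTDOWN_HOOK_PRIORITY so the hook runs
    // before Hadoop closes file systems.
    val priority = FileSystem.SHUTDOWN_HOOK_PRIORITY + 30
    val timeoutMs: Option[Long] =
      if (conf.contains("spark.shutdown.timeout")) {
        Some(conf.getTimeAsMs("spark.shutdown.timeout"))
      } else {
        None
      }
    timeoutMs match {
      case Some(ms) =>
        HadoopShutdownHookManager.get()
          .addShutdownHook(runAll, priority, ms, TimeUnit.MILLISECONDS)
      case None =>
        HadoopShutdownHookManager.get().addShutdownHook(runAll, priority)
    }
  }
}
```

When the timeout overload is used, Hadoop waits at most that long for the hook to complete instead of its default, which is the behavior this PR makes tunable.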
Does this PR introduce any user-facing change?
Yes, a new config `spark.shutdown.timeout` is added.
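As a purely hypothetical usage sketch (the application name and value are made up, and, as the conversation above notes, it is an open question whether a `SparkConf` setting is picked up early enough or whether a Java system property is required):

```scala
import org.apache.spark.sql.SparkSession

// Ask Spark to wait up to two minutes for shutdown hooks to finish.
val spark = SparkSession.builder()
  .appName("shutdown-timeout-demo")
  .config("spark.shutdown.timeout", "120s")
  .getOrCreate()
```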
How was this patch tested?
Manual testing in spark-shell. This behavior is not practical to write a unit test for.
Was this patch authored or co-authored using generative AI tooling?
No