[SPARK-4085] Propagate FetchFailedException when Spark fails to read local shuffle file. #3579
Test build #24087 has started for PR 3579 at commit
Test build #24087 has finished for PR 3579 at commit
Test PASSed.
class ShuffleFaultToleranceSuite extends FunSuite {

  test("[SPARK-4085] hash shuffle manager recovers when local shuffle files get deleted") {
Just to clarify for myself: this issue is not specific to hash shuffle, right? You chose it only because it is the easiest case in which to delete files?
Yeah, this was a test case written by @kayousterhout.
It is plausible, though less likely, that the other un-Try'd sections of code, such as wrapping with a compressed input stream or deserializer, could fail as well. What would happen when this occurs? Does it hang Spark or fail the job?
Spark fails the job there, which makes sense I think.
Yeah, cool, just wanted to make sure we didn't enter into some unrecoverable state.
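To make the exchange above concrete, here is a minimal sketch (not the code from this PR) of the difference between a Try'd step and an un-Try'd one. It assumes a hypothetical `SketchFetchFailed` exception standing in for Spark's `FetchFailedException`, and a hypothetical helper `openLocalBlock`. Only the file open is wrapped in `Try`; anything done with the stream afterwards (decompression, deserialization) would throw its original exception, which is the "fails the job" case.

```scala
import java.io.{File, FileInputStream, IOException, InputStream}
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for Spark's FetchFailedException; not the real class.
final case class SketchFetchFailed(blockPath: String, cause: Throwable)
  extends Exception(s"Failed to fetch local block $blockPath", cause)

object LocalFetchSketch {
  // Opens a local shuffle file. A missing or unreadable file is reported as
  // SketchFetchFailed rather than as a raw IOException, so a caller can tell
  // "shuffle output is gone" apart from an ordinary task error.
  def openLocalBlock(path: String): InputStream =
    Try(new FileInputStream(new File(path))) match {
      case Success(in)             => in
      case Failure(e: IOException) => throw SketchFetchFailed(path, e)
      case Failure(other)          => throw other
    }
}
```

In this sketch a deleted file yields `SketchFetchFailed`, while errors raised later while reading from the returned stream are left untouched.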
LGTM
  test("[SPARK-4085] hash shuffle manager recovers when local shuffle files get deleted") {
    val conf = new SparkConf(false)
    conf.set("spark.shuffle.manager", "hash")
Just one more follow-up here re: Aaron's question: if this is an issue for sort too, could you add another test that covers the sort shuffle manager?
Test FAILed.
I rewrote the test case to cover both sort and hash shuffle.
Test build #24114 has started for PR 3579 at commit
Test build #24114 has finished for PR 3579 at commit
Test PASSed.
…local shuffle file.

cc aarondav kayousterhout pwendell

This should go into 1.2?

Author: Reynold Xin <[email protected]>

Closes #3579 from rxin/SPARK-4085 and squashes the following commits:

255b4fd [Reynold Xin] Updated test.
f9814d9 [Reynold Xin] Code review feedback.
2afaf35 [Reynold Xin] [SPARK-4085] Propagate FetchFailedException when Spark fails to read local shuffle file.

(cherry picked from commit 1826372)
Signed-off-by: Patrick Wendell <[email protected]>
cc @aarondav @kayousterhout @pwendell
This should go into 1.2?