-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-8062] Fix NullPointerException in SparkHadoopUtil.getFileSystemThreadStatistics (branch-1.2) #6618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
See https://issues.apache.org/jira/browse/SPARK-8062 for the original stacktrace and investigation. This PR is opened against branch-1.2 because I'm trying to deliver a hotfix to a user running 1.2.1; I'm going to open another patch to fix this in 1.3.x, 1.4.x, and master. While investigating this, I may have discovered other bugs in our input output metrics code, which I'll follow up on in subsequent patches; see https://issues.apache.org/jira/browse/SPARK-8086 for details. /cc @sryza |
On closer investigation, this bug doesn't seem to directly affect Spark 1.3+. Also, SPARK-8086 appears to be a false alarm due to some issues in test code. I'd still like to follow up with some documentation and cleanup in this code, but it's a much lower priority since I no longer think that there's a bug here. |
Alright, updated the patch to have a smaller regression test and to remove the incorrect SPARK-8086 discussion. |
Test build #34112 has finished for PR 6618 at commit
|
Seq.empty | ||
} else { | ||
FileSystem.getAllStatistics | ||
.filter { stats => scheme.equals(stats.getScheme()) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny nit: be consistent with parentheses for methods with empty args (getAllStatistics missing them, getScheme and toUri have them)
Tiny nit: do you think (scheme.equals(_.getScheme()))
would be less clear?
Test build #34114 has finished for PR 6618 at commit
|
The test failures are unrelated as far as I can tell. LGTM. |
Yeah, branch-1.2's Jenkins tests are in a bit of a flaky state right now. I'm going to let the style nit slide and will merge this now since I've received confirmation from the user that it fixes their issue. |
…mThreadStatistics (branch-1.2) This patch adds a regression test for an extremely rare bug where `SparkHadoopUtil.getFileSystemThreadStatistics` would fail with a `NullPointerException` if the Hadoop `FileSystem.statisticsTable` contained a `Statistics` entry without a schema. I'm not sure exactly how Hadoop gets into such a state, but this patch's regression test forces that state in order to reproduce this bug. The fix is to add additional null-checking. I debated adding an additional try-catch block around this entire metrics code to just ignore exceptions and keep going in the case of errors, but decided against that approach for now because it seemed overly conservative and might mask other bugs. We can revisit this in followup patches. Author: Josh Rosen <[email protected]> Closes #6618 from JoshRosen/SPARK-8062-branch-1.2 and squashes the following commits: 652fa3c [Josh Rosen] Re-name test and reapply fix 66fc600 [Josh Rosen] Fix and minimize regression test (verified that it still fails) 1d8d125 [Josh Rosen] Fix SPARK-8062 with additional null checks b6430f0 [Josh Rosen] Add failing regression test for SPARK-8062
…tics (branch-1.2) This patch adds a regression test for an extremely rare bug where `SparkHadoopUtil.getFileSystemThreadStatistics` would fail with a `NullPointerException` if the Hadoop `FileSystem.statisticsTable` contained a `Statistics` entry without a schema. I'm not sure exactly how Hadoop gets into such a state, but this patch's regression test forces that state in order to reproduce this bug. The fix is to add additional null-checking. I debated adding an additional try-catch block around this entire metrics code to just ignore exceptions and keep going in the case of errors, but decided against that approach for now because it seemed overly conservative and might mask other bugs. We can revisit this in followup patches. Author: Josh Rosen <[email protected]> Closes apache#6618 from JoshRosen/SPARK-8062-branch-1.2 and squashes the following commits: 652fa3c [Josh Rosen] Re-name test and reapply fix 66fc600 [Josh Rosen] Fix and minimize regression test (verified that it still fails) 1d8d125 [Josh Rosen] Fix SPARK-8062 with additional null checks b6430f0 [Josh Rosen] Add failing regression test for SPARK-8062
This patch adds a regression test for an extremely rare bug where
SparkHadoopUtil.getFileSystemThreadStatistics
would fail with aNullPointerException
if the HadoopFileSystem.statisticsTable
contained aStatistics
entry without a schema. I'm not sure exactly how Hadoop gets into such a state, but this patch's regression test forces that state in order to reproduce this bug.The fix is to add additional null-checking. I debated adding an additional try-catch block around this entire metrics code to just ignore exceptions and keep going in the case of errors, but decided against that approach for now because it seemed overly conservative and might mask other bugs. We can revisit this in followup patches.