Skip to content

[SPARK-8062] Fix NullPointerException in SparkHadoopUtil.getFileSystemThreadStatistics (branch-1.2) #6618

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

JoshRosen
Copy link
Contributor

This patch adds a regression test for an extremely rare bug where SparkHadoopUtil.getFileSystemThreadStatistics would fail with a NullPointerException if the Hadoop FileSystem.statisticsTable contained a Statistics entry without a schema. I'm not sure exactly how Hadoop gets into such a state, but this patch's regression test forces that state in order to reproduce this bug.

The fix is to add additional null-checking. I debated adding an additional try-catch block around this entire metrics code to just ignore exceptions and keep going in the case of errors, but decided against that approach for now because it seemed overly conservative and might mask other bugs. We can revisit this in followup patches.

@JoshRosen
Copy link
Contributor Author

See https://issues.apache.org/jira/browse/SPARK-8062 for the original stacktrace and investigation.

This PR is opened against branch-1.2 because I'm trying to deliver a hotfix to a user running 1.2.1; I'm going to open another patch to fix this in 1.3.x, 1.4.x, and master.

While investigating this, I may have discovered other bugs in our input output metrics code, which I'll follow up on in subsequent patches; see https://issues.apache.org/jira/browse/SPARK-8086 for details.

/cc @sryza

@JoshRosen
Copy link
Contributor Author

On closer investigation, this bug doesn't seem to directly affect Spark 1.3+.

Also, SPARK-8086 appears to be a false alarm due to some issues in test code. I'd still like to follow up with some documentation and cleanup in this code, but it's a much lower priority since I no longer think that there's a bug here.

@JoshRosen
Copy link
Contributor Author

Alright, updated the patch to have a smaller regression test and to remove the incorrect SPARK-8086 discussion.

@SparkQA
Copy link

SparkQA commented Jun 3, 2015

Test build #34112 has finished for PR 6618 at commit 1d8d125.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Seq.empty
} else {
FileSystem.getAllStatistics
.filter { stats => scheme.equals(stats.getScheme()) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nit: be consistent with parentheses for methods with empty args (getAllStatistics missing them, getScheme and toUri have them)
Tiny nit: do you think (scheme.equals(_.getScheme())) would be less clear?

@SparkQA
Copy link

SparkQA commented Jun 3, 2015

Test build #34114 has finished for PR 6618 at commit 652fa3c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sryza
Copy link
Contributor

sryza commented Jun 3, 2015

The test failures are unrelated as far as I can tell. LGTM.

@JoshRosen
Copy link
Contributor Author

Yeah, branch-1.2's Jenkins tests are in a bit of a flaky state right now. I'm going to let the style nit slide and will merge this now since I've received confirmation from the user that it fixes their issue.

asfgit pushed a commit that referenced this pull request Jun 8, 2015
…mThreadStatistics (branch-1.2)

This patch adds a regression test for an extremely rare bug where `SparkHadoopUtil.getFileSystemThreadStatistics` would fail with a `NullPointerException` if the Hadoop `FileSystem.statisticsTable` contained a `Statistics` entry without a schema.  I'm not sure exactly how Hadoop gets into such a state, but this patch's regression test forces that state in order to reproduce this bug.

The fix is to add additional null-checking.  I debated adding an additional try-catch block around this entire metrics code to just ignore exceptions and keep going in the case of errors, but decided against that approach for now because it seemed overly conservative and might mask other bugs. We can revisit this in followup patches.

Author: Josh Rosen <[email protected]>

Closes #6618 from JoshRosen/SPARK-8062-branch-1.2 and squashes the following commits:

652fa3c [Josh Rosen] Re-name test and reapply fix
66fc600 [Josh Rosen] Fix and minimize regression test (verified that it still fails)
1d8d125 [Josh Rosen] Fix SPARK-8062 with additional null checks
b6430f0 [Josh Rosen] Add failing regression test for SPARK-8062
@JoshRosen JoshRosen closed this Jun 8, 2015
@JoshRosen JoshRosen deleted the SPARK-8062-branch-1.2 branch June 8, 2015 17:59
jfota pushed a commit to mapr/spark that referenced this pull request Jun 10, 2015
…tics (branch-1.2)

This patch adds a regression test for an extremely rare bug where `SparkHadoopUtil.getFileSystemThreadStatistics` would fail with a `NullPointerException` if the Hadoop `FileSystem.statisticsTable` contained a `Statistics` entry without a schema.  I'm not sure exactly how Hadoop gets into such a state, but this patch's regression test forces that state in order to reproduce this bug.

The fix is to add additional null-checking.  I debated adding an additional try-catch block around this entire metrics code to just ignore exceptions and keep going in the case of errors, but decided against that approach for now because it seemed overly conservative and might mask other bugs. We can revisit this in followup patches.

Author: Josh Rosen <[email protected]>

Closes apache#6618 from JoshRosen/SPARK-8062-branch-1.2 and squashes the following commits:

652fa3c [Josh Rosen] Re-name test and reapply fix
66fc600 [Josh Rosen] Fix and minimize regression test (verified that it still fails)
1d8d125 [Josh Rosen] Fix SPARK-8062 with additional null checks
b6430f0 [Josh Rosen] Add failing regression test for SPARK-8062
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants