SPARK-1676: (branch-0.9 fix) Cache Hadoop UGIs to prevent FileSystem leak #618
Conversation
UserGroupInformation objects (UGIs) are used for Hadoop security. A relatively recent PR (apache#29) makes Spark always use UGIs when executing tasks. Unfortunately, this triggers HDFS-3545, which causes the FileSystem cache to continuously create new FileSystems, as the UGIs look different even though they are logically identical. This causes a memory (and sometimes file descriptor) leak for FileSystems, like S3N, which maintain open connections.

The solution is to introduce a config option (enabled by default) which reuses a single Spark user UGI, rather than creating a new one for each task. The downside to this approach is that UGIs cannot be safely cached (see the notes in HDFS-3545). For example, if a token expires, it is never cleared from the UGI but may be used anyway; which token a UGI presents is nondeterministic, as tokens are backed by a Set.

This setting is enabled by default because the memory leak can become serious very quickly. In one benchmark, attempting to read 10k files from an S3 directory left 45k connections open to S3 after the job completed. These file descriptors are never cleaned up, nor is the memory used by the associated FileSystems.

Conflicts:
	docs/configuration.md
	yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
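A minimal sketch of the caching approach described above, assuming a hypothetical `UgiCache.runAsUser` helper; the property name and object here are illustrative, not the patch's actual config key or class:

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.security.UserGroupInformation

object UgiCache {
  // Illustrative property name; the real key is defined by the patch.
  private val cacheUgi =
    sys.props.getOrElse("spark.user.cacheUserGroupInformation", "true").toBoolean

  // One UGI reused for every task. The Hadoop FileSystem cache keys on the UGI,
  // so reusing a single instance yields a single cached FileSystem instead of
  // one (with its open connections) per task -- the HDFS-3545 leak.
  private lazy val cachedUgi = UserGroupInformation.createRemoteUser("spark")

  def runAsUser[T](user: String)(body: => T): T = {
    // Caveat from the description: a cached UGI never drops expired tokens,
    // and which token it presents is nondeterministic (tokens live in a Set).
    val ugi =
      if (cacheUgi) cachedUgi
      else UserGroupInformation.createRemoteUser(user) // fresh UGI per task leaks FileSystems

    ugi.doAs(new PrivilegedExceptionAction[T] {
      override def run(): T = body
    })
  }
}
```

The trade-off is exactly the one the description notes: caching avoids the FileSystem leak at the cost of the token-expiry hazard from HDFS-3545.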
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14618/
Jenkins, retest this please.
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
This is a follow-up patch to #607, which contains a discussion of the proper solution to this problem. This PR targets branch-0.9 to provide a very low-impact fix that lets users enable UGI caching if they run into this problem, without affecting the default behavior. A usage sketch follows below.
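For reference, opting in on branch-0.9 might look like the following; the property name is illustrative (the actual key is documented in the patched docs/configuration.md):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Opt in to UGI caching (left off by default in this branch-0.9 backport).
// The property name here is illustrative; see docs/configuration.md in the patch.
val conf = new SparkConf()
  .setAppName("s3n-bulk-read")
  .set("spark.user.cacheUserGroupInformation", "true")
val sc = new SparkContext(conf)
```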