
SPARK-1676: (branch-0.9 fix) Cache Hadoop UGIs to prevent FileSystem leak #618


Closed
aarondav wants to merge 2 commits into branch-0.9

Conversation

aarondav
Contributor

aarondav commented May 2, 2014

This is a followup patch to #607, which contains a discussion of the proper solution to this problem. This PR targets branch-0.9 and provides a very low-impact fix that lets users enable UGI caching if they hit this problem, without affecting the default behavior.
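For reference, here is a minimal sketch of how a user might toggle such a setting through SparkConf. The property key `spark.user.cacheUGI` below is hypothetical and used only for illustration; the actual key is defined by the patch and documented in docs/configuration.md.

```scala
// Sketch only: "spark.user.cacheUGI" is a hypothetical property name,
// not necessarily the one introduced by this patch.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("ugi-caching-example")
  .set("spark.user.cacheUGI", "true")  // opt in to reusing a single UGI
val sc = new SparkContext(conf)
```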

aarondav added 2 commits May 1, 2014 20:59
UserGroupInformation objects (UGIs) are used for Hadoop security. A relatively
recent PR (apache#29) makes Spark always use UGIs when executing tasks. Unfortunately,
this triggers HDFS-3545: the FileSystem cache keeps creating new FileSystems
because the UGIs look different even though they are logically identical. The
result is a memory leak, and sometimes a file descriptor leak, for FileSystems
(like S3N) that maintain open connections.

The solution is to introduce a config option (enabled by default) that reuses a
single Spark user UGI rather than creating a new one for each task. The downside
of this approach is that UGIs cannot be safely cached (see the notes in HDFS-3545).
For example, if a token expires, it is never cleared from the UGI but may still be
used (which token a UGI uses is nondeterministic, as the tokens are backed by a Set).

This setting is enabled by default because the memory leak can become serious
very quickly. In one benchmark, attempting to read 10k files from an S3 directory
caused 45k connections to remain open to S3 after the job completed. These file
descriptors are never cleaned up, nor is the memory used by the associated
FileSystems.

Conflicts:
	docs/configuration.md
	yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
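To illustrate the mechanism described in the commit message, here is a minimal sketch (not the patch itself) of why a fresh UGI per task defeats Hadoop's FileSystem cache, which keys cached instances on the UGI, and how reusing a single UGI avoids the leak. Names such as `UgiSketch`, `runTask`, and `sparkUser` are illustrative only.

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.UserGroupInformation

object UgiSketch {
  // Created once and reused: every task sees the same UGI, so FileSystem.get
  // returns the same cached FileSystem instead of growing the cache per task.
  lazy val cachedUgi: UserGroupInformation =
    UserGroupInformation.createRemoteUser("sparkUser")

  def runTask(hadoopConf: Configuration, cacheUgi: Boolean)(body: FileSystem => Unit): Unit = {
    // Two UGIs for the same user are logically identical but compare unequal,
    // so each fresh UGI yields a brand-new cached FileSystem (HDFS-3545).
    val ugi =
      if (cacheUgi) cachedUgi
      else UserGroupInformation.createRemoteUser("sparkUser")

    ugi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        val fs = FileSystem.get(hadoopConf)  // cache key includes the UGI
        body(fs)                             // task work goes through fs
      }
    })
  }
}
```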
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14618/

@aarondav
Contributor Author

aarondav commented May 2, 2014

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14621/

aarondav closed this May 6, 2014
gzm55 pushed a commit to MediaV/spark that referenced this pull request Jul 17, 2014
https://spark-project.atlassian.net/browse/SPARK-1105

fix site scala version error

Author: CodingCat <[email protected]>

Closes apache#618 from CodingCat/doc_version and squashes the following commits:

39bb8aa [CodingCat] more fixes
65bedb0 [CodingCat] fix site scala version error in doc
(cherry picked from commit 7b012c9)

Conflicts:

	docs/_config.yml
andrewor14 pushed a commit to andrewor14/spark that referenced this pull request Jan 8, 2015
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Upgrade to python3.6 for kind post job