SPARK-1676: (branch-0.9 fix) Cache Hadoop UGIs to prevent FileSystem leak #618
Conversation
UserGroupInformation objects (UGIs) are used for Hadoop security. A relatively recent PR (apache#29) makes Spark always use UGIs when executing tasks. Unfortunately, this triggers HDFS-3545, which causes the FileSystem cache to continuously create new FileSystems, as the UGIs look different even though they are logically identical. This causes a memory (and sometimes file descriptor) leak for FileSystems, like S3N, which maintain open connections.

The solution is to introduce a config option (enabled by default) which reuses a single Spark user UGI, rather than creating a new one for each task. The downside to this approach is that UGIs cannot be safely cached (see the notes in HDFS-3545). For example, if a token expires, it is never cleared from the UGI but may be used anyway; which token a UGI presents is nondeterministic, as tokens are backed by a Set.

This setting is enabled by default because the memory leak can become serious very quickly. In one benchmark, attempting to read 10k files from an S3 directory left 45k connections open to S3 after the job completed. These file descriptors are never cleaned up, nor is the memory used by the associated FileSystems.

Conflicts:
	docs/configuration.md
	yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
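A minimal sketch of the caching approach described above, assuming a hypothetical `UgiCache.runAsUser` helper; the property name and object here are illustrative, not the patch's actual config key or class:

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.security.UserGroupInformation

object UgiCache {
  // Illustrative property name; the real key is defined by the patch.
  private val cacheUgi =
    sys.props.getOrElse("spark.user.cacheUserGroupInformation", "true").toBoolean

  // One UGI reused for every task. The Hadoop FileSystem cache keys on the UGI,
  // so reusing a single instance yields a single cached FileSystem instead of
  // one (with its open connections) per task -- the HDFS-3545 leak.
  private lazy val cachedUgi = UserGroupInformation.createRemoteUser("spark")

  def runAsUser[T](user: String)(body: => T): T = {
    // Caveat from the description: a cached UGI never drops expired tokens,
    // and which token it presents is nondeterministic (tokens live in a Set).
    val ugi =
      if (cacheUgi) cachedUgi
      else UserGroupInformation.createRemoteUser(user) // fresh UGI per task leaks FileSystems

    ugi.doAs(new PrivilegedExceptionAction[T] {
      override def run(): T = body
    })
  }
}
```

The trade-off is exactly the one the description notes: caching avoids the FileSystem leak at the cost of the token-expiry hazard from HDFS-3545.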
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14618/
Jenkins, retest this please.
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
This is a follow-up patch to #607, which contains a discussion of the proper solution to this problem. This PR targets branch-0.9 to provide a very low-impact fix that lets users enable UGI caching if they run into this problem, without affecting the default behavior. A usage sketch follows below.
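For reference, opting in on branch-0.9 might look like the following; the property name is illustrative (the actual key is documented in the patched docs/configuration.md):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Opt in to UGI caching (left off by default in this branch-0.9 backport).
// The property name here is illustrative; see docs/configuration.md in the patch.
val conf = new SparkConf()
  .setAppName("s3n-bulk-read")
  .set("spark.user.cacheUserGroupInformation", "true")
val sc = new SparkContext(conf)
```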