
SPARK-1676: Cache Hadoop UGIs by default to prevent FileSystem leak #621


Closed
wants to merge 3 commits into from

Conversation

tgravescs
Contributor

Move the doAs in Executor higher up so that we only have one UGI and aren't leaking FileSystems.
Fix Spark on YARN to work when the cluster is running as user "yarn" but the clients are launched as the submitting user and want to read/write to HDFS as that user.

Note this hasn't been fully tested yet. Need to test in standalone mode.

Putting this up for people to look at and possibly test. I don't have access to a Mesos cluster.

This is an alternative to #607
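The fix described above can be sketched with a toy model (plain Scala; `Ugi` and `UgiFactory` are stand-ins for Hadoop's `UserGroupInformation`, which is not used here): enter `doAs` once at executor startup so every task shares one cached UGI, rather than creating a fresh UGI per task. Each UGI caches its own `FileSystem` instances, which is the leak.

```scala
// Toy model of the fix; not Hadoop's actual security API.
class Ugi(val user: String) {
  def doAs[T](body: => T): T = body // a real UGI would set the security context here
}

object UgiFactory {
  var created = 0 // count UGIs (and hence cached FileSystem entries) we allocate
  def create(user: String): Ugi = { created += 1; new Ugi(user) }
}

// Before: one UGI (and one FileSystem cache entry) per task -- the leak.
def runLeaky(tasks: Seq[() => Unit], user: String): Unit =
  tasks.foreach(t => UgiFactory.create(user).doAs(t()))

// After: the doAs moved higher up -- one UGI for the whole executor.
def runFixed(tasks: Seq[() => Unit], user: String): Unit = {
  val ugi = UgiFactory.create(user)
  ugi.doAs { tasks.foreach(_.apply()) }
}
```

With three tasks, `runLeaky` allocates three UGIs while `runFixed` allocates one, which mirrors why hoisting the `doAs` stops the FileSystem cache from growing.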

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14625/

@aarondav
Contributor

aarondav commented May 2, 2014

This seems pretty reasonable to me, but it assumes that there is no value in recreating the user and re-transferring the current user's credentials. Is this the case?

// Create a new Executor and start it running
val runner = new MesosExecutorBackend()
new MesosExecutorDriver(runner).run()
val sparkUser = Option(System.getenv("SPARK_USER")).getOrElse(SparkContext.SPARK_UNKNOWN_USER)

We should probably add a Utils function for this, something like Utils.getSparkUser
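A minimal sketch of the suggested helper might look like the following. The names follow the review comment, not the code Spark actually merged, and `SPARK_UNKNOWN_USER`'s value here is a stand-in for `SparkContext.SPARK_UNKNOWN_USER`:

```scala
// Hypothetical sketch of the suggested Utils.getSparkUser helper.
object Utils {
  val SPARK_UNKNOWN_USER: String = "<unknown>" // stand-in for SparkContext.SPARK_UNKNOWN_USER

  // Prefer the SPARK_USER environment variable; fall back to the sentinel.
  def getSparkUser: String =
    sys.env.getOrElse("SPARK_USER", SPARK_UNKNOWN_USER)
}
```

This centralizes the `Option(System.getenv("SPARK_USER")).getOrElse(...)` pattern from the diff into one place.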

@aarondav
Contributor

aarondav commented May 3, 2014

I have tested this on standalone mode and confirmed that the file handles do not leak.

@tgravescs
Contributor Author

There is no reason to recreate the user and repopulate the credentials/tokens unless the credentials/tokens are being updated in the ExecutorBackend process. On YARN this definitely doesn't happen: once you start an executor it keeps the same credentials/tokens, and the YARN ResourceManager handles renewing the tokens. As far as I know there isn't support for this built into Spark for Mesos or standalone, but perhaps there is something I'm not aware of. Is there anything you know of that does that which I might have missed? The only other case where it's useful to create a separate UGI is if we add support to run tasks as different users.

Thanks for the comments and for doing the standalone testing. I'll update.

@sryza
Contributor

sryza commented May 3, 2014

To add to what Tom said, there's a distinction between "renewing" tokens and "repopulating" them. Renewing means extending the lifespan of existing tokens. Repopulating with new tokens is not something that YARN currently does.
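The distinction can be made concrete with a toy model (not Hadoop's actual delegation-token API): renewing keeps the same token and extends its expiry, while repopulating replaces it with a brand-new token.

```scala
// Toy model of delegation-token lifecycle; not Hadoop's actual API.
final case class Token(id: Int, expiry: Long)

// Renew: same token identity, later expiry (what the YARN ResourceManager does).
def renew(t: Token, extension: Long): Token =
  t.copy(expiry = t.expiry + extension)

// Repopulate: a brand-new token (what YARN does NOT do for a running app).
def repopulate(freshId: Int, lifetime: Long): Token =
  Token(freshId, lifetime)
```

Since YARN only ever renews, an executor's initial tokens stay valid for its lifetime, which is why caching one UGI per process is safe.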

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14629/

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14630/

@pwendell
Contributor

pwendell commented May 3, 2014

Jenkins, retest this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14631/

@pwendell
Contributor

pwendell commented May 3, 2014

@sryza so this looks good to you?

@sryza
Contributor

sryza commented May 3, 2014

This does look good to me.

@aarondav
Contributor

aarondav commented May 3, 2014

LGTM too. Thanks for the clarifications, guys. Merging into master, branch-1.0, and branch-0.9.

asfgit pushed a commit that referenced this pull request May 3, 2014
SPARK-1676: Cache Hadoop UGIs by default to prevent FileSystem leak

Move the doAs in Executor higher up so that we only have 1 ugi and aren't leaking filesystems.
Fix spark on yarn to work when the cluster is running as user "yarn" but the clients are launched as the user and want to read/write to hdfs as the user.

Note this hasn't been fully tested yet.  Need to test in standalone mode.

Putting this up for people to look at and possibly test.  I don't have access to a mesos cluster.

This is alternative to #607

Author: Thomas Graves <[email protected]>

Closes #621 from tgravescs/SPARK-1676 and squashes the following commits:

244d55a [Thomas Graves] fix line length
44163d4 [Thomas Graves] Rework
9398853 [Thomas Graves] change to have doAs in executor higher up.

(cherry picked from commit 3d0a02d)
Signed-off-by: Aaron Davidson <[email protected]>

Conflicts:
	core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
	core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
	yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/WorkerLauncher.scala
	yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
	yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/WorkerLauncher.scala
asfgit closed this in 3d0a02d May 3, 2014
asfgit pushed a commit that referenced this pull request May 3, 2014
(Same commit message as above; cherry picked from commit 3d0a02d. Signed-off-by: Aaron Davidson <[email protected]>)
tgravescs changed the title from [WIP] SPARK-1676: Cache Hadoop UGIs by default to prevent FileSystem leak to SPARK-1676: Cache Hadoop UGIs by default to prevent FileSystem leak May 5, 2014
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
(Same commit message as above, with cross-repository references apache#607 and apache#621.)