SPARK-1676: Cache Hadoop UGIs by default to prevent FileSystem leak #621
Conversation
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14625/
This seems pretty reasonable to me, but it assumes that there is no value in recreating the user and re-transferring the current user's credentials. Is this the case?
// Create a new Executor and start it running
val runner = new MesosExecutorBackend()
new MesosExecutorDriver(runner).run()
val sparkUser = Option(System.getenv("SPARK_USER")).getOrElse(SparkContext.SPARK_UNKNOWN_USER)
We should probably add a Utils function for this, something like Utils.getSparkUser
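For illustration, a minimal sketch of what such a helper could look like (the name Utils.getSparkUser is only the suggestion above, not an existing method; it mirrors the fallback used in the diff and assumes the same SparkContext import as that file):

// Hypothetical helper sketched from the suggestion above: resolve the user the
// executor should act as, falling back to SPARK_UNKNOWN_USER when SPARK_USER is unset.
def getSparkUser: String =
  Option(System.getenv("SPARK_USER")).getOrElse(SparkContext.SPARK_UNKNOWN_USER)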
I have tested this in standalone mode and confirmed that the file handles do not leak.
There is no reason to recreate the user and repopulate the credentials/tokens unless the credentials/tokens are being updated in the ExecutorBackend process. On YARN this definitely doesn't happen: once you start an executor it keeps the same credentials/tokens, and the YARN ResourceManager handles renewing the tokens. As far as I know there isn't support for this built into Spark for Mesos or standalone, but perhaps there is something I'm not aware of. Is there anything you know of that does this that I might have missed? The only other case where it's useful to create a separate UGI is if we add support to run tasks as different users. Thanks for the comments and for doing the standalone testing. I'll update.
To add to what Tom said, there's a distinction between "renewing" tokens and "repopulating" them. Renewing means extending the lifespan of existing tokens. Repopulating with new tokens is not something that YARN currently does.
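To make the leak under discussion concrete, here is an illustrative sketch (not code from this patch): Hadoop's FileSystem cache keys on the UGI instance, so two UGIs created for the same user name still get separate cache entries, and creating a fresh UGI per task accumulates cached FileSystems and their open handles.

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.UserGroupInformation

val conf = new Configuration()
// Two UGIs for the "same" user are still distinct instances...
val ugiA = UserGroupInformation.createRemoteUser("alice")
val ugiB = UserGroupInformation.createRemoteUser("alice")
val fsA = ugiA.doAs(new PrivilegedExceptionAction[FileSystem] {
  override def run(): FileSystem = FileSystem.get(conf)
})
val fsB = ugiB.doAs(new PrivilegedExceptionAction[FileSystem] {
  override def run(): FileSystem = FileSystem.get(conf)
})
// ...so fsA and fsB are two separate cached FileSystem objects. Creating the
// UGI once and reusing it, as this change does, avoids piling these up.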
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14629/
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14630/
Jenkins, retest this please.
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
@sryza so this looks good to you?
This does look good to me.
LGTM too. Thanks for the clarifications, guys. Merging into master, branch-1.0, and branch-0.9.
Move the doAs in Executor higher up so that we only have one UGI and aren't leaking FileSystems.
Fix Spark on YARN to work when the cluster is running as user "yarn" but the clients are launched as the end user and want to read/write to HDFS as that user.
Note this hasn't been fully tested yet; it still needs to be tested in standalone mode.
Putting this up for people to look at and possibly test. I don't have access to a Mesos cluster.
This is an alternative to #607
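For readers skimming the change, a hedged sketch of the "doAs higher up" shape follows. This is illustrative only: the helper name runAsSparkUser and the exact wiring are assumptions, not necessarily what the patch does verbatim. The idea is to build one UGI for the Spark user, hand it the process user's existing credentials, and run the whole executor backend inside a single doAs, so Hadoop's per-UGI FileSystem cache holds one entry instead of one per task.

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical helper: create the UGI once, transfer the current tokens, and
// run the supplied body under that single UGI.
def runAsSparkUser(func: () => Unit): Unit = {
  val user = Option(System.getenv("SPARK_USER")).getOrElse("<unknown>")
  val ugi = UserGroupInformation.createRemoteUser(user)
  // Transfer any delegation tokens already held by the process user (e.g. "yarn")
  // so reads/writes to HDFS happen as the submitting user.
  ugi.addCredentials(UserGroupInformation.getCurrentUser.getCredentials)
  ugi.doAs(new PrivilegedExceptionAction[Unit] {
    override def run(): Unit = func()
  })
}

// Hypothetical call site, "higher up" in an executor backend's main, wrapping
// the snippet discussed in the review above:
// runAsSparkUser { () =>
//   val runner = new MesosExecutorBackend()
//   new MesosExecutorDriver(runner).run()
// }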