SPARK-1676: Cache Hadoop UGIs by default to prevent FileSystem leak #621
Conversation
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14625/
This seems pretty reasonable to me, but it assumes that there is no value in recreating the user and re-transferring the current user's credentials. Is this the case?
// Create a new Executor and start it running
val runner = new MesosExecutorBackend()
new MesosExecutorDriver(runner).run()
val sparkUser = Option(System.getenv("SPARK_USER")).getOrElse(SparkContext.SPARK_UNKNOWN_USER)
We should probably add a Utils function for this, something like Utils.getSparkUser
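For illustration, a minimal sketch of what such a helper could look like (the name Utils.getSparkUser is only the suggestion above, not an existing method; it mirrors the fallback used in the diff and assumes the same SparkContext import as that file):

// Hypothetical helper sketched from the suggestion above: resolve the user the
// executor should act as, falling back to SPARK_UNKNOWN_USER when SPARK_USER is unset.
def getSparkUser: String =
  Option(System.getenv("SPARK_USER")).getOrElse(SparkContext.SPARK_UNKNOWN_USER)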
I have tested this in standalone mode and confirmed that the file handles do not leak.
There is no reason to recreate the user and repopulate the credentials/tokens unless the credentials/tokens are being updated in the ExecutorBackend process. On YARN this definitely doesn't happen: once you start an executor it keeps the same credentials/tokens, and the YARN ResourceManager handles renewing the tokens. As far as I know there isn't support for this built into Spark for Mesos or standalone, but perhaps there is something I'm not aware of. Is there anything you know of that does this that I might have missed? The only other case where it's useful to create a separate UGI is if we add support to run tasks as different users. Thanks for the comments and for doing the standalone testing. I'll update.
To add to what Tom said, there's a distinction between "renewing" tokens and "repopulating" them. Renewing means extending the lifespan of existing tokens. Repopulating with new tokens is not something that YARN currently does.
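To make the leak under discussion concrete, here is an illustrative sketch (not code from this patch): Hadoop's FileSystem cache keys on the UGI instance, so two UGIs created for the same user name still get separate cache entries, and creating a fresh UGI per task accumulates cached FileSystems and their open handles.

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.UserGroupInformation

val conf = new Configuration()
// Two UGIs for the "same" user are still distinct instances...
val ugiA = UserGroupInformation.createRemoteUser("alice")
val ugiB = UserGroupInformation.createRemoteUser("alice")
val fsA = ugiA.doAs(new PrivilegedExceptionAction[FileSystem] {
  override def run(): FileSystem = FileSystem.get(conf)
})
val fsB = ugiB.doAs(new PrivilegedExceptionAction[FileSystem] {
  override def run(): FileSystem = FileSystem.get(conf)
})
// ...so fsA and fsB are two separate cached FileSystem objects. Creating the
// UGI once and reusing it, as this change does, avoids piling these up.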
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14629/
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14630/
Jenkins, retest this please.
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
@sryza so this looks good to you?
This does look good to me.
LGTM too. Thanks for the clarifications, guys. Merging into master, branch-1.0, and branch-0.9.
Move the doAs in Executor higher up so that we only have one UGI and aren't leaking FileSystems.
Fix Spark on YARN to work when the cluster is running as user "yarn" but the clients are launched as the end user and want to read/write to HDFS as that user.
Note this hasn't been fully tested yet; it still needs to be tested in standalone mode.
Putting this up for people to look at and possibly test. I don't have access to a Mesos cluster.
This is an alternative to #607
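For readers skimming the change, a hedged sketch of the "doAs higher up" shape follows. This is illustrative only: the helper name runAsSparkUser and the exact wiring are assumptions, not necessarily what the patch does verbatim. The idea is to build one UGI for the Spark user, hand it the process user's existing credentials, and run the whole executor backend inside a single doAs, so Hadoop's per-UGI FileSystem cache holds one entry instead of one per task.

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical helper: create the UGI once, transfer the current tokens, and
// run the supplied body under that single UGI.
def runAsSparkUser(func: () => Unit): Unit = {
  val user = Option(System.getenv("SPARK_USER")).getOrElse("<unknown>")
  val ugi = UserGroupInformation.createRemoteUser(user)
  // Transfer any delegation tokens already held by the process user (e.g. "yarn")
  // so reads/writes to HDFS happen as the submitting user.
  ugi.addCredentials(UserGroupInformation.getCurrentUser.getCredentials)
  ugi.doAs(new PrivilegedExceptionAction[Unit] {
    override def run(): Unit = func()
  })
}

// Hypothetical call site, "higher up" in an executor backend's main, wrapping
// the snippet discussed in the review above:
// runAsSparkUser { () =>
//   val runner = new MesosExecutorBackend()
//   new MesosExecutorDriver(runner).run()
// }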