[SPARK-3388] Expose application ID in ApplicationStart event, use it in history server. #1218
Conversation
This provides an alternative to PR #1094; it touches more parts than that PR, so people might have more reservations about the approach. I tested with local, standalone and yarn modes (don't have mesos around to try).
Can one of the admins verify this patch?
Lay down the infrastructure to plumb a backend-generated application ID back to the SparkContext, and make the application ID generated for apps running in standalone and YARN mode available.
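A rough sketch of that plumbing, with assumed names (the actual Spark classes and signatures differ):

    // Illustrative only: a backend learns its ID from the cluster manager
    // and exposes it through an applicationId() hook with a safe default.
    trait SchedulerBackendWithId {
      // Backends that receive an ID from their cluster manager override this.
      def applicationId(): Option[String] = None
    }

    class StandaloneBackendSketch extends SchedulerBackendWithId {
      @volatile private var masterAppId: Option[String] = None

      // Called when the master acknowledges registration with an app ID.
      def registered(appId: String): Unit = { masterAppId = Some(appId) }

      override def applicationId(): Option[String] = masterAppId
    }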
This makes it more efficient to search for applications by ID, since the ID is not necessarily related to the location of the app in the file system. Memory usage should be a little worse than before, but only by a constant factor (mostly the extra overhead of a LinkedHashMap over an ArrayBuffer to maintain the data).
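A self-contained sketch of that data-structure choice, using hypothetical names rather than FsHistoryProvider's actual members:

    import scala.collection.mutable.LinkedHashMap

    case class AppHistoryInfo(id: String, name: String, logDir: String)

    object HistoryIndexSketch extends App {
      val apps = new LinkedHashMap[String, AppHistoryInfo]()
      apps += ("app-001" -> AppHistoryInfo("app-001", "first", "/logs/dir-a"))
      apps += ("app-002" -> AppHistoryInfo("app-002", "second", "/logs/dir-b"))

      // Constant-time lookup by application ID; an ArrayBuffer would need a scan.
      assert(apps.get("app-002").isDefined)

      // Insertion order is preserved, so listings keep a stable order.
      assert(apps.keys.toList == List("app-001", "app-002"))
    }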
This allows the application ID set by the master to be included in the SparkListenerApplicationStart event. This should not affect job scheduling, because tasks can only be submitted after executors register, which will happen after the client registers with the master anyway. (This is similar to what the Mesos backend does to implement the same behavior.)
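The event shape this enables looks roughly like the sketch below; the appId field is the new piece, and both the field order and the listener are illustrative assumptions, not the exact Spark API:

    object ApplicationStartSketch {
      case class SparkListenerApplicationStart(
          appName: String,
          appId: Option[String], // set by the Master / YARN / Mesos backend
          time: Long,
          sparkUser: String)

      // A hypothetical consumer reading the ID off the event.
      def onApplicationStart(event: SparkListenerApplicationStart): Unit = {
        val id = event.appId.getOrElse("<not-set>")
        println(s"Application ${event.appName} started with ID $id")
      }
    }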
takeWhile() messes up the internal iterator state, so the iterator is not usable after the call, and we still need it here. (Mental note: read all the scaladoc next time.)
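A standalone illustration of the pitfall, plus the BufferedIterator alternative that the squashed commit log shows the PR switched to:

    object TakeWhileSketch extends App {
      val it = Iterator(1, 2, 3, 4)
      // Per the scaladoc, the original iterator must be discarded after
      // takeWhile: its internal state is unspecified afterwards.
      val prefix = it.takeWhile(_ < 3).toList // List(1, 2)
      // Calling it.next() here would be undefined behavior.

      // A BufferedIterator can peek via `head` without consuming, so the
      // same iterator stays valid for the remaining elements.
      val buf = Iterator(1, 2, 3, 4).buffered
      val prefix2 = collection.mutable.ListBuffer[Int]()
      while (buf.hasNext && buf.head < 3) prefix2 += buf.next()
      assert(buf.next() == 3) // still usable
    }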
test this please
        }
      }
      scheduler.initialize(backend)
      scheduler
Why is this removed?
I see you're initializing the backend in YarnClusterScheduler instead. Why are we changing this? It might make sense to fix this in a separate PR if this is a bug / some kind of improvement.
So, what happened is that when I wrote this code there was no such thing as YarnClusterScheduler. So I created it and moved the backend initialization into the scheduler class. Later someone else added the same class and I just merged my code with theirs.
I think this is cleaner because it removes reflection code from SparkContext. Unless you think that someone will ever try to match YarnClusterSchedulerBackend with something other than YarnClusterScheduler.
I haven't looked at this in detail yet, but it seems a bit odd to me to move this particular one down into YarnClusterScheduler when the rest define their backends and initialize them here. I would think it would be easier to read having it here.
Maybe I'm missing something, but this is just removing a bunch of reflection code and replacing it with a single line later on (in YarnClusterScheduler):
initialize(new YarnClusterSchedulerBackend(this, sc))
This looks much, much easier to read and cleaner to me, but if you guys somehow feel so strongly about it, I can revert the change.
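For readers following along, a self-contained analogue of the two wirings being debated; all names are illustrative, not the actual Spark classes:

    trait SchedulerBackend {
      def start(): Unit
    }

    class TaskScheduler {
      protected var backend: SchedulerBackend = _
      def initialize(b: SchedulerBackend): Unit = { backend = b }
    }

    class ClusterBackend(scheduler: TaskScheduler) extends SchedulerBackend {
      override def start(): Unit = println("backend started")
    }

    class ClusterScheduler extends TaskScheduler {
      // The single line that replaces the reflection block in SparkContext:
      // the scheduler builds and initializes its own backend.
      initialize(new ClusterBackend(this))
    }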
The reflection code exists since the class can't be loaded in all environments. Tom, has this changed recently?
In core/src/main/scala/org/apache/spark/SparkContext.scala:

    @@ -1531,18 +1532,6 @@ object SparkContext extends Logging {
              throw new SparkException("YARN mode not available ?", e)
            }
          }
          val backend = try {
            val clazz =
              Class.forName("org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend")
            val cons = clazz.getConstructor(classOf[TaskSchedulerImpl], classOf[SparkContext])
            cons.newInstance(scheduler, sc).asInstanceOf[CoarseGrainedSchedulerBackend]
          } catch {
            case e: Exception => {
              throw new SparkException("YARN mode not available ?", e)
            }
          }
          scheduler.initialize(backend)
          scheduler
@mridulm yes, I'm aware of that. But before, SparkContext was using reflection to instantiate two different classes in the yarn package, and then connect them manually. I removed one of those (note that there's still reflection code to load YarnClusterScheduler) because it seemed unnecessary.
@tgravescs sorry, I see what you mean now. Still, I see this as an improvement; the scheduler variables in all the cases are not used anywhere - in fact, even the initialize calls are duplicated in all the different cases.
So unless there is a real desire to be able to match backends and schedulers at will, I think encapsulating the backend initialization like I did is a better pattern.
QA tests have started for PR 1218. This patch DID NOT merge cleanly!
    @@ -300,4 +303,7 @@ private[spark] class CoarseMesosSchedulerBackend(
        logInfo("Executor lost: %s, marking slave %s as lost".format(e.getValue, s.getValue))
        slaveLost(d, s)
      }

      override def applicationId(): Option[String] =
        Some(frameworkId).map(id => Some(id.getValue())).getOrElse(null)
minor: orNull, here and other places
Actually, can't you just do Option(frameworkId).map(_.getValue) here? I believe the existing code is incorrect because Some(null) is not the same as None.
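A standalone check of that distinction (Option.apply converts null to None, while Some wraps it blindly):

    object OptionNullSketch extends App {
      val maybeNull: String = null

      assert(Option(maybeNull) == None)  // Option.apply turns null into None
      assert(Some(maybeNull) != None)    // Some(null) is a non-empty Option

      // map is skipped on None, so this is null-safe; the equivalent
      // Some(maybeNull).map(...) would apply the function to null and throw.
      assert(Option(maybeNull).map(_.toUpperCase) == None)
    }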
@vanzin Thanks for your PR, I left some comments inline. The main points are the following:

I'm not sure what it means for an application to have an Option of an application ID. From the perspective of an application it should always have an ID. If this is an implementation detail (i.e. because certain cluster managers don't provide their own IDs), then it might make sense to use a default.

From your description I take it that you haven't had the chance to test this on Mesos. Given that, I think it makes sense to hold back on adding this behavior there for now and instead file a JIRA for it.

Also, now that #1094 is merged, I assume part of your changes in the YARN code needs to be reverted.
      replayBus.replay()

      // Note that this does not have any effect due to SPARK-2169.
Now that #1252 is merged, is this still true?
QA tests have finished for PR 1218 at commit
          registrationLock.wait()
        }
      }
    }
Could this lead to a deadlock? While we wait on the registrationLock we're still inside the synchronized block, so the notify thread may not even get into the synchronized block in wakeUpContext. Right?
wait() releases the lock while the wait happens.
Ah I see, it's synchronized on the same thing.
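A minimal sketch of the monitor pattern in question; the names mirror the comments above, but the enclosing class is illustrative:

    class RegistrationWaiter {
      private val registrationLock = new Object()
      private var registered = false

      def waitForRegistration(): Unit = registrationLock.synchronized {
        while (!registered) {
          // wait() atomically releases registrationLock and suspends this
          // thread, so the notifying thread can enter its synchronized
          // block; that is why there is no deadlock here.
          registrationLock.wait()
        }
      }

      def wakeUpContext(): Unit = registrationLock.synchronized {
        registered = true
        registrationLock.notifyAll()
      }
    }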
retest this please
QA tests have started for PR 1218 at commit
QA tests have finished for PR 1218 at commit
Hey @vanzin is there a JIRA for this?
I did this work in the context of SPARK-2150 (cited in the commit message), but there's no JIRA specifically for using this solution.
Since this is a non-trivial change and covers a somewhat wide scope, can you make a JIRA specifically for this PR? Then maybe we can link it to SPARK-2150. This will make it easier for us to keep track of what issues the changes correspond to.
Thanks, I merged this (and resolved a minor conflict) into master.
…n history server.

This change exposes the application ID generated by the Spark Master, Mesos or Yarn via the SparkListenerApplicationStart event. It then uses that information to expose the application via its ID in the history server, instead of using the internal directory name generated by the event logger as an application id. This allows someone who knows the application ID to easily figure out the URL for the application's entry in the HS, aside from looking better.

In Yarn mode, this is used to generate a direct link from the RM application list to the Spark history server entry (thus providing a fix for SPARK-2150).

Note this sort of assumes that the different managers will generate app ids that are sufficiently different from each other that clashes will not occur.

Author: Marcelo Vanzin <[email protected]>

This patch had conflicts when merged, resolved by
Committer: Andrew Or <[email protected]>

Closes apache#1218 from vanzin/yarn-hs-link-2 and squashes the following commits:

2d19f3c [Marcelo Vanzin] Review feedback.
6706d3a [Marcelo Vanzin] Implement applicationId() in base classes.
56fe42e [Marcelo Vanzin] Fix cluster mode history address, plus a cleanup.
44112a8 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
8278316 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
a86bbcf [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
a0056e6 [Marcelo Vanzin] Unbreak test.
4b10cfd [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
cb0cab2 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
25f2826 [Marcelo Vanzin] Add MIMA excludes.
f0ba90f [Marcelo Vanzin] Use BufferedIterator.
c90a08d [Marcelo Vanzin] Remove unused code.
3f8ec66 [Marcelo Vanzin] Review feedback.
21aa71b [Marcelo Vanzin] Fix JSON test.
b022bae [Marcelo Vanzin] Undo SparkContext cleanup.
c6d7478 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
4e3483f [Marcelo Vanzin] Fix test.
57517b8 [Marcelo Vanzin] Review feedback. Mostly, more consistent use of Scala's Option.
311e49d [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
d35d86f [Marcelo Vanzin] Fix yarn backend after rebase.
36dc362 [Marcelo Vanzin] Don't use Iterator::takeWhile().
0afd696 [Marcelo Vanzin] Wait until master responds before returning from start().
abc4697 [Marcelo Vanzin] Make FsHistoryProvider keep a map of applications by id.
26b266e [Marcelo Vanzin] Use Mesos framework ID as Spark application ID.
b3f3664 [Marcelo Vanzin] [yarn] Make the RM link point to the app direcly in the HS.
2fb7de4 [Marcelo Vanzin] Expose the application ID in the ApplicationStart event.
ed10348 [Marcelo Vanzin] Expose application id to spark context.