[SPARK-3388] Expose application ID in ApplicationStart event, use it in history server. #1218
Conversation
This provides an alternative to PR #1094; it touches more parts than that PR, so people might have more reservations about the approach. I tested with local, standalone and yarn modes (don't have mesos around to try).
Can one of the admins verify this patch?
Lay down the infrastructure to plumb a backend-generated application ID back to the SparkContext, and make the application ID generated for apps running in standalone and YARN mode available.
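A rough sketch of that plumbing, with assumed names (the actual Spark classes and signatures differ):

    // Illustrative only: a backend learns its ID from the cluster manager
    // and exposes it through an applicationId() hook with a safe default.
    trait SchedulerBackendWithId {
      // Backends that receive an ID from their cluster manager override this.
      def applicationId(): Option[String] = None
    }

    class StandaloneBackendSketch extends SchedulerBackendWithId {
      @volatile private var masterAppId: Option[String] = None

      // Called when the master acknowledges registration with an app ID.
      def registered(appId: String): Unit = { masterAppId = Some(appId) }

      override def applicationId(): Option[String] = masterAppId
    }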
This makes it more efficient to search for applications by ID, since the ID is not necessarily related to the location of the app in the file system. Memory usage should be a little worse than before, but only by a constant factor (mostly the extra overhead of a LinkedHashMap over an ArrayBuffer to maintain the data).
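A self-contained sketch of that data-structure choice, using hypothetical names rather than FsHistoryProvider's actual members:

    import scala.collection.mutable.LinkedHashMap

    case class AppHistoryInfo(id: String, name: String, logDir: String)

    object HistoryIndexSketch extends App {
      val apps = new LinkedHashMap[String, AppHistoryInfo]()
      apps += ("app-001" -> AppHistoryInfo("app-001", "first", "/logs/dir-a"))
      apps += ("app-002" -> AppHistoryInfo("app-002", "second", "/logs/dir-b"))

      // Constant-time lookup by application ID; an ArrayBuffer would need a scan.
      assert(apps.get("app-002").isDefined)

      // Insertion order is preserved, so listings keep a stable order.
      assert(apps.keys.toList == List("app-001", "app-002"))
    }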
This allows the application ID set by the master to be included in the SparkListenerApplicationStart event. This should not affect job scheduling, because tasks can only be submitted after executors register, which will happen after the client registers with the master anyway. (This is similar to what the Mesos backend does to implement the same behavior.)
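The event shape this enables looks roughly like the sketch below; the appId field is the new piece, and both the field order and the listener are illustrative assumptions, not the exact Spark API:

    object ApplicationStartSketch {
      case class SparkListenerApplicationStart(
          appName: String,
          appId: Option[String], // set by the Master / YARN / Mesos backend
          time: Long,
          sparkUser: String)

      // A hypothetical consumer reading the ID off the event.
      def onApplicationStart(event: SparkListenerApplicationStart): Unit = {
        val id = event.appId.getOrElse("<not-set>")
        println(s"Application ${event.appName} started with ID $id")
      }
    }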
takeWhile() messes up the internal iterator state, so the iterator is not usable after the call, and we still need it here. (Mental note: read all the scaladoc next time.)
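A standalone illustration of the pitfall, plus the BufferedIterator alternative that the squashed commit log shows the PR switched to:

    object TakeWhileSketch extends App {
      val it = Iterator(1, 2, 3, 4)
      // Per the scaladoc, the original iterator must be discarded after
      // takeWhile: its internal state is unspecified afterwards.
      val prefix = it.takeWhile(_ < 3).toList // List(1, 2)
      // Calling it.next() here would be undefined behavior.

      // A BufferedIterator can peek via `head` without consuming, so the
      // same iterator stays valid for the remaining elements.
      val buf = Iterator(1, 2, 3, 4).buffered
      val prefix2 = collection.mutable.ListBuffer[Int]()
      while (buf.hasNext && buf.head < 3) prefix2 += buf.next()
      assert(buf.next() == 3) // still usable
    }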
test this please
        }
      }
      scheduler.initialize(backend)
      scheduler
Why is this removed?
I see you're initializing the backend in YarnClusterScheduler instead. Why are we changing this? It might make sense to fix this in a separate PR if this is a bug / some kind of improvement.
So, what happened is that when I wrote this code there was no such thing as YarnClusterScheduler. So I created it and moved the backend initialization into the scheduler class. Later someone else added the same class and I just merged my code with theirs.
I think this is cleaner because it removes reflection code from SparkContext. Unless you think that someone will ever try to match YarnClusterSchedulerBackend with something other than YarnClusterScheduler.
I haven't looked at this in detail yet, but it seems a bit odd to me to move this particular one down into YarnClusterScheduler when the rest define their backends and initialize them here. I would think it would be easier to read having it here.
Maybe I'm missing something, but this is just removing a bunch of reflection code and replacing it with a single line later on (in YarnClusterScheduler):
initialize(new YarnClusterSchedulerBackend(this, sc))
This looks much, much easier to read and cleaner to me, but if you guys somehow feel so strongly about it, I can revert the change.
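For readers following along, a self-contained analogue of the two wirings being debated; all names are illustrative, not the actual Spark classes:

    trait SchedulerBackend {
      def start(): Unit
    }

    class TaskScheduler {
      protected var backend: SchedulerBackend = _
      def initialize(b: SchedulerBackend): Unit = { backend = b }
    }

    class ClusterBackend(scheduler: TaskScheduler) extends SchedulerBackend {
      override def start(): Unit = println("backend started")
    }

    class ClusterScheduler extends TaskScheduler {
      // The single line that replaces the reflection block in SparkContext:
      // the scheduler builds and initializes its own backend.
      initialize(new ClusterBackend(this))
    }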
The reflection code exists since the class can't be loaded in all environments. Tom, has this changed recently?
In core/src/main/scala/org/apache/spark/SparkContext.scala:

    @@ -1531,18 +1532,6 @@ object SparkContext extends Logging {
              throw new SparkException("YARN mode not available ?", e)
            }
          }
          val backend = try {
            val clazz =
              Class.forName("org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend")
            val cons = clazz.getConstructor(classOf[TaskSchedulerImpl], classOf[SparkContext])
            cons.newInstance(scheduler, sc).asInstanceOf[CoarseGrainedSchedulerBackend]
          } catch {
            case e: Exception => {
              throw new SparkException("YARN mode not available ?", e)
            }
          }
          scheduler.initialize(backend)
          scheduler
@mridulm yes, I'm aware of that. But before, SparkContext was using reflection to instantiate two different classes in the yarn package, and then connect them manually. I removed one of those (note that there's still reflection code to load YarnClusterScheduler) because it seemed unnecessary.
@tgravescs sorry, I see what you mean now. Still, I see this as an improvement; the scheduler variables in all the cases are not used anywhere - in fact, even the initialize calls are duplicated in all the different cases.
So unless there is a real desire to be able to match backends and schedulers at will, I think encapsulating the backend initialization like I did is a better pattern.
QA tests have started for PR 1218. This patch DID NOT merge cleanly!
    @@ -300,4 +303,7 @@ private[spark] class CoarseMesosSchedulerBackend(
        logInfo("Executor lost: %s, marking slave %s as lost".format(e.getValue, s.getValue))
        slaveLost(d, s)
      }

      override def applicationId(): Option[String] =
        Some(frameworkId).map(id => Some(id.getValue())).getOrElse(null)
minor: orNull, here and other places
Actually, can't you just do Option(frameworkId).map(_.getValue) here? I believe the existing code is incorrect because Some(null) is not the same as None.
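A standalone check of that distinction (Option.apply converts null to None, while Some wraps it blindly):

    object OptionNullSketch extends App {
      val maybeNull: String = null

      assert(Option(maybeNull) == None)  // Option.apply turns null into None
      assert(Some(maybeNull) != None)    // Some(null) is a non-empty Option

      // map is skipped on None, so this is null-safe; the equivalent
      // Some(maybeNull).map(...) would apply the function to null and throw.
      assert(Option(maybeNull).map(_.toUpperCase) == None)
    }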
@vanzin Thanks for your PR, I left some comments inline. The main points are the following:

I'm not sure what it means for an application to have an Option of an application ID. From the perspective of an application it should always have an ID. If this is an implementation detail (i.e. because certain cluster managers don't provide their own IDs), then it might make sense to use a default.

From your description I take it that you haven't had the chance to test this on Mesos. Given that, I think it makes sense to hold back on adding this behavior there for now and instead file a JIRA for it.

Also, now that #1094 is merged, I assume part of your changes in the YARN code needs to be reverted.
      replayBus.replay()

      // Note that this does not have any effect due to SPARK-2169.
Now that #1252 is merged, is this still true?
QA tests have finished for PR 1218 at commit
          registrationLock.wait()
        }
      }
    }
Could this lead to a deadlock? While we wait on the registrationLock we're still inside the synchronized block, so the notify thread may not even get into the synchronized block in wakeUpContext. Right?
wait() releases the lock while the wait happens.
Ah I see, it's synchronized on the same thing.
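A minimal sketch of the monitor pattern in question; the names mirror the comments above, but the enclosing class is illustrative:

    class RegistrationWaiter {
      private val registrationLock = new Object()
      private var registered = false

      def waitForRegistration(): Unit = registrationLock.synchronized {
        while (!registered) {
          // wait() atomically releases registrationLock and suspends this
          // thread, so the notifying thread can enter its synchronized
          // block; that is why there is no deadlock here.
          registrationLock.wait()
        }
      }

      def wakeUpContext(): Unit = registrationLock.synchronized {
        registered = true
        registrationLock.notifyAll()
      }
    }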
retest this please
QA tests have started for PR 1218 at commit
QA tests have finished for PR 1218 at commit
Hey @vanzin is there a JIRA for this?
I did this work in the context of SPARK-2150 (cited in the commit message), but there's no JIRA specifically for using this solution.
Since this is a non-trivial change and covers a somewhat wide scope, can you make a JIRA specifically for this PR? Then maybe we can link it to SPARK-2150. This will make it easier for us to keep track of what issues the changes correspond to.
Thanks, I merged this (and resolved a minor conflict) into master.
…n history server.

This change exposes the application ID generated by the Spark Master, Mesos or Yarn via the SparkListenerApplicationStart event. It then uses that information to expose the application via its ID in the history server, instead of using the internal directory name generated by the event logger as an application id. This allows someone who knows the application ID to easily figure out the URL for the application's entry in the HS, aside from looking better.

In Yarn mode, this is used to generate a direct link from the RM application list to the Spark history server entry (thus providing a fix for SPARK-2150).

Note this sort of assumes that the different managers will generate app ids that are sufficiently different from each other that clashes will not occur.

Author: Marcelo Vanzin <[email protected]>

This patch had conflicts when merged, resolved by
Committer: Andrew Or <[email protected]>

Closes apache#1218 from vanzin/yarn-hs-link-2 and squashes the following commits:

2d19f3c [Marcelo Vanzin] Review feedback.
6706d3a [Marcelo Vanzin] Implement applicationId() in base classes.
56fe42e [Marcelo Vanzin] Fix cluster mode history address, plus a cleanup.
44112a8 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
8278316 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
a86bbcf [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
a0056e6 [Marcelo Vanzin] Unbreak test.
4b10cfd [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
cb0cab2 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
25f2826 [Marcelo Vanzin] Add MIMA excludes.
f0ba90f [Marcelo Vanzin] Use BufferedIterator.
c90a08d [Marcelo Vanzin] Remove unused code.
3f8ec66 [Marcelo Vanzin] Review feedback.
21aa71b [Marcelo Vanzin] Fix JSON test.
b022bae [Marcelo Vanzin] Undo SparkContext cleanup.
c6d7478 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
4e3483f [Marcelo Vanzin] Fix test.
57517b8 [Marcelo Vanzin] Review feedback. Mostly, more consistent use of Scala's Option.
311e49d [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
d35d86f [Marcelo Vanzin] Fix yarn backend after rebase.
36dc362 [Marcelo Vanzin] Don't use Iterator::takeWhile().
0afd696 [Marcelo Vanzin] Wait until master responds before returning from start().
abc4697 [Marcelo Vanzin] Make FsHistoryProvider keep a map of applications by id.
26b266e [Marcelo Vanzin] Use Mesos framework ID as Spark application ID.
b3f3664 [Marcelo Vanzin] [yarn] Make the RM link point to the app direcly in the HS.
2fb7de4 [Marcelo Vanzin] Expose the application ID in the ApplicationStart event.
ed10348 [Marcelo Vanzin] Expose application id to spark context.