
[SPARK-3388] Expose application ID in ApplicationStart event, use it in history server. #1218


Closed
vanzin wants to merge 27 commits into apache:master from vanzin:yarn-hs-link-2

Conversation

vanzin
Contributor

@vanzin commented Jun 25, 2014

This change exposes the application ID generated by the Spark Master, Mesos or Yarn
via the SparkListenerApplicationStart event. It then uses that information to expose the
application via its ID in the history server, instead of using the internal directory name
generated by the event logger as an application ID. Besides looking better, this
allows anyone who knows the application ID to easily construct the URL for the
application's entry in the history server (HS).

In Yarn mode, this is used to generate a direct link from the RM application list to the
Spark history server entry (thus providing a fix for SPARK-2150).

Note this sort of assumes that the different managers will generate app ids that are
sufficiently different from each other that clashes will not occur.
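The flow described above can be sketched in a few lines. This is a minimal, hedged sketch: the case class only mirrors the signature quoted later in this thread (not the full Spark API), and `historyUrl` is a hypothetical helper, not actual history-server code.

```scala
// Simplified stand-in for the event discussed above; mirrors the case class
// signature quoted later in this thread, not the full Spark API.
case class SparkListenerApplicationStart(appName: String, appId: Option[String], time: Long)

// Hypothetical helper: build a history-server URL from the backend-provided
// ID, falling back to a placeholder when no ID was supplied.
def historyUrl(base: String, event: SparkListenerApplicationStart): String =
  base + "/history/" + event.appId.getOrElse("<unknown>")

val ev = SparkListenerApplicationStart("my-app", Some("application_1409000000000_0001"), 0L)
println(historyUrl("http://rm-host:18080", ev))
```

With a backend-provided ID the URL is predictable, which is what enables the direct RM-to-history-server link in YARN mode.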

@vanzin
Contributor Author

vanzin commented Jun 25, 2014

This provides an alternative to PR #1094; it touches more parts than that PR, so people might have more reservations about the approach.

I tested with local, standalone and yarn modes (don't have mesos around to try).

@AmplabJenkins

Can one of the admins verify this patch?

Marcelo Vanzin added 8 commits July 16, 2014 09:54
Lay down the infrastructure to plumb a backend-generated application id
back to the SparkContext, and make the application ID generated for apps
running in standalone and yarn mode available.
This makes it more efficient to search for applications by
id, since it's not necessarily related to the location of
the app in the file system.

Memory usage should be a little worse than before, but only by a
constant factor (mostly the extra overhead of a LinkedHashMap over
an ArrayBuffer to maintain the data).
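The trade-off described in that commit message can be sketched as follows. Names here (`AppInfo`, `apps`) are illustrative, not the actual FsHistoryProvider fields: a LinkedHashMap keeps the listing order an ArrayBuffer would give, while adding constant-time lookup by ID.

```scala
import scala.collection.mutable.LinkedHashMap

// Illustrative stand-in for a history entry (not the real Spark type).
case class AppInfo(id: String, name: String)

// LinkedHashMap preserves insertion order (like an ArrayBuffer would)
// but also supports O(1) lookup by application ID.
val apps = new LinkedHashMap[String, AppInfo]
apps += ("app-1" -> AppInfo("app-1", "first"))
apps += ("app-2" -> AppInfo("app-2", "second"))

val byId = apps.get("app-2")        // O(1), no scan of the listing
val ordered = apps.keys.toList      // insertion order retained
```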
This allows the application ID set by the master to be included
in the SparkListenerApplicationStart event. This shouldn't affect
job scheduling, because tasks can only be submitted after executors
register, which happens after the client registers with the
master anyway.

(This is similar to what the Mesos backend does to implement
the same behavior.)
It messes up the internal iterator state so it's not usable after
the call, which we need here. (Mental note: read all the scaladoc
next time.)
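The iterator pitfall mentioned in that commit message (and in the later "Don't use Iterator::takeWhile()" / "Use BufferedIterator" commits) can be demonstrated directly; this is a generic illustration, not the PR's actual code:

```scala
import scala.collection.mutable.ListBuffer

// Iterator.takeWhile consumes the element that fails the predicate and
// leaves the source iterator in an unspecified state for further use.
val it = Iterator(1, 2, 3, 4, 5)
val taken = it.takeWhile(_ < 3).toList   // List(1, 2); `it` is now unsafe to reuse

// BufferedIterator lets you peek via `head` without consuming, so the
// remainder of the stream is preserved.
val buf = Iterator(1, 2, 3, 4, 5).buffered
val out = ListBuffer[Int]()
while (buf.hasNext && buf.head < 3) out += buf.next()
val rest = buf.next()                    // the element 3 is still available
```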
@andrewor14
Contributor

test this please

}
}
scheduler.initialize(backend)
scheduler
Contributor

Why is this removed?

Contributor

I see you're initializing the backend in YarnClusterScheduler instead. Why are we changing this? It might make sense to fix this in a separate PR if this is a bug / some kind of improvement.

Contributor Author

So, what happened is that when I wrote this code there was no such thing as YarnClusterScheduler. So I created it, and moved its initialization to the backend class. Later someone else added the same class and I just merged my code with theirs.

I think this is cleaner because it removes reflection code from SparkContext. Unless you think that someone will ever try to match YarnClusterSchedulerBackend with something other than YarnClusterScheduler.

Contributor

I haven't looked at this in detail yet, but it seems a bit odd to me to move this particular one down into YarnClusterScheduler while the rest define their backends and initialize them here. I would think it would be easier to read having it here.

Contributor Author

Maybe I'm missing something, but this is just removing a bunch of reflection code and replacing it with a single line later on (in YarnClusterScheduler):

initialize(new YarnClusterSchedulerBackend(this, sc))

This looks much, much easier to read and cleaner to me, but if you guys somehow feel so strongly about it, I can revert the change.

Contributor

The reflection code exists because the class can't be loaded in all environments.
Tom, has this changed recently?
On 05-Aug-2014 10:27 pm, "Marcelo Vanzin" [email protected] wrote:

In core/src/main/scala/org/apache/spark/SparkContext.scala:

@@ -1531,18 +1532,6 @@ object SparkContext extends Logging {
         throw new SparkException("YARN mode not available ?", e)
       }
     }
-    val backend = try {
-      val clazz =
-        Class.forName("org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend")
-      val cons = clazz.getConstructor(classOf[TaskSchedulerImpl], classOf[SparkContext])
-      cons.newInstance(scheduler, sc).asInstanceOf[CoarseGrainedSchedulerBackend]
-    } catch {
-      case e: Exception => {
-        throw new SparkException("YARN mode not available ?", e)
-      }
-    }
-    scheduler.initialize(backend)
-    scheduler


Contributor Author

@mridulm yes I'm aware of that. But before, SparkContext was using reflection to instantiate two different classes in the yarn package, and then connect them manually. I removed one of those (see that there's still reflection code to load YarnClusterScheduler) because it seemed unnecessary.

Contributor Author

@tgravescs sorry, I see what you mean now. Still, I see this as an improvement; the scheduler variables in all the cases are not used anywhere - in fact, even the initialize calls are duplicated in all the different cases.

So unless there is a real desire to be able to match backends and schedulers at will, I think encapsulating the backend initialization like I did is a better pattern.
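The encapsulation being argued for can be sketched like this. Class and method names mirror the thread, but the bodies are stand-ins, not Spark's actual code:

```scala
// Minimal stand-ins for the types discussed above (not Spark's real classes).
trait Backend

class Scheduler {
  var backend: Backend = _
  def initialize(b: Backend): Unit = { backend = b }
}

class YarnBackend(sched: Scheduler) extends Backend

// Instead of SparkContext reflectively instantiating the backend and wiring
// it to the scheduler, the scheduler subclass wires its own backend:
class YarnClusterScheduler extends Scheduler {
  initialize(new YarnBackend(this))
}

val s = new YarnClusterScheduler
```

The reflective call site in SparkContext then only needs to load the scheduler class, not match scheduler and backend classes by hand.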

@SparkQA

SparkQA commented Jul 28, 2014

QA tests have started for PR 1218. This patch DID NOT merge cleanly!
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17294/consoleFull

@@ -300,4 +303,7 @@ private[spark] class CoarseMesosSchedulerBackend(
logInfo("Executor lost: %s, marking slave %s as lost".format(e.getValue, s.getValue))
slaveLost(d, s)
}

override def applicationId(): Option[String] =
Some(frameworkId).map(id => Some(id.getValue())).getOrElse(null)
Contributor

minor: orNull, here and other places

Contributor

Actually, can't you just do Option(frameworkId).map(_.getValue) here? I believe the existing code is incorrect because Some(null) is not the same as None
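The reviewer's point is a standard Scala gotcha, shown here with a generic example (the `safeValue` helper is illustrative, not the PR's code):

```scala
// Some(null) is a *defined* value wrapping null, so mapping over it can NPE;
// Option(null) normalizes to None, which is what the code above needs.
val nullId: String = null
val wrapped = Some(nullId)       // Some(null) -- NOT None
val normalized = Option(nullId)  // None

// With a possibly-null input, Option(...).map is the safe form:
def safeValue(id: String): Option[String] = Option(id).map(_.toUpperCase)
```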

@andrewor14
Contributor

@vanzin Thanks for your PR, I left some comments inline. The main points are the following: I'm not sure what it means for an application to have an option of application ID. From the perspective of an application it should always have an ID. If this is an implementation detail (i.e. because certain cluster managers don't provide their own IDs), then it might make sense to use a default (e.g. mesos-<app-name>-<timestamp>).

From your description I take it that you haven't had the chance to test this on Mesos. Given that I think it makes sense to hold back on adding this behavior there for now and instead file a JIRA for it. Also, now that #1094 is merged, I assume part of your changes in the YARN code needs to be reverted.
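The default-ID suggestion above could look roughly like this. It is a hedged sketch: the `mesos-<app-name>-<timestamp>` format is the reviewer's example, and `effectiveAppId` is a hypothetical helper, not actual Spark behavior.

```scala
// Fall back to a derived ID when the cluster manager doesn't provide one.
// The fallback format follows the reviewer's example above.
def effectiveAppId(backendId: Option[String], appName: String, time: Long): String =
  backendId.getOrElse(s"mesos-$appName-$time")

val fromBackend = effectiveAppId(Some("fw-123"), "job", 1L)  // backend wins
val fallback    = effectiveAppId(None, "job", 99L)           // derived ID
```

With a scheme like this, an application would always have an ID, and `Option[String]` would become an implementation detail rather than part of the event's contract.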


replayBus.replay()

// Note that this does not have any effect due to SPARK-2169.
Contributor

Now that #1252 is merged, is this still true?

@SparkQA

SparkQA commented Sep 2, 2014

QA tests have finished for PR 1218 at commit 56fe42e.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SparkListenerApplicationStart(appName: String, appId: Option[String], time: Long,
    • abstract class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, actorSystem: ActorSystem)

registrationLock.wait()
}
}
}
Contributor

Could this lead to a deadlock? While we wait on the registrationLock we're still inside the synchronized block, so the notify thread may not even get into the synchronized block in wakeUpContext. Right?

Contributor Author

wait() releases the lock while the wait happens.
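The monitor semantics being discussed can be demonstrated with a small generic example (names are illustrative, not the PR's code). The waiting thread holds the monitor when it calls `wait()`, but `wait()` releases it, letting the notifier enter the same synchronized block:

```scala
// Minimal wait/notify sketch: wait() releases the monitor, so the notifier
// can enter the synchronized block and wake the waiter. No deadlock.
val lock = new Object
var registered = false

val notifier = new Thread(() => {
  lock.synchronized {
    registered = true
    lock.notifyAll()
  }
})

lock.synchronized {
  notifier.start()
  while (!registered) lock.wait()  // releases `lock` while waiting
}
notifier.join()
```

If `wait()` did not release the lock, the notifier could never enter its synchronized block and this would hang, which is exactly the concern raised above.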

Contributor

Ah I see, it's synchronized on the same thing.

@andrewor14
Contributor

retest this please

@SparkQA

SparkQA commented Sep 2, 2014

QA tests have started for PR 1218 at commit 2d19f3c.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 2, 2014

QA tests have finished for PR 1218 at commit 2d19f3c.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SparkListenerApplicationStart(appName: String, appId: Option[String], time: Long,

@andrewor14
Contributor

Hey @vanzin is there a JIRA for this?

@vanzin
Contributor Author

vanzin commented Sep 3, 2014

I did this work in the context of SPARK-2150 (cited in the commit message), but there's no jira specifically for using this solution.

@andrewor14
Contributor

Since this is a non-trivial change and covers a somewhat wide scope, can you make a JIRA specifically for this PR? Then maybe we can link it to SPARK-2150. This will make it easier for us to keep track of what issues the changes correspond to.

@vanzin changed the title Expose application ID in ApplicationStart event, use it in history server. [SPARK-3388] Expose application ID in ApplicationStart event, use it in history server. Sep 3, 2014
@andrewor14
Contributor

Thanks, I merged this (and resolved a minor conflict) into master.

@asfgit asfgit closed this in f2b5b61 Sep 3, 2014
@vanzin vanzin deleted the yarn-hs-link-2 branch September 3, 2014 23:01
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
…n history server.


Author: Marcelo Vanzin <[email protected]>

This patch had conflicts when merged, resolved by
Committer: Andrew Or <[email protected]>

Closes apache#1218 from vanzin/yarn-hs-link-2 and squashes the following commits:

2d19f3c [Marcelo Vanzin] Review feedback.
6706d3a [Marcelo Vanzin] Implement applicationId() in base classes.
56fe42e [Marcelo Vanzin] Fix cluster mode history address, plus a cleanup.
44112a8 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
8278316 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
a86bbcf [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
a0056e6 [Marcelo Vanzin] Unbreak test.
4b10cfd [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
cb0cab2 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
25f2826 [Marcelo Vanzin] Add MIMA excludes.
f0ba90f [Marcelo Vanzin] Use BufferedIterator.
c90a08d [Marcelo Vanzin] Remove unused code.
3f8ec66 [Marcelo Vanzin] Review feedback.
21aa71b [Marcelo Vanzin] Fix JSON test.
b022bae [Marcelo Vanzin] Undo SparkContext cleanup.
c6d7478 [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
4e3483f [Marcelo Vanzin] Fix test.
57517b8 [Marcelo Vanzin] Review feedback. Mostly, more consistent use of Scala's Option.
311e49d [Marcelo Vanzin] Merge branch 'master' into yarn-hs-link-2
d35d86f [Marcelo Vanzin] Fix yarn backend after rebase.
36dc362 [Marcelo Vanzin] Don't use Iterator::takeWhile().
0afd696 [Marcelo Vanzin] Wait until master responds before returning from start().
abc4697 [Marcelo Vanzin] Make FsHistoryProvider keep a map of applications by id.
26b266e [Marcelo Vanzin] Use Mesos framework ID as Spark application ID.
b3f3664 [Marcelo Vanzin] [yarn] Make the RM link point to the app direcly in the HS.
2fb7de4 [Marcelo Vanzin] Expose the application ID in the ApplicationStart event.
ed10348 [Marcelo Vanzin] Expose application id to spark context.
mapr-devops pushed a commit to mapr/spark that referenced this pull request May 8, 2025
6 participants