
[SPARK-7347] Dag visualization: add hover to RDDs on job page #5912


Closed
wants to merge 53 commits

Conversation

andrewor14
Contributor

Add tooltips to the dots on the job page so it's clearer what they represent.

By the way, most of the commits are from a left-over PR that's already merged. Really, only one of these commits actually belongs to this PR, and this has caused me many unnecessary merge conflicts...

Andrew Or added 30 commits April 16, 2015 17:33
This commit provides a mechanism to set and unset the call scope
around each RDD operation defined in RDD.scala. This is useful
for tagging an RDD with the scope in which it is created. This
will be extended to similar methods in SparkContext.scala and
other relevant files in a future commit.
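
A minimal sketch of what such a set-and-unset mechanism could look like, assuming a thread-local current scope; the names `RDDOperationScopeExample`, `withScope`, and `currentScopeName` are illustrative, not the actual API:

```scala
// Illustrative sketch only: a thread-local "current scope" that each RDD
// operation wraps itself in, so newly created RDDs can record where they
// were created.
object RDDOperationScopeExample {
  private val currentScope = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }

  // Run `body` with `name` as the current scope, restoring the old value afterwards.
  def withScope[T](name: String)(body: => T): T = {
    val previous = currentScope.get()
    currentScope.set(Some(name))
    try body finally currentScope.set(previous)
  }

  // What an RDD constructor might read to tag itself with its creation scope.
  def currentScopeName: Option[String] = currentScope.get()
}
```
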
This includes the scope field that we added in previous commits,
and the parent IDs for tracking the lineage through the listener
API.
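
As a rough illustration, the information exposed per RDD through the listener API would need to look something like this simplified, hypothetical sketch (not the real `RDDInfo` class):

```scala
// Simplified, hypothetical sketch of the per-RDD information the listener API
// would need to expose: identity, creation scope, and parent RDD IDs for lineage.
case class RDDInfoSketch(
    id: Int,
    name: String,
    scopeId: Option[String],  // scope in which the RDD was created
    parentIds: Seq[Int])      // IDs of parent RDDs, used to draw lineage edges
```
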
It turns out that the previous scope information is insufficient
for producing a valid dot file. In particular, the scope hierarchy
was missing, yet it is crucial for distinguishing between a parent
RDD that lives in the same encompassing scope and one that lives
in a completely distinct scope. Also, unique scope identifiers are
needed to simplify the code significantly.

This commit further adds the translation logic in a UI listener
that converts RDDInfos to dot files.
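
As a hedged illustration of that translation step (not the listener's actual code), here is how a list of the hypothetical `RDDInfoSketch` values from the sketch above could be rendered as a dot graph, with one cluster per scope and one edge per parent link:

```scala
// Illustrative only: turn RDD descriptions into a dot string, grouping nodes
// by scope (as a subgraph cluster) and emitting one edge per parent link.
def toDotFile(rdds: Seq[RDDInfoSketch]): String = {
  val sb = new StringBuilder("digraph G {\n")
  rdds.groupBy(_.scopeId).foreach { case (scope, members) =>
    // Open a cluster only for RDDs that actually have a scope.
    scope.foreach(s => sb.append(s"  subgraph \"cluster_$s\" {\n"))
    members.foreach(r => sb.append(s"    ${r.id} [label=\"${r.name}\"];\n"))
    scope.foreach(_ => sb.append("  }\n"))
  }
  // Edges point from each parent RDD to the RDD itself.
  for (r <- rdds; p <- r.parentIds) sb.append(s"  $p -> ${r.id};\n")
  sb.append("}\n")
  sb.toString
}
```
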
The previous "working" implementation frequently ran into
NotSerializableExceptions. Why? ClosureCleaner doesn't like
closures being wrapped in other closures, and these closures
are simply not cleaned (details are intentionally omitted here).

This commit reimplements scoping through annotations. All methods
that should be scoped are now annotated with @RDDScope. Then, on
creation, each RDD derives its scope from the stack trace, similar
to how it derives its call site. This is the cleanest approach
that bypasses NotSerializableExceptions with the least significant
limitations.
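
A minimal sketch of the stack-trace idea, where a package-name check stands in for the real `@RDDScope` annotation lookup (the method name is hypothetical):

```scala
// Illustrative sketch: derive a scope name from the current stack trace by
// taking the first frame that belongs to the RDD API, much like call-site
// derivation. The real implementation would look for the @RDDScope annotation.
def scopeFromStackTrace(): Option[String] =
  Thread.currentThread.getStackTrace
    .find(_.getClassName.startsWith("org.apache.spark.rdd"))
    .map(frame => s"${frame.getClassName}.${frame.getMethodName}")
```
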
Just a small code re-organization.
Before this commit, this patch relied on a JavaScript version of
GraphViz that was compiled from C. Even the minified version of
this resource was ~2.5M. The main motivation for switching away
from this library, however, is that it is a complete black box
over which we have absolutely no control. It is not at all
extensible, and if something breaks we will have a hard time
understanding why.

The new library, dagre-d3, is not perfect either. It does not
officially support clustering of nodes; for certain large graphs,
the clusters will have a lot of unnecessary whitespace. A few in
the dagre-d3 community are looking into a solution, but until then
we will have to live with this (minor) inconvenience.
For instance, this adds the ability to throw away old stage graphs.
The problem with annotations is that there is no way to associate
an RDD's scope with another's. This is because the stack trace
simply does not expose enough information for us to associate one
instance of a method invocation with another.

So, we're back to closures. Note that this still suffers from the
same not serializable issue previously discussed, and this is being
fixed in the ClosureCleaner separately.
The closure cleaner doesn't like these statements, for a good
reason.
This includes a generalization of the visualization previously
displayed on the stage page. More functionality is needed in
JavaScript to prevent the job visualization from looking too
cluttered. This is still WIP.
This requires us to track incoming and outgoing edges in each
stage on the backend, and render the connecting edges ourselves
in d3.
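
As an illustration of the backend bookkeeping this describes (all names here are made up for the example), a stage's edges can be split into internal, incoming, and outgoing sets so the front end can render the cross-stage ones separately:

```scala
// Hypothetical sketch: classify a stage's edges by whether each endpoint
// lies inside the stage, so cross-stage edges can be drawn separately in d3.
case class Edge(fromId: Int, toId: Int)

case class StageGraphSketch(
    stageId: Int,
    nodeIds: Set[Int],
    internalEdges: Seq[Edge],  // both endpoints inside this stage
    incomingEdges: Seq[Edge],  // source lives in another stage
    outgoingEdges: Seq[Edge])  // destination lives in another stage

def splitEdges(stageId: Int, nodeIds: Set[Int], allEdges: Seq[Edge]): StageGraphSketch = {
  val internal = allEdges.filter(e => nodeIds(e.fromId) && nodeIds(e.toId))
  val incoming = allEdges.filter(e => !nodeIds(e.fromId) && nodeIds(e.toId))
  val outgoing = allEdges.filter(e => nodeIds(e.fromId) && !nodeIds(e.toId))
  StageGraphSketch(stageId, nodeIds, internal, incoming, outgoing)
}
```
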
Previously we had a lot of overlapping boxes for, say, ALS. This
is because we did not take the widths of the previous boxes into
account.
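
A tiny worked example of the fix, assuming a simple left-to-right layout (the function name is illustrative): each box's x-offset is the cumulative width of all boxes before it, plus padding.

```scala
// Illustrative sketch of the layout fix: position each box at the sum of the
// widths of all boxes before it (plus padding), instead of a fixed offset.
def boxOffsets(widths: Seq[Double], padding: Double = 10.0): Seq[Double] =
  widths.scanLeft(0.0)((acc, w) => acc + w + padding).dropRight(1)

// e.g. boxOffsets(Seq(100.0, 250.0, 80.0)) == Seq(0.0, 110.0, 370.0)
```
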
Conflicts:
	core/src/main/scala/org/apache/spark/storage/RDDInfo.scala
	core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala
	core/src/main/scala/org/apache/spark/ui/jobs/JobsTab.scala
This commit should not introduce any substantial functionality
differences. It just cleans up the JavaScript side of this patch
such that it is easier to follow.
Andrew Or added 4 commits May 4, 2015 13:52
Conflicts:
	core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.js
	core/src/main/scala/org/apache/spark/rdd/RDD.scala
	core/src/main/scala/org/apache/spark/ui/SparkUI.scala
	core/src/main/scala/org/apache/spark/ui/UIUtils.scala
	core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala
	core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
	core/src/main/scala/org/apache/spark/ui/scope/RDDOperationGraph.scala
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 5, 2015

Test build #31888 has started for PR 5912 at commit 516c930.

@SparkQA

SparkQA commented May 5, 2015

Test build #31888 has finished for PR 5912 at commit 516c930.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class EnumUtil

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31888/
Test FAILed.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 5, 2015

Test build #31895 has started for PR 5912 at commit 07f25c3.

@SparkQA

SparkQA commented May 5, 2015

Test build #31895 has finished for PR 5912 at commit 07f25c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class EnumUtil

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31895/
Test PASSed.

@andrewor14 andrewor14 changed the title [SPARK-7347] Add hover to RDDs in DAG visualization [SPARK-7347] Dag visualization: add hover to RDDs on job page May 6, 2015
Andrew Or added 3 commits May 6, 2015 17:58
Conflicts:
	core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.js
The new div is not actually in the SVG itself, so we need to use
a more general selector.
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 7, 2015

Test build #32056 has started for PR 5912 at commit 4fb4545.

@andrewor14
Contributor Author

Closing in favor of #5957

@andrewor14 andrewor14 closed this May 7, 2015
@andrewor14 andrewor14 deleted the viz-hover branch May 7, 2015 01:36
@SparkQA

SparkQA commented May 7, 2015

Test build #32056 has finished for PR 5912 at commit 4fb4545.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32056/
Test PASSed.
