[SPARK-7347] Dag visualization: add hover to RDDs on job page #5912
This commit provides a mechanism to set and unset the call scope around each RDD operation defined in RDD.scala. This is useful for tagging an RDD with the scope in which it is created. This will be extended to similar methods in SparkContext.scala and other relevant files in a future commit.
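A minimal sketch of such a set/unset mechanism (the names `CallScope` and `withScope` here are illustrative stand-ins, not the actual Spark API): a thread-local holds the current scope name while the body of an RDD operation runs and is restored afterwards, so any RDD created inside the body can read its enclosing scope.

```scala
// Hypothetical sketch of a scope set/unset mechanism around RDD operations.
object CallScope {
  private val currentScope = new ThreadLocal[Option[String]] {
    override def initialValue(): Option[String] = None
  }

  // The scope an RDD created right now would be tagged with, if any.
  def scope: Option[String] = currentScope.get()

  // Set the scope for the duration of `body`, restoring the previous
  // value afterwards so nested scopes unwind correctly.
  def withScope[T](name: String)(body: => T): T = {
    val previous = currentScope.get()
    currentScope.set(Some(name))
    try body finally currentScope.set(previous)
  }
}
```

An operation would then wrap its body, e.g. `CallScope.withScope("map") { ... }`, and the RDD constructor would snapshot `CallScope.scope` at creation time.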
This includes the scope field that we added in previous commits, and the parent IDs for tracking the lineage through the listener API.
It turns out that the previous scope information is insufficient for producing a valid dot file. In particular, the scope hierarchy was missing, yet it is crucial for differentiating between a parent RDD that lives in the same encompassing scope and one that lives in a completely distinct scope. Unique scope identifiers are also needed to simplify the code significantly. This commit further adds the translation logic in a UI listener that converts RDDInfos to dot files.
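To illustrate why the hierarchy and unique identifiers matter, here is a hedged sketch of rendering nested scopes as dot `subgraph cluster_*` blocks. The `Scope` and `RddNode` types and `toDot` are invented for this example and are not the real listener classes:

```scala
// Invented types for illustration: an RDD node and a scope that may
// contain RDDs and nested child scopes.
case class RddNode(id: Int, name: String)
case class Scope(id: String, name: String, nodes: Seq[RddNode], children: Seq[Scope])

// Render a scope as a dot "subgraph cluster_*" block; nested scopes become
// nested clusters. The output is meant to be wrapped in `digraph G { ... }`.
// Note how the unique scope id names the cluster, and the hierarchy decides
// which cluster each RDD node ends up inside.
def toDot(scope: Scope, indent: String = "  "): String = {
  val sb = new StringBuilder
  sb.append(s"${indent}subgraph cluster_${scope.id} {\n")
  sb.append(s"""$indent  label="${scope.name}";""" + "\n")
  scope.nodes.foreach(n => sb.append(s"""$indent  ${n.id} [label="${n.name}"];""" + "\n"))
  scope.children.foreach(c => sb.append(toDot(c, indent + "  ")))
  sb.append(s"$indent}\n")
  sb.toString
}
```

Without the hierarchy, a parent RDD in a different scope and one in the same scope would both collapse into a flat node list, producing a misleading (or invalid) cluster layout.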
The previous "working" implementation frequently ran into NotSerializableExceptions. Why? ClosureCleaner doesn't like closures being wrapped in other closures, and such closures are simply not cleaned (details are intentionally omitted here). This commit reimplements scoping through annotations. All methods that should be scoped are now annotated with @RDDScope. Then, on creation, each RDD derives its scope from the stack trace, similar to how it derives its call site. This is the cleanest approach that bypasses NotSerializableExceptions with the fewest significant limitations.
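The stack-trace derivation could look roughly like the following sketch. For simplicity it matches frames against a plain set of method names, whereas the patch itself matched methods annotated with @RDDScope; `deriveScope` and `myMap` are invented names:

```scala
// Derive a scope name from the current stack trace, analogous to how an RDD
// derives its call site: walk the frames and take the first one whose method
// is known to be "scoped". (The real patch checked for an @RDDScope
// annotation instead of a name set.)
def deriveScope(scopedMethods: Set[String]): Option[String] = {
  Thread.currentThread.getStackTrace.toSeq
    .find(frame => scopedMethods.contains(frame.getMethodName))
    .map(_.getMethodName)
}

// Example "scoped" operation: calling it yields its own name as the scope,
// because its frame is on the stack when deriveScope runs.
def myMap(): Option[String] = deriveScope(Set("myMap", "myFilter"))
```

Because nothing is captured in a closure, this approach sidesteps serialization entirely; its limitation, as discussed below, is that the stack trace cannot relate one method invocation to another.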
Just a small code re-organization.
Before this commit, this patch relied on a JavaScript version of GraphViz that was compiled from C. Even the minified version of this resource was ~2.5 MB. The main motivation for switching away from that library, however, is that it is a complete black box over which we have absolutely no control. It is not at all extensible, and if something breaks we will have a hard time understanding why. The new library, dagre-d3, is not perfect either. It does not officially support clustering of nodes; for certain large graphs, the clusters will have a lot of unnecessary whitespace. A few people in the dagre-d3 community are looking into a solution, but until then we will have to live with this (minor) inconvenience.
For instance, this adds ability to throw away old stage graphs.
The problem with annotations is that there is no way to associate an RDD's scope with another's, because the stack trace simply does not expose enough information to associate one instance of a method invocation with another. So, we're back to closures. Note that this still suffers from the same serializability issue discussed previously, which is being fixed in the ClosureCleaner separately.
The closure cleaner doesn't like these statements, for a good reason.
This includes a generalization of the visualization previously displayed on the stage page. More functionality is needed in JavaScript to prevent the job visualization from looking too cluttered. This is still WIP.
This requires us to track incoming and outgoing edges in each stage on the backend, and render the connecting edges manually ourselves in d3.
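A sketch of the edge bookkeeping this implies, assuming an invented `Edge` type and a function mapping each RDD id to its stage (both stand-ins, not the real backend classes): edges whose endpoints share a stage stay inside that stage's cluster, while cross-stage edges are the ones the frontend must route between clusters itself.

```scala
// Invented edge type: a dependency from one RDD id to another.
case class Edge(from: Int, to: Int)

// Partition graph edges into (internal, crossing): an edge is internal if
// both endpoints fall in the same stage, and crossing otherwise. Crossing
// edges are exactly the incoming/outgoing edges each stage must track, and
// the ones rendered manually in d3.
def splitEdges(edges: Seq[Edge], stageOf: Int => Int): (Seq[Edge], Seq[Edge]) =
  edges.partition(e => stageOf(e.from) == stageOf(e.to))
```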
Previously we had a lot of overlapping boxes for, say, ALS. This is because we did not take into account the widths of the preceding boxes.
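The fix amounts to accumulating the widths of all preceding boxes (plus padding) when computing each box's x-coordinate. A toy version, with an invented `xPositions` helper:

```scala
// Lay boxes out left to right: each box starts where the previous one ended,
// plus a fixed padding, so boxes can no longer overlap.
def xPositions(widths: Seq[Double], padding: Double): Seq[Double] =
  widths.scanLeft(0.0)((x, w) => x + w + padding).init
```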
Conflicts:
    core/src/main/scala/org/apache/spark/storage/RDDInfo.scala
    core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala
    core/src/main/scala/org/apache/spark/ui/jobs/JobsTab.scala
This commit should not introduce any substantial functionality differences. It just cleans up the JavaScript side of this patch such that it is easier to follow.
Conflicts:
    core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.js
    core/src/main/scala/org/apache/spark/rdd/RDD.scala
    core/src/main/scala/org/apache/spark/ui/SparkUI.scala
    core/src/main/scala/org/apache/spark/ui/UIUtils.scala
    core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala
    core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
    core/src/main/scala/org/apache/spark/ui/scope/RDDOperationGraph.scala
Merged build triggered.
Merged build started.
Test build #31888 has started for PR 5912 at commit
Test build #31888 has finished for PR 5912 at commit
Merged build finished. Test FAILed.
Test FAILed.
Merged build triggered.
Merged build started.
Test build #31895 has started for PR 5912 at commit
Test build #31895 has finished for PR 5912 at commit
Merged build finished. Test PASSed.
Test PASSed.
Conflicts:
    core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.js
The new div is not actually in the SVG itself, so we need to use a more general selector.
Merged build triggered.
Merged build started.
Test build #32056 has started for PR 5912 at commit
Closing in favor of #5957
Test build #32056 has finished for PR 5912 at commit
Merged build finished. Test PASSed.
Test PASSed.
Add tooltips to the dots on the job page so it's clearer what they represent.
By the way, most of the commits here are left over from a PR that has already been merged. Really, only one of these commits actually belongs to this PR, and this has caused me many unnecessary merge conflicts...