Commit b943f5d

Brennon York authored and srowen committed
[SPARK-4600][GraphX]: org.apache.spark.graphx.VertexRDD.diff does not work
Per the [discussion on the JIRA](https://issues.apache.org/jira/browse/SPARK-4600), `diff` behaves exactly as it should. The confusion arose because I assumed it computed a set difference, which it does not. Accordingly, this change only updates the `diff` documentation so that it better reflects the method's actual behavior.

Author: Brennon York <[email protected]>

Closes #5015 from brennonyork/SPARK-4600 and squashes the following commits:

1e1d1e5 [Brennon York] reverted internal diff docs
92288f7 [Brennon York] reverted both the test suite and the diff function back to its origin functionality
f428623 [Brennon York] updated diff documentation to better represent its function
cc16d65 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-4600
66818b9 [Brennon York] added small secondary diff test
99ad412 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-4600
74b8c95 [Brennon York] corrected method by leveraging bitmask operations to correctly return only the portions of that are different from the calling VertexRDD
9717120 [Brennon York] updated diff impl to cause fewer objects to be created
710a21c [Brennon York] working diff given test case
aa57f83 [Brennon York] updated to set ShortestPaths to run 'forward' rather than 'backward'
1 parent 7f13434 commit b943f5d

1 file changed: +5 -2 lines changed

graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala

Lines changed: 5 additions & 2 deletions
@@ -122,8 +122,11 @@ abstract class VertexRDD[VD](
   def mapValues[VD2: ClassTag](f: (VertexId, VD) => VD2): VertexRDD[VD2]

   /**
-   * Hides vertices that are the same between `this` and `other`; for vertices that are different,
-   * keeps the values from `other`.
+   * For each vertex present in both `this` and `other`, `diff` returns only those vertices with
+   * differing values; for values that are different, keeps the values from `other`. This is
+   * only guaranteed to work if the VertexRDDs share a common ancestor.
+   *
+   * @param other the other VertexRDD with which to diff against.
    */
   def diff(other: VertexRDD[VD]): VertexRDD[VD]
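
To make the documented contract concrete, here is a minimal sketch that models it on plain Scala `Map`s instead of Spark RDDs. The object `VertexDiffSketch` and the helper `diffModel` are hypothetical names introduced only for illustration; this is not the GraphX implementation, and it says nothing about the common-ancestor requirement, which concerns how the real `VertexRDD`s are partitioned and indexed.

```scala
object VertexDiffSketch {
  // Assumption: GraphX vertex ids are modeled here as plain Longs.
  type VertexId = Long

  // Hypothetical model of the documented behavior of `diff`:
  // keep only ids present in BOTH maps whose values differ,
  // and take the value from `other`.
  def diffModel[VD](self: Map[VertexId, VD], other: Map[VertexId, VD]): Map[VertexId, VD] =
    other.filter { case (id, value) => self.get(id).exists(_ != value) }

  def main(args: Array[String]): Unit = {
    val a = Map(1L -> "a", 2L -> "b", 3L -> "c")
    val b = Map(2L -> "b", 3L -> "C", 4L -> "d")

    // Vertex 2 is identical in both, vertices 1 and 4 exist on one side only,
    // so only vertex 3 survives, carrying the value from `b`.
    println(diffModel(a, b))

    // A set difference on ids, by contrast, would keep vertex 1.
    println(a.keySet.diff(b.keySet))
  }
}
```

Under this model, vertices that exist on only one side are dropped rather than returned, which is the key way the documented behavior differs from a set difference.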

0 commit comments
