
Commit aca7991

Davies Liu authored and rxin committed
[SPARK-5878] fix DataFrame.repartition() in Python
Also add tests for distinct()

Author: Davies Liu <[email protected]>

Closes #4667 from davies/repartition and squashes the following commits:

79059fd [Davies Liu] add test
cb4915e [Davies Liu] fix repartition

(cherry picked from commit c1b6fa9)
Signed-off-by: Reynold Xin <[email protected]>
1 parent 9a565b8 commit aca7991
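The commit message only says that the Python `repartition()` was fixed. In the diff, the change is to the arguments forwarded to the JVM-side DataFrame: the wrapper used to call `self._jdf.repartition(numPartitions, None)` and now calls `self._jdf.repartition(numPartitions)`. The sketch below models that delegation pattern in plain Python. It is not PySpark code: `FakeJDataFrame` and `SketchDataFrame` are hypothetical names, the fake's one-argument `repartition()` is an assumption modeled on the fixed call, and `sql_ctx` is left out. In this sketch the stray `None` surfaces as an ordinary `TypeError`; through Py4J the failure would be reported differently.

```python
class FakeJDataFrame:
    """Hypothetical stand-in for the Py4J proxy of a JVM DataFrame.

    Assumption: its repartition() takes exactly one argument, the target
    partition count, matching the fixed call in the diff.
    """

    def __init__(self, rows, num_partitions=1):
        self.rows = list(rows)
        self.num_partitions = num_partitions

    def repartition(self, num_partitions):
        # Same rows, new partition count; no real data movement in this sketch.
        return FakeJDataFrame(self.rows, num_partitions)


class SketchDataFrame:
    """Hypothetical thin Python wrapper that delegates to the JVM-side object."""

    def __init__(self, jdf):
        self._jdf = jdf

    def repartition_before_fix(self, num_partitions):
        # Mirrors the removed line: forwards an extra None that the
        # one-argument method cannot accept.
        return SketchDataFrame(self._jdf.repartition(num_partitions, None))

    def repartition(self, num_partitions):
        # Mirrors the added line: forwards only the partition count.
        return SketchDataFrame(self._jdf.repartition(num_partitions))

    def get_num_partitions(self):
        return self._jdf.num_partitions


df = SketchDataFrame(FakeJDataFrame([("Alice", 2), ("Bob", 5)]))
repartitioned = df.repartition(10)

pre_fix_error = None
try:
    df.repartition_before_fix(10)
except TypeError as err:
    # Python rejects the call: repartition() receives one argument too many.
    pre_fix_error = err
```

Keeping the wrapper's argument list identical to the JVM method's signature is the whole fix. A doctest like the one this commit adds (`df.repartition(10).rdd.getNumPartitions()`) catches this kind of arity drift because it actually executes the call instead of only importing the module.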


1 file changed (+7, -1)


python/pyspark/sql/dataframe.py

Lines changed: 7 additions & 1 deletion
```diff
@@ -434,12 +434,18 @@ def unpersist(self, blocking=True):
     def repartition(self, numPartitions):
         """ Return a new :class:`DataFrame` that has exactly `numPartitions`
         partitions.
+
+        >>> df.repartition(10).rdd.getNumPartitions()
+        10
         """
-        return DataFrame(self._jdf.repartition(numPartitions, None), self.sql_ctx)
+        return DataFrame(self._jdf.repartition(numPartitions), self.sql_ctx)
 
     def distinct(self):
         """
         Return a new :class:`DataFrame` containing the distinct rows in this DataFrame.
+
+        >>> df.distinct().count()
+        2L
         """
         return DataFrame(self._jdf.distinct(), self.sql_ctx)
 
```

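The second doctest checks `distinct()`. Its expected output `2L` is the Python 2 repr of a `long`; under Python 3 the same value would print as `2`. As a rough model of the logical result only (the real operation runs distributed on the JVM and promises no row order), the hypothetical helper `distinct_rows` below drops duplicate rows. The fixture is an assumption for illustration, not the doctest's actual `df`, and rows are modeled as hashable tuples.

```python
def distinct_rows(rows):
    """Return rows with duplicates removed, keeping first-seen order.

    Hypothetical sketch of what DataFrame.distinct() computes logically;
    rows must be hashable (tuples here).
    """
    seen = set()
    result = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


# Hypothetical fixture: two distinct rows, each appearing twice.
rows = [("Alice", 2), ("Bob", 5), ("Alice", 2), ("Bob", 5)]
deduped = distinct_rows(rows)
```

Counting `deduped` plays the role of `df.distinct().count()` in the doctest: duplicates collapse, so only the number of unique rows remains.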