[SPARK-5969][PySpark] Fix descending pyspark.rdd.sortByKey. #4761

foxik · 2015-02-25T07:08:43Z

The samples should always be sorted in ascending order, because bisect.bisect_left is used on them. The reverse order of the result is already achieved in rangePartitioner by reversing the found index.

The current implementation also works, but it always uses only two partitions, the first one and the last one (because bisect_left returns either "beginning" or "end" for a descending sequence).
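To illustrate the behaviour described above, here is a minimal standalone sketch in plain Python (no Spark needed). The function name `range_partition`, the sample bounds, and the key range are hypothetical and chosen only for illustration; the sketch mirrors the idea of locating a key with bisect_left and reversing the found index for descending sorts, and is not the actual PySpark code.

```python
import bisect


def range_partition(key, bounds, num_partitions, ascending=True):
    """Hypothetical stand-in for a range partitioner.

    bisect_left assumes `bounds` is sorted in ascending order; a
    descending result is produced by reversing the found index.
    """
    p = bisect.bisect_left(bounds, key)
    return p if ascending else num_partitions - 1 - p


keys = list(range(10))
ascending_bounds = [2, 5, 7]  # 3 boundaries -> 4 partitions
descending_bounds = sorted(ascending_bounds, reverse=True)

# Boundaries kept ascending: every partition receives some keys.
fixed = sorted({range_partition(k, ascending_bounds, 4, ascending=False)
                for k in keys})

# Boundaries sorted descending: bisect_left only ever returns 0 or
# len(bounds) here, so only the first and last partitions are used.
buggy = sorted({range_partition(k, descending_bounds, 4, ascending=False)
                for k in keys})

print(fixed)  # [0, 1, 2, 3]
print(buggy)  # [0, 3]
```

With descending boundaries the output is still sorted overall, but the work is spread over just two partitions, which matches the symptom described in this pull request.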

AmplabJenkins · 2015-02-25T07:12:09Z

Can one of the admins verify this patch?

JoshRosen · 2015-03-03T01:36:07Z

Could you add a regression test for this issue? It looks like you have one in the JIRA ticket, so adding one hopefully should not be much work. Take a look at python/pyspark/tests.py for examples of this. Make sure to add a comment referencing the JIRA number.

foxik · 2015-03-05T06:45:20Z

I added the regression test. It also checks that sortByKey returns a sorted sequence and covers the ascending case as well; neither is strictly necessary for SPARK-5969, but I added them anyway.

JoshRosen · 2015-03-05T21:39:53Z

Jenkins, this is ok to test.

SparkQA · 2015-03-05T21:42:45Z

Test build #28315 has started for PR 4761 at commit bc2647f.

This patch merges cleanly.

SparkQA · 2015-03-05T21:43:44Z

Test build #28315 has finished for PR 4761 at commit bc2647f.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-03-05T21:43:45Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28315/

foxik · 2015-03-06T08:38:35Z

I have amended the regression test commit to pass lint-python.

SparkQA · 2015-03-06T08:42:36Z

Test build #28334 has started for PR 4761 at commit 95896b5.

This patch merges cleanly.

SparkQA · 2015-03-06T10:02:03Z

Test build #28334 has finished for PR 4761 at commit 95896b5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-03-06T10:02:07Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28334/

JoshRosen · 2015-04-08T22:30:27Z

@davies, does this look good to you? Sorry for letting this patch fall off my radar (slowly getting caught up on a backlog of reviews). If things look good, I can fix the merge conflict (which is probably just a conflict in tests) and get this committed.

davies · 2015-04-08T22:41:01Z

LGTM

JoshRosen · 2015-04-10T20:49:08Z

Alright, merging this into master (1.4.0) now. Thanks!

JoshRosen · 2015-04-10T20:54:07Z

Should this be backported anywhere?

davies · 2015-04-10T21:25:35Z

This has been a bug since the beginning (0.8). Could we backport it to all 1.0+ branches?

The samples should always be sorted in ascending order, because bisect.bisect_left is used on them. The reverse order of the result is already achieved in rangePartitioner by reversing the found index.

The current implementation also works, but it always uses only two partitions, the first one and the last one (because bisect_left returns either "beginning" or "end" for a descending sequence).

Author: Milan Straka <[email protected]>

This patch had conflicts when merged, resolved by
Committer: Josh Rosen <[email protected]>

Closes #4761 from foxik/fix-descending-sort and squashes the following commits:

95896b5 [Milan Straka] Add regression test for SPARK-5969.
5757490 [Milan Straka] Fix descending pyspark.rdd.sortByKey.

JoshRosen · 2015-04-10T22:24:29Z

I've cherry-picked the fix into branch-1.2 (1.2.3) and branch-1.3 (1.3.2). I'm going to omit the pre-1.2 backports for now because I hit a test merge conflict and also because I think it's unlikely that we're going to release another 1.1.x version.

davies · 2015-04-10T22:28:23Z

Makes sense, thank you!

Fix descending pyspark.rdd.sortByKey.

5757490

Add regression test for SPARK-5969.

95896b5

foxik force-pushed the fix-descending-sort branch from bc2647f to 95896b5 on March 6, 2015 08:37

asfgit closed this in 0375134 Apr 10, 2015
