Skip to content

Conversation

mzeitlin11
Copy link
Member

Built on #40546, is a relatively minor change after that.

@mzeitlin11 mzeitlin11 added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug Groupby labels Mar 23, 2021
@@ -983,6 +989,9 @@ def rank_1d(
else:
mask = np.zeros(shape=len(masked_vals), dtype=np.uint8)

# If ascending and na_option == 'bottom' or descending and
# na_option == 'top' -> we want to rank NaN as the highest
# so fill with the maximum value for the type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just want to highlight #40016 in case relevant here for consistency

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for linking that, I was confused by the documentation as well haha. Will adjust this comment in #40546

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good - one comment on type selection

@@ -1001,6 +1001,7 @@ def rank_1d(
ndarray[uint8_t, ndim=1] mask
bint keep_na, at_end, next_val_diff, check_labels, group_changed
rank_t nan_fill_val
float64_t grp_size
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not be an integer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this because grp_sizes was defined as an array of float64_t (I'd guess to avoid worrying about casting to float when dividing by grp_sizes at the end). Can change both to int if you think that would make more sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah makes sense. That feels like a mistake so yea if you can change the array and nothing breaks lets do that. If it causes an issue can do in a follow up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have changed in latest commit

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@WillAyd WillAyd added this to the 1.3 milestone Mar 27, 2021
@@ -998,12 +998,13 @@ def rank_1d(
TiebreakEnumType tiebreak
Py_ssize_t i, j, N, grp_start=0, dups=0, sum_ranks=0
Py_ssize_t grp_vals_seen=1, grp_na_count=0
ndarray[int64_t, ndim=1] lexsort_indexer
ndarray[float64_t, ndim=1] grp_sizes, out
ndarray[int64_t, ndim=1] lexsort_indexer, grp_sizes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lexsort_indexer[i] is being passed to ndarray.__getitem__ -> it shouldbe intp_t

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch thanks

@jreback
Copy link
Contributor

jreback commented Mar 29, 2021

lgtm. can you merge master and ping on greenish

@mzeitlin11
Copy link
Member Author

lgtm. can you merge master and ping on greenish

@jreback greenish

@jreback jreback merged commit 67bc077 into pandas-dev:master Apr 1, 2021
@jreback
Copy link
Contributor

jreback commented Apr 1, 2021

thanks @mzeitlin11

@mzeitlin11 mzeitlin11 deleted the bug/rank_pct_nans_groupby branch April 1, 2021 16:16
vladu pushed a commit to vladu/pandas that referenced this pull request Apr 5, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug Groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Unexpected behaviour of groupby + rank(pct=True, method="dense")
5 participants