Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000', '3/13/2000', '3/13/2000'])
my_df = pd.DataFrame({
    'date': s,
    'values': [10, 3, 11, 12, 3]
}, index=list(range(len(s))))
my_df['date'] = pd.to_datetime(my_df['date'])

def my_custom_function(ser: pd.Series):
    print(ser.index)
    return ser.index[0]

results = my_df.rolling('2d', on='date').apply(my_custom_function)
print(results)
Issue Description
Starting in 1.4.1, the code above throws a "No numeric types to aggregate" error. An undocumented change was made to rolling: passing the on=col parameter to rolling() causes a later apply() to re-index each window to the col column, so the Series sent to the function no longer carries the DataFrame's original index.
The specific code doing this re-indexing is in RollingAndExpandingMixin:
def apply_func(values, begin, end, min_periods, raw=raw):
    if not raw:
        # GH 45912
        values = Series(values, index=self._on)
    return window_func(values, begin, end, min_periods)

return apply_func
This loses important functionality. Because rolling().apply() can currently process only one column at a time, recovering the windows in order to do some operation spanning all columns requires recording the index values of each window passed in (using a closure or global variable) and using those values later to reconstruct what the windows were. Because the windowing column may contain repeated values, that may mean creating an integer "primary key" index and using on=my_date_column to specify the windowing.
The current code assumes that the user wants the on column as the index, but if that were the case, the user could simply have re-indexed in the calling code.
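A minimal sketch of the closure approach described above, for checking which index a given pandas version hands to the function. The names seen_indexes and record_window are hypothetical, and the function returns a float so that it runs regardless of the index type it receives.

```python
import pandas as pd

my_df = pd.DataFrame({
    'date': pd.to_datetime(['3/11/2000', '3/12/2000', '3/13/2000',
                            '3/13/2000', '3/13/2000']),
    'values': [10, 3, 11, 12, 3],
})

# Hypothetical helper state: one entry per window passed to the function.
seen_indexes = []

def record_window(ser: pd.Series) -> float:
    # Record the index labels of this window for later reconstruction.
    seen_indexes.append(list(ser.index))
    # Return a numeric value so the aggregation itself always succeeds.
    return float(len(ser))

result = my_df.rolling('2d', on='date').apply(record_window)

# Integer labels here mean the original index was preserved;
# Timestamp labels mean the window was re-indexed to the `on` column.
for labels in seen_indexes:
    print(type(labels[0]).__name__, labels)
```

If the labels printed are timestamps, repeated dates make them ambiguous, which is the reason for wanting the integer index in the first place.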
Expected Behavior
In 1.3.1 the code above produces the expected behavior: a 2-column table with a numeric index, where the final column holds the first index value of each of the 5 windows, [0, 0, 1, 1, 1]:
        date  values
0 2000-03-11     0.0
1 2000-03-12     0.0
2 2000-03-13     1.0
3 2000-03-13     1.0
4 2000-03-13     1.0
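As a cross-check that does not depend on what apply() passes to the function, the window start positions can be sketched directly from the date column. This assumes the dates are sorted ascending, that the window uses the default closed='right' for an offset window (so a row belongs to the window when its date is strictly later than the current date minus 2 days), and that each window ends at its own row. The names starts, ends, and window_totals are hypothetical.

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({
    'date': pd.to_datetime(['3/11/2000', '3/12/2000', '3/13/2000',
                            '3/13/2000', '3/13/2000']),
    'values': [10, 3, 11, 12, 3],
})

window = np.timedelta64(2, 'D')
dates = my_df['date'].to_numpy()

# First position whose date is strictly greater than (current date - window).
# Assumes `dates` is sorted ascending.
starts = np.searchsorted(dates, dates - window, side='right')

# Assumption: each window ends at (and includes) its own row.
ends = np.arange(1, len(dates) + 1)

# With positional bounds in hand, any multi-column operation can slice
# the full frame; here, a simple per-window sum of one column.
window_totals = [int(my_df['values'].iloc[s:e].sum())
                 for s, e in zip(starts, ends)]

print(starts.tolist())
print(window_totals)
```

The start positions work out to [0, 0, 1, 1, 1], the same values as the table above, and they are positions rather than labels, so repeated dates cause no ambiguity.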
Installed Versions
1.4.3