This is an advanced experimental API. We encourage you to experiment with it and let us know what you think.
See the `design document <https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md>`_
for more background.

The first step in executing a GroupBy analysis is to *identify* the groups and create an intermediate array in which each group member is identified
by a unique integer code. Commonly, this step is executed using :py:func:`pandas.factorize` for grouping by a categorical variable (e.g. ``['a', 'b', 'a', 'b']``)
and :py:func:`pandas.cut`, :py:func:`numpy.digitize`, or :py:func:`numpy.searchsorted` for binning a numeric variable.

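As a minimal sketch of this identification step (the arrays and bin edges below are made up for illustration), the integer codes can be produced directly with pandas and NumPy:

```python
import numpy as np
import pandas as pd

# Categorical grouping: factorize maps each distinct label to an integer code.
labels = np.array(["a", "b", "a", "b"])
codes, uniques = pd.factorize(labels)

# Binned grouping: digitize returns, for each value, the index of its bin.
values = np.array([0.5, 1.5, 2.5, 1.0])
bin_edges = np.array([0.0, 1.0, 2.0, 3.0])
bin_codes = np.digitize(values, bin_edges) - 1  # shift so the first bin has code 0

# Once codes exist, a reduction aggregates all members that share a code.
data = np.array([10.0, 20.0, 30.0, 40.0])
group_means = np.array([data[codes == k].mean() for k in range(len(uniques))])

print(codes, list(uniques))
print(bin_codes)
print(group_means)
```

Everything after the codes are computed is independent of how the groups were defined, which is why the "factorize" step is a natural place for an extension point.
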
Much of the complexity in more advanced GroupBy problems can be abstracted to a specialized "factorize" operation that identifies the necessary groups.
:py:class:`groupers.Grouper` and :py:class:`groupers.Resampler` objects provide an extension point allowing Xarray's GroupBy machinery
to use specialized "factorization" operations.
Eventually, they will also provide a natural way to extend GroupBy to grouping by multiple variables: ``ds.groupby(x=BinGrouper(...), t=Resampler(freq="M", ...)).mean()``.

Xarray provides three Grouper objects today:

1. :py:class:`UniqueGrouper` for categorical grouping
2. :py:class:`BinGrouper` for binned grouping
3. :py:class:`TimeResampler` for resampling along a datetime coordinate

These objects mean that:

- ``ds.groupby("categories")`` is identical to ``ds.groupby(categories=UniqueGrouper())``.
- ``ds.groupby_bins("values", bins=5)`` is identical to ``ds.groupby(values=BinGrouper(bins=5))``.
- ``ds.resample(time="H")`` is identical to ``ds.groupby(time=TimeResampler(freq="H"))``.

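The equivalences above can be checked on a small dataset. The following is a sketch only: the dataset, the variable name ``foo``, and the choice of 4 bins are made up for illustration, and a daily frequency (``"D"``) is used in place of ``"H"`` because recent pandas versions deprecate the uppercase hourly alias.

```python
import numpy as np
import pandas as pd
import xarray as xr
from xarray.groupers import BinGrouper, TimeResampler, UniqueGrouper

# Hypothetical toy dataset: 8 samples taken every 12 hours.
ds = xr.Dataset(
    {"foo": ("time", np.arange(8.0))},
    coords={
        "time": pd.date_range("2001-01-01", periods=8, freq=pd.Timedelta(hours=12)),
        "categories": ("time", list("abababab")),
        "values": ("time", np.linspace(0.0, 1.0, 8)),
    },
)

# Each pair holds (shorthand spelling, explicit Grouper spelling).
pairs = {
    "unique": (
        ds.groupby("categories").mean(),
        ds.groupby(categories=UniqueGrouper()).mean(),
    ),
    "bins": (
        ds.groupby_bins("values", bins=4).mean(),
        ds.groupby(values=BinGrouper(bins=4)).mean(),
    ),
    "resample": (
        ds.resample(time="D").mean(),
        ds.groupby(time=TimeResampler(freq="D")).mean(),
    ),
}

# Both spellings should produce the same reduced values.
for name, (shorthand, explicit) in pairs.items():
    np.testing.assert_allclose(shorthand["foo"].values, explicit["foo"].values)
```

The comparison here is limited to the reduced values of ``foo``; it does not check coordinate names or attributes.
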
For example, consider a seasonal grouping ``ds.groupby("time.season")``. This approach treats ``ds.time.dt.season`` as a categorical variable to group by and is naive
to the many complexities of time grouping. Specialized ``SeasonGrouper`` and ``SeasonResampler`` objects would allow:

- Supporting seasons that span a year-end.
- Only including seasons with complete data coverage.
- Grouping over seasons of unequal length.
- Returning results with seasons in the appropriate chronological order.

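To illustrate the first and last points, here is a hedged sketch of how a specialized factorization could treat December as part of the following year's winter. It is not Xarray's implementation; the month-to-season table and the label format are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Five monthly timestamps straddling a year-end: Nov, Dec, Jan, Feb, Mar.
time = pd.date_range("2000-11-01", "2001-03-01", freq="MS")
month = time.month.to_numpy()
year = time.year.to_numpy()

# Hypothetical lookup table: season name for each calendar month (Jan..Dec).
season_of_month = np.array(
    ["DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
     "JJA", "JJA", "SON", "SON", "SON", "DJF"]
)
season = season_of_month[month - 1]

# Count December with the winter of the following year, so that one
# DJF season spans the year-end instead of being split in two.
season_year = year + (month == 12)

# Factorize the combined (year, season) label; because the input is in
# time order, the codes come out in chronological order as well.
labels = [f"{y}-{s}" for y, s in zip(season_year, season)]
codes, uniques = pd.factorize(np.array(labels, dtype=object))

print(labels)
print(codes)
```

A plain categorical grouping on the season name alone would instead merge every winter into a single "DJF" group and lose the chronological ordering.
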
To define a custom grouper, subclass either the :py:class:`Grouper` or :py:class:`Resampler` abstract base class
and provide a customized ``factorize`` method. This method must accept a :py:class:`DataArray` to group by and return
an instance of :py:class:`EncodedGroups`.

.. ipython:: python

    from xarray import Variable


    class YearGrouper(xr.groupers.Grouper):
        """
        An example re-implementation of ``.groupby("time.year")``.