
Commit f00d9d2

[wip] Add docs
1 parent 7e74ec6 commit f00d9d2


6 files changed

+194
-46
lines changed


doc/api-hidden.rst

Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -693,3 +693,7 @@
693693

694694
coding.times.CFTimedeltaCoder
695695
coding.times.CFDatetimeCoder
696+
697+
core.groupers.Grouper
698+
core.groupers.Resampler
699+
core.groupers.EncodedGroups

doc/api.rst

Lines changed: 19 additions & 4 deletions
@@ -801,6 +801,18 @@ DataArray
801801
DataArrayGroupBy.dims
802802
DataArrayGroupBy.groups
803803

804+
Grouper Objects
805+
---------------
806+
807+
.. currentmodule:: xarray.core
808+
809+
.. autosummary::
810+
:toctree: generated/
811+
812+
groupers.BinGrouper
813+
groupers.UniqueGrouper
814+
groupers.TimeResampler
815+
804816

805817
Rolling objects
806818
===============
@@ -1026,17 +1038,20 @@ DataArray
10261038
Accessors
10271039
=========
10281040

1029-
.. currentmodule:: xarray
1041+
.. currentmodule:: xarray.core
10301042

10311043
.. autosummary::
10321044
:toctree: generated/
10331045

1034-
core.accessor_dt.DatetimeAccessor
1035-
core.accessor_dt.TimedeltaAccessor
1036-
core.accessor_str.StringAccessor
1046+
accessor_dt.DatetimeAccessor
1047+
accessor_dt.TimedeltaAccessor
1048+
accessor_str.StringAccessor
1049+
10371050

10381051
Custom Indexes
10391052
==============
1053+
.. currentmodule:: xarray
1054+
10401055
.. autosummary::
10411056
:toctree: generated/
10421057

doc/conf.py

Lines changed: 1 addition & 0 deletions
@@ -166,6 +166,7 @@
166166
"CategoricalIndex": "~pandas.CategoricalIndex",
167167
"TimedeltaIndex": "~pandas.TimedeltaIndex",
168168
"DatetimeIndex": "~pandas.DatetimeIndex",
169+
"IntervalIndex": "~pandas.IntervalIndex",
169170
"Series": "~pandas.Series",
170171
"DataFrame": "~pandas.DataFrame",
171172
"Categorical": "~pandas.Categorical",

doc/user-guide/groupby.rst

Lines changed: 77 additions & 11 deletions
@@ -1,3 +1,5 @@
1+
.. currentmodule:: xarray
2+
13
.. _groupby:
24

35
GroupBy: Group and Bin Data
@@ -15,19 +17,20 @@ __ https://www.jstatsoft.org/v40/i01/paper
1517
- Apply some function to each group.
1618
- Combine your groups back into a single data object.
1719

18-
Group by operations work on both :py:class:`~xarray.Dataset` and
19-
:py:class:`~xarray.DataArray` objects. Most of the examples focus on grouping by
20+
Group by operations work on both :py:class:`Dataset` and
21+
:py:class:`DataArray` objects. Most of the examples focus on grouping by
2022
a single one-dimensional variable, although support for grouping
2123
over a multi-dimensional variable has recently been implemented. Note that for
2224
one-dimensional data, it is usually faster to rely on pandas' implementation of
2325
the same pipeline.
2426

2527
.. tip::
2628

27-
To substantially improve the performance of GroupBy operations, particularly
28-
with dask `install the flox package <https://flox.readthedocs.io>`_. flox
29+
`Install the flox package <https://flox.readthedocs.io>`_ to substantially improve the performance
30+
of GroupBy operations, particularly with dask. flox
2931
`extends Xarray's in-built GroupBy capabilities <https://flox.readthedocs.io/en/latest/xarray.html>`_
30-
by allowing grouping by multiple variables, and lazy grouping by dask arrays. If installed, Xarray will automatically use flox by default.
32+
by allowing grouping by multiple variables, and lazy grouping by dask arrays.
33+
If installed, Xarray will automatically use flox by default.
3134

3235
Split
3336
~~~~~
@@ -87,7 +90,7 @@ Binning
8790
Sometimes you don't want to use all the unique values to determine the groups
8891
but instead want to "bin" the data into coarser groups. You could always create
8992
a customized coordinate, but xarray facilitates this via the
90-
:py:meth:`~xarray.Dataset.groupby_bins` method.
93+
:py:meth:`Dataset.groupby_bins` method.
9194

9295
.. ipython:: python
9396
@@ -110,7 +113,7 @@ Apply
110113
~~~~~
111114

112115
To apply a function to each group, you can use the flexible
113-
:py:meth:`~xarray.core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically
116+
:py:meth:`core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically
114117
concatenated back together along the group axis:
115118

116119
.. ipython:: python
@@ -121,8 +124,8 @@ concatenated back together along the group axis:
121124
122125
arr.groupby("letters").map(standardize)
123126
124-
GroupBy objects also have a :py:meth:`~xarray.core.groupby.DatasetGroupBy.reduce` method and
125-
methods like :py:meth:`~xarray.core.groupby.DatasetGroupBy.mean` as shortcuts for applying an
127+
GroupBy objects also have a :py:meth:`core.groupby.DatasetGroupBy.reduce` method and
128+
methods like :py:meth:`core.groupby.DatasetGroupBy.mean` as shortcuts for applying an
126129
aggregation function:
127130

128131
.. ipython:: python
@@ -183,7 +186,7 @@ Iterating and Squeezing
183186
Previously, Xarray defaulted to squeezing out dimensions of size one when iterating over
184187
a GroupBy object. This behaviour is being removed.
185188
You can always squeeze explicitly later with the Dataset or DataArray
186-
:py:meth:`~xarray.DataArray.squeeze` methods.
189+
:py:meth:`DataArray.squeeze` methods.
187190

188191
.. ipython:: python
189192
@@ -217,7 +220,7 @@ __ https://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_two_dime
217220
da.groupby("lon").map(lambda x: x - x.mean(), shortcut=False)
218221
219222
Because multidimensional groups have the ability to generate a very large
220-
number of bins, coarse-binning via :py:meth:`~xarray.Dataset.groupby_bins`
223+
number of bins, coarse-binning via :py:meth:`Dataset.groupby_bins`
221224
may be desirable:
222225

223226
.. ipython:: python
@@ -232,3 +235,66 @@ applying your function, and then unstacking the result:
232235
233236
stacked = da.stack(gridcell=["ny", "nx"])
234237
stacked.groupby("gridcell").sum(...).unstack("gridcell")
238+
239+
.. _groupby.groupers:
240+
241+
Extending GroupBy: Grouper Objects
242+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
243+
244+
.. currentmodule:: xarray.core.groupers
245+
246+
.. warning::
247+
248+
This is an advanced, experimental API. We encourage you to try it and share your feedback.
249+
See the `design document <https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md>`_
250+
for more background.
251+
252+
The first step in executing a GroupBy analysis is to *identify* the groups and create an intermediate array where each group member is identified
253+
by a unique integer code. Commonly this step is executed using :py:func:`pandas.factorize` for grouping by a categorical variable (e.g. ``['a', 'b', 'a', 'b']``)
254+
and :py:func:`pandas.cut` or :py:func:`numpy.digitize` or :py:func:`numpy.searchsorted` for binning a numeric variable.
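To make the "integer codes" idea concrete, here is a minimal pure-Python sketch of both kinds of factorization. It is only an illustration under stated assumptions: the helper names ``factorize`` and ``bin_codes`` are hypothetical, the half-open bin convention is chosen for simplicity, and none of this is how pandas, NumPy, or Xarray implement the step.

```python
from bisect import bisect_right


def factorize(values):
    """Return (codes, uniques): one integer code per element, numbered by first appearance."""
    uniques = []
    index_of = {}
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(uniques)
            uniques.append(value)
        codes.append(index_of[value])
    return codes, uniques


def bin_codes(values, edges):
    """Assign each value to a half-open bin [edges[i], edges[i + 1]); -1 marks out-of-range values."""
    codes = []
    for value in values:
        position = bisect_right(edges, value) - 1
        in_range = 0 <= position < len(edges) - 1
        codes.append(position if in_range else -1)
    return codes


# Categorical grouping: each distinct label gets its own code.
letter_codes, letters = factorize(["a", "b", "a", "b"])
print(letter_codes, letters)  # [0, 1, 0, 1] ['a', 'b']

# Binned grouping: two bins, [0, 5) and [5, 10).
print(bin_codes([0.5, 2.0, 7.5, 12.0], edges=[0, 5, 10]))  # [0, 0, 1, -1]
```

Once every element carries a code, the rest of the GroupBy pipeline only needs the codes and the list of unique groups, regardless of how they were produced.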
255+
256+
Much of the difficulty in more complex GroupBy problems can be abstracted into a specialized "factorize" operation that identifies the necessary groups.
257+
:py:class:`Grouper` and :py:class:`Resampler` objects provide an extension point allowing Xarray's GroupBy machinery
258+
to use specialized "factorization" operations.
259+
Eventually, they will also provide a natural way to extend GroupBy to grouping by multiple variables: ``ds.groupby(x=BinGrouper(...), t=TimeResampler(freq="M", ...)).mean()``.
260+
261+
Xarray provides three Grouper objects today:
262+
263+
1. :py:class:`UniqueGrouper` for categorical grouping
264+
2. :py:class:`BinGrouper` for binned grouping
265+
3. :py:class:`TimeResampler` for resampling along a datetime coordinate
266+
267+
These objects mean that the following equivalences hold:
268+
269+
- ``ds.groupby("categories")`` is identical to ``ds.groupby(categories=UniqueGrouper())``.
270+
- ``ds.groupby_bins("values", bins=5)`` is identical to ``ds.groupby(values=BinGrouper(bins=5))``.
271+
- ``ds.resample(time="H")`` is identical to ``ds.groupby(time=TimeResampler(freq="H"))``.
272+
273+
For example, consider a seasonal grouping ``ds.groupby("time.season")``. This approach treats ``ds.time.dt.season`` as a categorical variable to group by and ignores
274+
the many complexities of time grouping. Specialized ``SeasonGrouper`` and ``SeasonResampler`` objects would allow:
275+
276+
- Supporting seasons that span a year-end.
277+
- Only including seasons with complete data coverage.
278+
- Grouping over seasons of unequal length.
279+
- Returning results with seasons in the appropriate chronological order.
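As a rough illustration of two of these points (seasons that span a year-end, and chronological ordering), the sketch below factorizes plain ``datetime.date`` values into season codes. Everything here is an assumption made for the example: the ``SEASONS`` layout, the ``season_key`` and ``factorize_seasons`` helpers, and the convention that December belongs to the following year's DJF season. It is not part of Xarray and does not show how a real ``SeasonGrouper`` would be written.

```python
from datetime import date

# Hypothetical season layout; DJF spans the year-end.
SEASONS = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}
SEASON_OF_MONTH = {month: name for name, months in SEASONS.items() for month in months}
SEASON_ORDER = list(SEASONS)  # chronological order within a season-year


def season_key(day):
    """Return (season_year, season); December counts toward the next year's DJF."""
    season = SEASON_OF_MONTH[day.month]
    season_year = day.year + 1 if day.month == 12 else day.year
    return season_year, season


def factorize_seasons(days):
    """Return (codes, uniques) with the unique seasons sorted chronologically."""
    keys = [season_key(day) for day in days]
    uniques = sorted(set(keys), key=lambda key: (key[0], SEASON_ORDER.index(key[1])))
    code_of = {key: code for code, key in enumerate(uniques)}
    return [code_of[key] for key in keys], uniques


days = [date(2000, 12, 15), date(2001, 1, 15), date(2001, 4, 1), date(2001, 12, 1)]
codes, uniques = factorize_seasons(days)
print(codes)    # [0, 0, 1, 2]
print(uniques)  # [(2001, 'DJF'), (2001, 'MAM'), (2002, 'DJF')]
```

December 2000 and January 2001 land in the same group, which a purely categorical grouping on the month-derived season label cannot express without also mixing in other years.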
280+
281+
To define a custom grouper, subclass either the :py:class:`Grouper` or :py:class:`Resampler` abstract base class
282+
and provide a customized ``factorize`` method. This method must accept a :py:class:`DataArray` to group by and return
283+
an instance of :py:class:`EncodedGroups`.
284+
285+
.. ipython:: python
286+
287+
    from xarray import Variable
288+
289+
290+
    class YearGrouper(xr.groupers.Grouper):
291+
        """
292+
        An example re-implementation of ``.groupby("time.year")``.
293+
        """
294+
295+
        def factorize(self, group) -> xr.groupers.EncodedGroups:
296+
            assert np.issubdtype(group.dtype, np.datetime64)
297+
            year = group.dt.year
298+
            codes, uniques = pd.factorize(year)  # integer code per element, plus the unique years
299+
            unique_coord = Variable(dims="year", data=uniques)
300+
            return xr.groupers.EncodedGroups(codes=codes, unique_coord=unique_coord)

xarray/__init__.py

Lines changed: 1 addition & 2 deletions
@@ -56,6 +56,7 @@
5656
# `mypy --strict` running in projects that import xarray.
5757
__all__ = (
5858
# Sub-packages
59+
"groupers",
5960
"testing",
6061
"tutorial",
6162
# Top-level functions
@@ -95,8 +96,6 @@
9596
"unify_chunks",
9697
"where",
9798
"zeros_like",
98-
# Submodules
99-
"groupers",
10099
# Classes
101100
"CFTimeIndex",
102101
"Context",
