Commit 1de38bc

AndrewILWilliams, kmuehlbauer, dcherian, keewis, and clausmichele authored
Auto chunk (#4064)
* Added chunks='auto' option in dataset.py
* FIX: correct dask array handling in _calc_idxminmax (#3922)
* FIX: correct dask array handling in _calc_idxminmax
* FIX: remove unneeded import, reformat via black
* fix idxmax, idxmin with dask arrays
* FIX: use array[dim].data in `_calc_idxminmax` as per @keewis suggestion, attach dim name to result
* ADD: add dask tests to `idxmin`/`idxmax` dataarray tests
* FIX: add back fixture line removed by accident
* ADD: complete dask handling in `idxmin`/`idxmax` tests in test_dataarray, xfail dask tests for dtype dateime64 (M)
* ADD: add "support dask handling for idxmin/idxmax" in whats-new.rst
* MIN: reintroduce changes added by #3953
* MIN: change if-clause to use `and` instead of `&` as per review-comment
* MIN: change if-clause to use `and` instead of `&` as per review-comment
* WIP: remove dask handling entirely for debugging purposes
* Test for dask computes
* WIP: re-add dask handling (map_blocks-approach), add `with raise_if_dask_computes()` context to idxmin-tests
* Use dask indexing instead of map_blocks.
* Better chunk choice.
* Return -1 for _nan_argminmax_object if all NaNs along dim
* Revert "Return -1 for _nan_argminmax_object if all NaNs along dim"
  This reverts commit 58901b9.
* Raise error for object arrays
* No error for object arrays. Instead expect 1 compute in tests.

Co-authored-by: dcherian <[email protected]>

* fix the failing flake8 CI (#4057)
* rename d and l to dim and length
* Fixed typo in rasterio docs (#4063)
* Added chunks='auto' option in dataset.py
  Added changes to whats-new.rst
* Added chunks='auto' option in dataset.py
  Added changes to whats-new.rst
* Error fix, catch chunks=None
* Minor reformatting + flake8 changes
* Added isinstance(chunks, (Number, str)) in dataset.py, passing
* format changes
* added auto-chunk test for dataarrays
* Assert chunk sizes equal in auto-chunk test

Co-authored-by: Kai Mühlbauer <[email protected]>
Co-authored-by: dcherian <[email protected]>
Co-authored-by: keewis <[email protected]>
Co-authored-by: clausmichele <[email protected]>
Co-authored-by: Keewis <[email protected]>
1 parent 3194b3e commit 1de38bc

3 files changed: +18 -3 lines changed

doc/whats-new.rst

Lines changed: 4 additions & 0 deletions
@@ -36,6 +36,10 @@ Breaking changes

 New Features
 ~~~~~~~~~~~~
+
+- ``chunks='auto'`` is now supported in the ``chunks`` argument of
+  :py:meth:`Dataset.chunk`. (:issue:`4055`)
+  By `Andrew Williams <https://github.com/AndrewWilliams3142>`_
 - Added :py:func:`xarray.cov` and :py:func:`xarray.corr` (:issue:`3784`, :pull:`3550`, :pull:`4089`).
   By `Andrew Williams <https://github.com/AndrewWilliams3142>`_ and `Robin Beer <https://github.com/r-beer>`_.
 - Added :py:meth:`DataArray.polyfit` and :py:func:`xarray.polyval` for fitting polynomials. (:issue:`3349`)

xarray/core/dataset.py

Lines changed: 6 additions & 3 deletions
@@ -1707,7 +1707,10 @@ def chunks(self) -> Mapping[Hashable, Tuple[int, ...]]:
     def chunk(
         self,
         chunks: Union[
-            None, Number, Mapping[Hashable, Union[None, Number, Tuple[Number, ...]]]
+            None,
+            Number,
+            str,
+            Mapping[Hashable, Union[None, Number, str, Tuple[Number, ...]]],
         ] = None,
         name_prefix: str = "xarray-",
         token: str = None,
@@ -1725,7 +1728,7 @@ def chunk(

         Parameters
         ----------
-        chunks : int or mapping, optional
+        chunks : int, 'auto' or mapping, optional
             Chunk sizes along each dimension, e.g., ``5`` or
             ``{'x': 5, 'y': 5}``.
         name_prefix : str, optional
@@ -1742,7 +1745,7 @@ def chunk(
         """
         from dask.base import tokenize

-        if isinstance(chunks, Number):
+        if isinstance(chunks, (Number, str)):
             chunks = dict.fromkeys(self.dims, chunks)

         if chunks is not None:
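
The effect of the widened `isinstance` check can be sketched in isolation. The snippet below is only an illustration of that one branch, not xarray's implementation: `normalize_chunks_arg` is a hypothetical helper name, and `dims` stands in for `self.dims`. It shows that a bare string such as `'auto'` is now broadcast to every dimension in the same way a bare number already was, while `None` and mappings pass through untouched.

```python
from numbers import Number


def normalize_chunks_arg(dims, chunks):
    """Hypothetical stand-alone version of the changed branch in Dataset.chunk.

    `dims` plays the role of `self.dims`; this is a sketch, not xarray code.
    """
    # After this commit, a scalar spec may be a number *or* a string
    # (e.g. "auto"); either one is applied to every dimension.
    if isinstance(chunks, (Number, str)):
        return dict.fromkeys(dims, chunks)
    # None and per-dimension mappings are left as given.
    return chunks


if __name__ == "__main__":
    print(normalize_chunks_arg(("x", "y"), "auto"))
    print(normalize_chunks_arg(("x", "y"), 5))
    print(normalize_chunks_arg(("x", "y"), {"x": "auto"}))
```

Before the change, a string fell through this branch unchanged, so only numbers were expanded into a per-dimension mapping here.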

xarray/tests/test_dask.py

Lines changed: 8 additions & 0 deletions
@@ -1035,6 +1035,14 @@ def test_unify_chunks_shallow_copy(obj, transform):
     assert_identical(obj, unified) and obj is not obj.unify_chunks()


+@pytest.mark.parametrize("obj", [make_da()])
+def test_auto_chunk_da(obj):
+    actual = obj.chunk("auto").data
+    expected = obj.data.rechunk("auto")
+    np.testing.assert_array_equal(actual, expected)
+    assert actual.chunks == expected.chunks
+
+
 def test_map_blocks_error(map_da, map_ds):
     def bad_func(darray):
         return (darray * darray.x + 5 * darray.y)[:1, :1]
