
combine_by_coordinates to handle unnamed data arrays. #4696


Merged (37 commits, Jul 2, 2021)
The diff below shows changes from 4 of those commits.

Commits
b35de8e
Added test for combine_by_coords changes.
aijams Dec 10, 2020
f966e76
Modified test case to expect a dataset instead of a DataArray. Added …
aijams Dec 11, 2020
68b7b49
Added tests to check combine_by_coords for exception with mixed DataA…
aijams Dec 12, 2020
540961f
Formatting changes after running black
aijams Dec 15, 2020
1c9b4c2
Added underscore to helper function to label as private.
aijams Dec 15, 2020
cb5ed5e
Black formatting changes for whats-new doc file.
aijams Dec 15, 2020
77020c0
Removed imports in docstring that were automatically added by code st…
aijams Dec 17, 2020
6af896b
Merge branch 'master' into aijams/combine-by-coords
aijams Dec 17, 2020
f06371a
Merge remote-tracking branch 'upstream/master' into aijams/combine-by…
aijams Dec 19, 2020
7cdeabb
Merge branch 'aijams/combine-by-coords' of https://github.com/aijams/…
aijams Dec 19, 2020
6190839
Removed duplicate new item line in whats-new.
aijams Dec 19, 2020
3055000
Merge remote-tracking branch 'upstream/master' into aijams/combine-by…
aijams Jan 30, 2021
cbc002f
combine methods now accept unnamed DataArrays as input.
aijams Apr 15, 2021
11a868b
Merge remote-tracking branch 'upstream/master' into aijams/combine-by…
aijams Apr 15, 2021
89ac962
combine nested test checks nested lists of unnamed DataArrays.
aijams Apr 15, 2021
5f3afa5
Made combine_by_coords more readable.
aijams Apr 15, 2021
feb90ce
Cosmetic changes to code style.
aijams Apr 15, 2021
db5b906
Merging changes from first PR.
aijams Apr 18, 2021
e884f52
Merge remote-tracking branch 'upstream/master' into aijams/combine-by…
aijams Apr 18, 2021
0044bb9
Removed extra test from merge with previous PR.
aijams Apr 18, 2021
44548ee
Merge remote-tracking branch 'upstream/master' into aijams/combine-by…
aijams May 4, 2021
5fe8323
Updated test to use pytest.raises instead of raises_regex.
aijams May 4, 2021
55f53b9
Merged latests changes from upstream.
aijams May 10, 2021
805145c
Added breaking-change entry to whats new page.
aijams May 11, 2021
3eed47a
Merged new changes from master branch.
aijams May 11, 2021
05faa88
Added deprecation warning to combine_coords
aijams May 11, 2021
6c75525
Removed index monotonicity checking temporarily.
aijams May 11, 2021
2c43030
Removed duplicate entries from whats new page.
aijams May 12, 2021
f6fae25
Removed TODO message
aijams May 12, 2021
81ec1ff
Added test for combine_nested.
aijams May 16, 2021
caaee74
Merge remote-tracking branch 'upstream/master' into aijams/combine-by…
aijams May 16, 2021
637d4cc
Added check to combine methods to clarify parameter requirements.
aijams May 16, 2021
b5940a1
Reassigned description of changes to bug fixes category.
aijams May 19, 2021
d02da23
Merge remote-tracking branch 'upstream/master' into aijams/combine-by…
aijams May 19, 2021
04cd5f8
Minor style changes.
aijams May 19, 2021
e58a9e2
Added blank line for style purposes.
aijams May 19, 2021
c0fc4f1
Merge remote-tracking branch 'upstream/master' into aijams/combine-by…
aijams Jun 10, 2021
23 changes: 15 additions & 8 deletions doc/whats-new.rst
@@ -21,16 +21,11 @@ v0.18.2 (unreleased)

New Features
~~~~~~~~~~~~
- :py:func:`combine_by_coords` now accepts a list of unnamed ``DataArray`` as input (:issue:`3248`, :pull:`4696`).
By `Augustus Ijams <https://github.com/aijams>`_.


Breaking changes
~~~~~~~~~~~~~~~~
- The main parameter to :py:func:`combine_by_coords` is renamed to `data_objects` instead
of `datasets` so anyone calling this method using a named parameter will need to update
the name accordingly (:issue:`3248`, :pull:`4696`).
By `Augustus Ijams <https://github.com/aijams>`_.


Deprecations
~~~~~~~~~~~~
@@ -65,7 +60,6 @@ Thomas Nicholas, Tom Nicholas, Zachary Moon.

New Features
~~~~~~~~~~~~

- Implement :py:meth:`DataArray.drop_duplicates`
to remove duplicate dimension values (:pull:`5239`).
By `Andrew Huang <https://github.com/ahuang11>`_.
@@ -78,9 +72,22 @@ New Features
- Raise more informative error when decoding time variables with invalid reference dates.
(:issue:`5199`, :pull:`5288`). By `Giacomo Caria <https://github.com/gcaria>`_.

Breaking changes
~~~~~~~~~~~~~~~~
- The main parameter to :py:func:`combine_by_coords` is renamed to `data_objects` instead
of `datasets` so anyone calling this method using a named parameter will need to update
the name accordingly (:issue:`3248`, :pull:`4696`).
By `Augustus Ijams <https://github.com/aijams>`_.

Deprecations
~~~~~~~~~~~~


Bug fixes
~~~~~~~~~

- :py:func:`combine_by_coords` can now handle combining a list of unnamed
``DataArray`` as input (:issue:`3248`, :pull:`4696`).
By `Augustus Ijams <https://github.com/aijams>`_.
- Opening netCDF files from a path that doesn't end in ``.nc`` without supplying
an explicit ``engine`` works again (:issue:`5295`), fixing a bug introduced in
0.18.0.
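To make the two whats-new entries above concrete, here is a minimal, hedged sketch of both behaviours (the unnamed-DataArray support and the data_objects rename), assuming an xarray release that includes this PR. The coordinate values are made up, and the exact return type for all-unnamed input changed during review, so it is not asserted here.

import xarray as xr

# Two unnamed DataArrays covering adjacent slices of the "x" coordinate.
da1 = xr.DataArray([10, 20], dims="x", coords={"x": [0, 1]})
da2 = xr.DataArray([30, 40], dims="x", coords={"x": [2, 3]})

# Bug fix: a list of unnamed DataArrays can now be combined directly.
combined = xr.combine_by_coords([da1, da2])
print(combined)

# Breaking change: the first parameter is now named data_objects, so callers
# using the old keyword must update it.
combined = xr.combine_by_coords(data_objects=[da1, da2])   # new keyword name
# xr.combine_by_coords(datasets=[da1, da2])                # old name, now warns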
19 changes: 12 additions & 7 deletions xarray/core/combine.py
@@ -1,4 +1,5 @@
import itertools
import warnings
from collections import Counter

import pandas as pd
@@ -10,6 +11,7 @@
from .merge import merge
from .utils import iterate_nested


def _infer_concat_order_from_positions(datasets):
return dict(_infer_tile_ids_from_nested_list(datasets, ()))

@@ -310,6 +312,7 @@ def _new_tile_id(single_id_ds_pair):
tile_id, ds = single_id_ds_pair
return tile_id[1:]


def _nested_combine(
datasets,
concat_dims,
@@ -351,6 +354,7 @@ def _nested_combine(
)
return combined


def combine_nested(
datasets,
concat_dim,
@@ -541,7 +545,8 @@ def combine_nested(
mixed_datasets_and_arrays = any(
isinstance(obj, Dataset) for obj in iterate_nested(datasets)
) and any(
isinstance(obj, DataArray) and obj.name is None for obj in iterate_nested(datasets)
isinstance(obj, DataArray) and obj.name is None
for obj in iterate_nested(datasets)
)
if mixed_datasets_and_arrays:
raise ValueError("Can't combine datasets with unnamed arrays.")
@@ -626,15 +631,16 @@ def _combine_single_variable_hypercube(
return concatenated


# TODO remove empty list default param after version 0.19, see PR4696
def combine_by_coords(
Review comment (Contributor): can we also modify combine_nested so the two are consistent.

Reply (Contributor Author): I wrote a test test_combine_nested_unnamed_data_arrays that passes a list of unnamed DataArrays into combine_nested and it produces the expected output. Can you clarify what about combine_nested you want to be consistent?

Reply (Member): I think I know what @dcherian meant - at first glance it looks like the _combine_single_variable_hypercube refactoring is a change that would also simplify the code in combine_nested. But looking more closely I don't think it actually makes sense to do that, does it? It seems about as neat as it can be as is.
data_objects,
data_objects=[],
Review comment (Member): Suggested change: use "data_objects," instead of "data_objects=[],". (It's considered bad practice to have mutable default arguments to functions in Python.)

Reply (Contributor Author): I put this in because if someone calls this method with datasets as a named parameter, the data_objects argument would be unspecified and their code would break with an unspecified-argument error. This is part of the deprecation warning below.

Reply (Member): You make a good point, but that means the default argument should be None, not an empty list, as None is immutable.
compat="no_conflicts",
data_vars="all",
coords="different",
fill_value=dtypes.NA,
join="outer",
combine_attrs="no_conflicts",
datasets=None
datasets=None,
):
"""
Attempt to auto-magically combine the given datasets (or data arrays)
@@ -840,7 +846,8 @@ def combine_by_coords(
if datasets is not None:
warnings.warn(
"The datasets argument has been renamed to `data_objects`."
" In future passing a value for datasets will raise an error.")
" In future passing a value for datasets will raise an error."
)
data_objects = datasets

if not data_objects:
@@ -849,9 +856,7 @@
mixed_arrays_and_datasets = any(
isinstance(data_object, DataArray) and data_object.name is None
for data_object in data_objects
) and any(
isinstance(data_object, Dataset) for data_object in data_objects
)
) and any(isinstance(data_object, Dataset) for data_object in data_objects)
if mixed_arrays_and_datasets:
raise ValueError("Can't automatically combine datasets with unnamed arrays.")

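The review thread above turns on Python's mutable-default pitfall. Below is a minimal sketch of the problem and of the pattern the reviewers converge on (an immutable None default plus the deprecation shim shown in the diff); it is a simplified illustration, not the exact code at this commit.

import warnings

def append_to(item, items=[]):   # anti-pattern: the single default list is shared
    items.append(item)           # across every call that omits items
    return items

append_to(1)                     # -> [1]
append_to(2)                     # -> [1, 2], not [2]

# The reviewer-suggested variant: a None default, combined with the deprecation
# shim from the diff above (extra parameters omitted for brevity).
def combine_by_coords(data_objects=None, datasets=None):
    if datasets is not None:
        warnings.warn(
            "The datasets argument has been renamed to `data_objects`."
            " In future passing a value for datasets will raise an error."
        )
        data_objects = datasets
    if data_objects is None:
        data_objects = []
    # ... real combining logic would go here ...
    return data_objects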
1 change: 1 addition & 0 deletions xarray/core/utils.py
@@ -933,6 +933,7 @@ class Default(Enum):

_default = Default.token


def iterate_nested(nested_list):
for item in nested_list:
if isinstance(item, list):
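The iterate_nested helper shown above is truncated in this view. A sketch of what it does, inferred from the test cases in xarray/tests/test_utils.py below (the real implementation may differ in detail):

def iterate_nested(nested_list):
    # Depth-first walk of arbitrarily nested lists, yielding the leaf items.
    for item in nested_list:
        if isinstance(item, list):
            yield from iterate_nested(item)
        else:
            yield item

# combine.py uses this to scan nested input lists, e.g. when checking for a mix
# of Dataset objects and unnamed DataArrays.
assert list(iterate_nested([[[1, 2, 3], [4]], [5, 6]])) == [1, 2, 3, 4, 5, 6]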
12 changes: 6 additions & 6 deletions xarray/tests/test_combine.py
@@ -680,11 +680,11 @@ def test_nested_combine_mixed_datasets_arrays(self):
Dataset({"x": [2, 3]}),
]
with pytest.raises(
ValueError,
match=r"Can't combine datasets with unnamed arrays."
):
ValueError, match=r"Can't combine datasets with unnamed arrays."
Review comment (Member): Suggested change: match=r"Can't combine datasets with unnamed arrays." -> match=r"Can't combine datasets with unnamed dataarrays." A tiny clarification that this means datasets combined with other xarray.DataArray objects, not something about the numpy arrays inside the xarray.Dataset objects.

Reply (Contributor Author): Good clarification.
):
combine_nested(objs, "x")


class TestCombineAuto:
def test_combine_by_coords(self):
objs = [Dataset({"x": [0]}), Dataset({"x": [1]})]
@@ -733,9 +733,9 @@ def test_combine_coords_mixed_datasets_arrays(self):
Dataset({"x": [2, 3]}),
]
with pytest.raises(
ValueError,
match=r"Can't automatically combine datasets with unnamed arrays."
):
ValueError,
match=r"Can't automatically combine datasets with unnamed arrays.",
):
combine_by_coords(objs)

@pytest.mark.parametrize(
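To make the distinction discussed in the review thread concrete, here is a hedged sketch (not part of the PR) showing that the new check only rejects unnamed DataArrays; giving the array a name lets it be combined with Datasets as usual. This assumes the behaviour of the merged PR in a recent xarray release, and the example objects are made up.

import xarray as xr
from xarray import DataArray, Dataset

unnamed = DataArray([0, 1], dims="x", coords={"x": [0, 1]})
dataset = Dataset({"a": ("x", [2, 3])}, coords={"x": [2, 3]})

# Mixing an unnamed DataArray with a Dataset raises the ValueError tested above.
try:
    xr.combine_by_coords([unnamed, dataset])
except ValueError as err:
    print(err)

# Naming the array lets combine_by_coords treat it as a single-variable Dataset.
named = unnamed.rename("a")
print(xr.combine_by_coords([named, dataset]))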
7 changes: 4 additions & 3 deletions xarray/tests/test_utils.py
@@ -322,6 +322,7 @@ def test_infix_dims_errors(supplied, all_):
with pytest.raises(ValueError):
list(utils.infix_dims(supplied, all_))


@pytest.mark.parametrize(
"nested_list, expected",
[
@@ -330,8 +331,8 @@ def test_infix_dims_errors(supplied, all_):
([1, 2, 3], [1, 2, 3]),
([[1]], [1]),
([[1, 2], [3, 4]], [1, 2, 3, 4]),
([[[1, 2, 3], [4]], [5, 6]], [1, 2, 3, 4, 5, 6])
]
([[[1, 2, 3], [4]], [5, 6]], [1, 2, 3, 4, 5, 6]),
],
)
def test_iterate_nested(nested_list, expected):
assert list(iterate_nested(nested_list)) == expected
assert list(iterate_nested(nested_list)) == expected