Skip to content

Conversation

cehbrecht
Copy link
Collaborator

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
    • This PR fixes issue #xyz
  • Tests for the changes have been added (for bug fixes / features)
  • Documentation has been added / updated (for bug fixes / features)
  • HISTORY.rst has been updated (with summary of main changes)

What kind of change does this PR introduce?:

This is a fix for is_time function to avoid memory overload when using coord.values for a data variable.

Does this PR introduce a breaking change?:

Other information:

This issue occurred when testing atlas-v2 data with very large datasets (several GBs):

https://github.com/roocs/rook/blob/dev-atlas-v2/notebooks/atlas-v2.ipynb

@cehbrecht cehbrecht marked this pull request as draft February 26, 2025 16:28
@coveralls
Copy link

coveralls commented Feb 26, 2025

Pull Request Test Coverage Report for Build 13905960780

Details

  • 31 of 34 (91.18%) changed or added relevant lines in 2 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.2%) to 86.818%

Changes Missing Coverage Covered Lines Changed/Added Lines %
clisops/utils/dataset_utils.py 30 33 90.91%
Files with Coverage Reduction New Missed Lines %
clisops/utils/dataset_utils.py 2 79.41%
Totals Coverage Status
Change from base Build 13901323788: 0.2%
Covered Lines: 3102
Relevant Lines: 3573

💛 - Coveralls

def is_time(coord):
"""
Determines if a coordinate is time.

:param coord: coordinate of xarray dataset e.g. coord = ds.coords[coord_id]
:return: (bool) True if the coordinate is time.
"""
if coord.ndim >= 2:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it really be False by default for time_bnds, so should time_bnds not be considered as time coordinate?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know ... but if I skip the check/filter than I have to deal with it at other places ... see L106.

Copy link
Contributor

@sol1105 sol1105 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In l64 in function get_coord_by_type one could replace
coord_vars = list(ds.coords) + list(ds.data_vars) with coord_vars = (list(ds.coords) + list(ds.data_vars)).remove(get_main_variable(ds)), then is_time should no longer be run for variables that do not fit in memory, as it was before.

return coord_id, [x for x in coords if x != coord_id]
else:
return coord_id
if coord_id in ds.coords:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure coord_id is in ds.coords (lat_bnds is not)

@cehbrecht
Copy link
Collaborator Author

In l64 in function get_coord_by_type one could replace coord_vars = list(ds.coords) + list(ds.data_vars) with coord_vars = (list(ds.coords) + list(ds.data_vars)).remove(get_main_variable(ds)), then is_time should no longer be run for variables that do not fit in memory, as it was before.

I have added this. there was another issue that get_main_variable failed for some regrid tests. See regrid.detect_coordinate L858.

@Zeitsperre Zeitsperre mentioned this pull request Feb 27, 2025
4 tasks
@cehbrecht
Copy link
Collaborator Author

cehbrecht commented Mar 6, 2025

@sol1105 tests are working now. We have updated to the latests xarray ... some regrid tests are failing. I have marked them with xfail. Are you ok with this?

Not sure how we solve this in future. We could patch xarray ...

@Zeitsperre
Copy link
Collaborator

Zeitsperre commented Mar 6, 2025

@cehbrecht I asked my colleague @aulemahal to look into the issue on the xarray side. There might be some movement there soon!

For reference: pydata/xarray#8821

@cehbrecht cehbrecht marked this pull request as ready for review March 11, 2025 10:22
@cehbrecht
Copy link
Collaborator Author

@Zeitsperre @sol1105 good to go?

Copy link
Collaborator

@Zeitsperre Zeitsperre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Small changes.

@Zeitsperre Zeitsperre mentioned this pull request Mar 11, 2025
4 tasks
@Zeitsperre
Copy link
Collaborator

@cehbrecht Don't forget to update the HISTORY.rst file!

@cehbrecht cehbrecht merged commit a044c23 into master Mar 18, 2025
12 checks passed
@cehbrecht cehbrecht deleted the fix-is-time branch March 18, 2025 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants