Closed
Labels: bug, performance, regridding
Description
Since Dask was introduced in v3.14.0, regridding sometimes fails by running out of memory.
This happens because the sparse weights matrix is converted to dense form, which can be huge. For example, when regridding a 400x400 grid to a 300x300 grid, the dense weights matrix has shape `(90000, 160000)`, taking up `300*300*400*400*8 bytes = 107.3 GiB`. That is enough to kill any process on my laptop, at least!
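The dense size quoted above can be checked without allocating anything. This is only a sketch of the arithmetic: the helper name `dense_weights_nbytes` is hypothetical and not part of cf, and it assumes 8-byte (`float64`) weights.

```python
import numpy as np


def dense_weights_nbytes(src_shape, dst_shape, dtype=np.float64):
    """Bytes needed for a dense (n_dst, n_src) weights matrix.

    Hypothetical helper for illustration; nothing is allocated.
    """
    n_src = int(np.prod(src_shape))  # number of source grid points
    n_dst = int(np.prod(dst_shape))  # number of destination grid points
    return n_dst * n_src * np.dtype(dtype).itemsize


nbytes = dense_weights_nbytes((400, 400), (300, 300))
print(f"{nbytes / 2**30:.1f} GiB")  # 107.3 GiB
```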
The attached PR reformulates the weights creation so that the matrix stays in sparse form. In the above example, assuming 4 non-zero weights per destination grid point (as is the case for, e.g., linear regridding), the sparse weight values take up `300*300*4*8 bytes = 0.0027 GiB`, which is essentially negligible!
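To illustrate the sparse case, here is a minimal sketch using `scipy.sparse.csr_matrix`. It is not the code from the PR: the weights are made up (4 arbitrary source points per destination point, each with weight 0.25), and only the shapes and the non-zero count match the example above.

```python
import numpy as np
from scipy.sparse import csr_matrix

n_src = 400 * 400  # source grid points
n_dst = 300 * 300  # destination grid points
nnz_per_row = 4    # assumed non-zero weights per destination point

# Made-up weights: destination point i draws on source points i..i+3
rows = np.repeat(np.arange(n_dst), nnz_per_row)
cols = (np.arange(n_dst)[:, None] + np.arange(nnz_per_row)).ravel()
vals = np.full(n_dst * nnz_per_row, 0.25)

weights = csr_matrix((vals, (rows, cols)), shape=(n_dst, n_src))

# Regrid a flattened source field with a sparse matrix-vector product
src = np.ones(n_src)
dst = weights @ src

# Real footprint: the values plus the CSR index arrays
sparse_bytes = (
    weights.data.nbytes + weights.indices.nbytes + weights.indptr.nbytes
)
print(weights.shape, weights.nnz)
print(f"values only: {weights.data.nbytes / 2**30:.4f} GiB")
print(f"with indices: {sparse_bytes / 2**30:.4f} GiB")
```

Note that a sparse matrix also stores index arrays, so its true footprint is somewhat larger than the values alone, but it remains a few MiB rather than over 100 GiB.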
>>> import cf
>>> cf.environment(paths=False)
Platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.35
HDF5 library: 1.12.1
netcdf library: 4.8.1
udunits2 library: /home/david/miniconda3/lib/libudunits2.so.0
ESMF: 8.2.0
Python: 3.10.9
dask: 2023.3.0
netCDF4: 1.6.0
psutil: 5.9.4
packaging: 23.0
numpy: 1.22.3
scipy: 1.8.1
matplotlib: 3.4.3
cftime: 1.6.2
cfunits: 3.3.5
cfplot: 3.1.31
cfdm: 1.10.0.3
cf: 3.14.1