For grids that fit into memory, or for datasets that include coordinate bounds, the cell polygons can be inferred:
source_geometries = grid_indexing.infer_cell_geometries(ds)
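To make "inferring cell polygons from coordinate bounds" concrete, here is a minimal sketch for a rectilinear grid using plain NumPy. It does not show how grid_indexing.infer_cell_geometries is implemented. The helper name cell_corners and the inputs lon_bounds and lat_bounds are hypothetical and only serve the illustration.

```python
import numpy as np


def cell_corners(lon_bounds, lat_bounds):
    """Hypothetical helper: corner coordinates of every cell of a rectilinear grid.

    lon_bounds has shape (n_lon, 2), lat_bounds has shape (n_lat, 2).
    The result has shape (n_lat, n_lon, 4, 2): four (x, y) corners per cell.
    """
    lon_bounds = np.asarray(lon_bounds, dtype=float)
    lat_bounds = np.asarray(lat_bounds, dtype=float)

    lon_lo, lon_hi = lon_bounds[:, 0], lon_bounds[:, 1]
    lat_lo, lat_hi = lat_bounds[:, 0], lat_bounds[:, 1]

    # Corner order: counter-clockwise, starting at the lower-left corner.
    xs = np.stack([lon_lo, lon_hi, lon_hi, lon_lo], axis=-1)  # (n_lon, 4)
    ys = np.stack([lat_lo, lat_lo, lat_hi, lat_hi], axis=-1)  # (n_lat, 4)

    n_lat, n_lon = lat_bounds.shape[0], lon_bounds.shape[0]
    x = np.broadcast_to(xs[np.newaxis, :, :], (n_lat, n_lon, 4))
    y = np.broadcast_to(ys[:, np.newaxis, :], (n_lat, n_lon, 4))
    return np.stack([x, y], axis=-1)


# Made-up bounds for a 2 x 3 grid.
lon_bounds = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
lat_bounds = [[10.0, 11.0], [11.0, 12.0]]
corners = cell_corners(lon_bounds, lat_bounds)
```

Each entry corners[i, j] holds the four corners of one cell, which is the information needed to build that cell's polygon.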
If the geometries fit comfortably into memory (mostly for small grids), we can use the in-memory implementation:
# geoarrow does not support multiple dimensions, so we need to also pass along the shape
source_shape = ...
index = grid_indexing.RTree(source_geometries, source_shape)
overlapping_cells = index.query_overlap(target_geometries, target_shape)
The result is a sparse boolean matrix whose shape is the target shape followed by the source shape, i.e. the dimension order is (target_dim1, ..., source_dim1, ...).
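The dimension order can be illustrated with a small dense NumPy array standing in for the sparse result. This is only a sketch: the real return value is sparse, and the shapes and overlap entries below are made up.

```python
import numpy as np

# Assumed shapes for illustration only.
target_shape = (2, 2)
source_shape = (3, 4)

# Dense stand-in for the sparse result: target dimensions first, then source.
overlaps = np.zeros(target_shape + source_shape, dtype=bool)
overlaps[0, 1, 0, 2] = True  # target cell (0, 1) overlaps source cell (0, 2)
overlaps[0, 1, 0, 3] = True  # target cell (0, 1) overlaps source cell (0, 3)
overlaps[1, 0, 2, 0] = True  # target cell (1, 0) overlaps source cell (2, 0)

# Indexing with a target cell leaves an array over the source dimensions.
source_hits = np.argwhere(overlaps[0, 1])

# Number of overlapping source cells per target cell.
counts = overlaps.sum(axis=(2, 3))

# Flatten to a 2D (n_target_cells, n_source_cells) matrix, e.g. for regridding weights.
matrix = overlaps.reshape(int(np.prod(target_shape)), -1)
```

Indexing the leading dimensions selects a target cell, and the remaining dimensions enumerate the source cells it overlaps.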
The distributed index allows searching for overlapping cells even when the grids are larger than memory. It is currently built using dask.
The procedure is almost the same:
# dask.array objects containing shapely polygons
chunked_source_geoms = ...
chunked_target_geoms = ...
index = grid_indexing.distributed.DistributedRTree(chunked_source_geoms)
overlapping_cells = index.query_overlap(chunked_target_geoms)
Note that this will compute both source and target geometries to determine chunk boundaries. overlapping_cells, however, is truly lazy.