keewis/grid-indexing

grid-indexing: Fast and scalable indexing of grids

inferring grid geometries

For grids that fit into memory, or for datasets that include coordinate bounds, the cell polygons can be inferred:

import grid_indexing
source_geometries = grid_indexing.infer_cell_geometries(ds)  # ds: dataset with the grid's coordinates (and optionally their bounds)
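When a rectilinear grid has no explicit bounds, one common approach is to place each cell edge halfway between neighboring cell centers and to extrapolate the outermost edges. The plain-Python sketch below illustrates that idea. infer_bounds and cell_polygons are hypothetical helpers, not part of grid_indexing, and the library may infer geometries differently (for example for curvilinear grids).

```python
def infer_bounds(centers):
    """Infer 1D cell edges from cell centers (hypothetical helper).

    Interior edges sit halfway between neighbors; the outer edges are
    extrapolated by half of the neighboring spacing.
    """
    if len(centers) < 2:
        raise ValueError("need at least two cell centers")
    mids = [(a + b) / 2 for a, b in zip(centers[:-1], centers[1:])]
    first = centers[0] - (mids[0] - centers[0])
    last = centers[-1] + (centers[-1] - mids[-1])
    return [first, *mids, last]


def cell_polygons(lon, lat):
    """Build one closed polygon (list of (x, y) corners) per cell.

    The result is nested as [lat_index][lon_index], i.e. shape (len(lat), len(lon)).
    """
    lon_edges = infer_bounds(lon)
    lat_edges = infer_bounds(lat)
    polygons = []
    for j in range(len(lat)):
        row = []
        for i in range(len(lon)):
            x0, x1 = lon_edges[i], lon_edges[i + 1]
            y0, y1 = lat_edges[j], lat_edges[j + 1]
            # counter-clockwise ring, closed by repeating the first corner
            row.append([(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)])
        polygons.append(row)
    return polygons


# usage: a 2 x 3 grid with 1° spacing
polys = cell_polygons(lon=[0.5, 1.5, 2.5], lat=[10.5, 11.5])
print(polys[0][0])  # [(0.0, 10.0), (1.0, 10.0), (1.0, 11.0), (0.0, 11.0), (0.0, 10.0)]
```

Bounds supplied with the dataset avoid this guesswork, which matters when the grid spacing is irregular.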

in-memory index

If the geometries fit comfortably into memory (mostly for small grids), we can use the in-memory implementation:

# geoarrow does not support multiple dimensions, so we also need to pass along the shape
source_shape = ...
index = grid_indexing.RTree(source_geometries, source_shape)

# the target grid is described the same way: flat geometries plus their shape
overlapping_cells = index.query_overlap(target_geometries, target_shape)
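An R-tree speeds up an overlap query by first discarding cells whose bounding boxes cannot intersect. The brute-force sketch below shows only that bounding-box filtering step over flattened cells. bbox, boxes_intersect and candidate_pairs are hypothetical names, not the grid_indexing API; a real index would typically replace the nested loop with a tree search and follow up with an exact polygon intersection test.

```python
def bbox(polygon):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a list of (x, y) corners."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_intersect(a, b):
    """True if two boxes overlap with non-zero area (shared edges do not count here)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def candidate_pairs(source_polys, target_polys):
    """Return (target_index, source_index) pairs whose bounding boxes overlap.

    Both inputs are flat (1D) lists, like geoarrow arrays, so the grid shapes
    have to be tracked separately. This is an O(n * m) stand-in for an R-tree query.
    """
    source_boxes = [bbox(p) for p in source_polys]
    pairs = []
    for t, poly in enumerate(target_polys):
        tb = bbox(poly)
        for s, sb in enumerate(source_boxes):
            if boxes_intersect(tb, sb):
                pairs.append((t, s))
    return pairs


def square(x0, y0, size):
    return [(x0, y0), (x0 + size, y0), (x0 + size, y0 + size), (x0, y0 + size)]


# source: four 1x1 cells in a row; target: one 1x1 cell straddling source cells 1 and 2
source = [square(float(i), 0.0, 1.0) for i in range(4)]
target = [square(1.5, -0.5, 1.0)]
print(candidate_pairs(source, target))  # [(0, 1), (0, 2)]
```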

The result is a sparse boolean array whose shape is the target grid shape followed by the source grid shape, i.e. its dimension order is (target_dim1, ..., source_dim1, ...). An element is True where the corresponding target and source cells overlap.
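Because geoarrow arrays are flat, the flat positions of each overlapping (target, source) pair have to be unraveled with the respective grid shapes to obtain the dimension order (target_dim1, ..., source_dim1, ...). The sketch below illustrates that bookkeeping with COO-style coordinates. unravel and to_coo_coords are hypothetical helpers, it assumes C (row-major) flattening, and it makes no claim about how grid_indexing actually builds its sparse result.

```python
def unravel(flat_index, shape):
    """Convert a flat C-order (row-major) index into a multi-dimensional index."""
    coords = []
    for size in reversed(shape):
        flat_index, rem = divmod(flat_index, size)
        coords.append(rem)
    if flat_index != 0:
        raise IndexError("flat index out of bounds for shape")
    return tuple(reversed(coords))


def to_coo_coords(pairs, target_shape, source_shape):
    """Map flat (target, source) pairs to coordinates of the combined array.

    Each coordinate is ordered (target_dim1, ..., source_dim1, ...), and the
    combined array has shape target_shape + source_shape.
    """
    shape = tuple(target_shape) + tuple(source_shape)
    coords = [unravel(t, target_shape) + unravel(s, source_shape) for t, s in pairs]
    return shape, coords


# a 1 x 2 target grid and a 2 x 3 source grid
shape, coords = to_coo_coords([(0, 4), (1, 5)], target_shape=(1, 2), source_shape=(2, 3))
print(shape)   # (1, 2, 2, 3)
print(coords)  # [(0, 0, 1, 1), (0, 1, 1, 2)]
```

With NumPy available, numpy.unravel_index performs the same conversion in a vectorized way.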

distributed index

The distributed index can search for overlapping cells even when the grids are larger than memory. It is currently built on dask.

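The procedure is almost the same as for the in-memory index, except that the geometries are chunked dask arrays and no shapes are passed: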

# dask.array objects containing shapely polygons
chunked_source_geoms = ...
chunked_target_geoms = ...

index = grid_indexing.distributed.DistributedRTree(chunked_source_geoms)
overlapping_cells = index.query_overlap(chunked_target_geoms)

Note that this computes both the source and the target geometries in order to determine the chunk boundaries. The resulting overlapping_cells array, however, is fully lazy.
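One way to see why the geometries must be computed up front: deciding which source chunks a target chunk has to be compared against requires each chunk's spatial extent, and that depends on the actual coordinate values. The plain-Python sketch below shows such chunk-level pruning, with lists standing in for dask chunks. chunk_extent and chunk_pairs are hypothetical helpers illustrating the general idea, not DistributedRTree's actual implementation; only the surviving chunk pairs would then need a (lazy) per-chunk query.

```python
def chunk_extent(chunk):
    """Bounding box (xmin, ymin, xmax, ymax) covering every polygon in a chunk.

    Each polygon is a list of (x, y) corners. Computing this needs the actual
    coordinate values, which is why this step cannot stay lazy.
    """
    xs = [x for poly in chunk for x, _ in poly]
    ys = [y for poly in chunk for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)


def extents_intersect(a, b):
    """True if two extents overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def chunk_pairs(source_chunks, target_chunks):
    """Return (target_chunk, source_chunk) index pairs that may contain overlaps."""
    source_extents = [chunk_extent(c) for c in source_chunks]
    pairs = []
    for t, chunk in enumerate(target_chunks):
        te = chunk_extent(chunk)
        pairs.extend((t, s) for s, se in enumerate(source_extents) if extents_intersect(te, se))
    return pairs


def square(x0, y0, size=1.0):
    return [(x0, y0), (x0 + size, y0), (x0 + size, y0 + size), (x0, y0 + size)]


# source: two chunks covering x in [0, 2] and [2, 4]; target: two chunks of smaller cells
source_chunks = [[square(0.0, 0.0), square(1.0, 0.0)], [square(2.0, 0.0), square(3.0, 0.0)]]
target_chunks = [[square(0.25, 0.25, 0.5)], [square(2.5, 0.25, 0.5), square(3.25, 0.25, 0.5)]]
print(chunk_pairs(source_chunks, target_chunks))  # [(0, 0), (1, 1)]
```

Here only two of the four possible chunk pairs survive, so only two per-chunk queries would need to be scheduled.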
