Page on storage formats #325
🎊 PR Preview 1868ad2 has been successfully built and deployed to https://xarray-contrib-xarray-tutorial-preview-pr-325.surge.sh 🕐 Build time: 0.011s 🤖 By surge-preview
@@ -0,0 +1,191 @@
{
I did a read-through of the new HDF5 / NetCDF-4 section and drafted some updated text for HDF5, in case it's helpful (see the suggestion block below):
```suggestion
## HDF5

HDF5 (Hierarchical Data Format, version 5) is a **general-purpose container** for large, heterogeneous, hierarchical data. It includes these core components:

* **Groups**
  *Nodes* in a directed graph that starts at the root `/`. They behave like folders in a UNIX filesystem (absolute paths, `/sub/group/dataset`), and *may* form cycles or self-links, although most scientific tools avoid that complexity.
* **Datasets**
  Rectangular N-dimensional arrays stored inside groups. Each dimension can optionally carry a **dimension scale**, an auxiliary dataset that describes the coordinate values along that axis.
* **Attributes**
  Small pieces of metadata (strings, scalars, short arrays) attached to the file, any group, or any dataset.
* **Storage features**
  Chunking, compression, checksums, parallel I/O via MPI-IO, and more. These are orthogonal to the logical data model.

## NetCDF4
```
...
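If it helps readers, a short h5py cell could show each of these components in one place: a nested group, an attribute on the file and on a dataset, two 1-D datasets promoted to dimension scales, and a chunked, gzip-compressed 2-D dataset. This is only a sketch that assumes h5py and NumPy are installed; the file, group and variable names are hypothetical, and the file is kept in memory (`driver="core"`, `backing_store=False`) so nothing is written to disk.

```python
# Sketch only: assumes h5py and NumPy are installed. The file, group and
# variable names below are hypothetical, chosen just for illustration.
import h5py
import numpy as np

# driver="core" with backing_store=False keeps the whole file in memory.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    # Attribute attached to the file itself (the root group "/").
    f.attrs["title"] = "HDF5 data model sketch"

    # Groups behave like folders; h5py creates the intermediate "/sub" too.
    grp = f.create_group("/sub/group")

    # Two 1-D datasets that will act as dimension scales (coordinate values).
    time_ds = grp.create_dataset("time", data=np.arange(3))
    x_ds = grp.create_dataset("x", data=np.linspace(0.0, 1.0, 4))
    time_ds.make_scale("time")
    x_ds.make_scale("x")

    # A 2-D dataset stored in (1, 4) chunks with gzip compression. These are
    # storage features: readers still see a plain (3, 4) array.
    temp = grp.create_dataset(
        "temperature",
        data=np.random.default_rng(0).normal(280.0, 5.0, size=(3, 4)),
        chunks=(1, 4),
        compression="gzip",
    )
    temp.attrs["units"] = "K"  # attribute on a dataset

    # Attach one dimension scale to each axis of the dataset.
    temp.dims[0].attach_scale(time_ds)
    temp.dims[1].attach_scale(x_ds)

    # Collect a few facts before the file is closed.
    summary = {
        "path": temp.name,
        "shape": temp.shape,
        "chunks": temp.chunks,
        "compression": temp.compression,
        "units": temp.attrs["units"],
        "x_scale": temp.dims[1][0].name,
    }

print(summary)
```

Dimension scales are roughly how netCDF-4 represents shared dimensions on top of HDF5, which is why they matter when xarray reads these files through an HDF5-based engine such as h5netcdf.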
Here is at least part of a TIFF/GeoTIFF section you could use.
I think we could also add something on rasterio/GDAL here, which I have not added yet:
https://docs.xarray.dev/en/stable/user-guide/io.html#rasterio
---
## TIFF & GeoTIFF

TIFF (Tag Image File Format) is a *flexible* raster container widely used in remote sensing and GIS. A **GeoTIFF** is simply a TIFF that stores additional georeferencing tags (CRS, affine transform, etc.) so software knows where each pixel sits on Earth.

### Core ideas

* **Images (“IFDs”)** – each “page” (Image File Directory) in a TIFF holds a 2-D array of pixels. Multi-band rasters (e.g. RGB, multi-spectral) appear either as *separate* IFDs or as extra samples within one IFD.
* **Tags** – key–value metadata pairs (datatype, compression, nodata value, CRS, resolution, etc.). GeoTIFF adds standardised tags such as `ModelPixelScaleTag`, `ModelTiepointTag` and `GeoKeyDirectoryTag`.
* **Compression / tiling** – DEFLATE, LZW, etc. Tiling lets software fetch small windows efficiently.

### Practical notes for xarray users

* **Read** – use `rioxarray.open_rasterio()` (which wraps rasterio) to get a lazily loaded `DataArray`; pass `chunks=` to back it with Dask.
* **Write** – `DataArray.rio.to_raster("out.tif")`; choose compression and tiling via extra keyword arguments (e.g. `compress="deflate"`, `tiled=True`), which are passed through to rasterio/GDAL as creation options.
* **Dimensionality** – TIFF is inherently 2-D per band, with no native time or vertical axis. If you need 4-D data, NetCDF or Zarr is usually a better fit.
* **Metadata depth** – single-level tags only (no nested groups). For rich hierarchies, stick to HDF5 / NetCDF-4.
* **Cloud-Optimized GeoTIFF (COG)** – the same format, laid out so that HTTP range requests can stream individual windows efficiently; rioxarray can read COGs directly from a URL when the GDAL build underlying rasterio includes libcurl support.
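To make the IFD/tag idea concrete (perhaps as a teaching cell), here is a rough, standard-library-only sketch: it writes a tiny uncompressed, little-endian, single-strip TIFF carrying two GeoTIFF tags, then walks the IFD chain and reads the pixel strip back through the `StripOffsets`/`StripByteCounts` tags. `build_tiff` and `read_ifds` are hypothetical helpers, not part of rasterio or rioxarray; the parser only decodes SHORT, LONG and DOUBLE fields, and since no `GeoKeyDirectoryTag` is written, GDAL would not find a CRS in this file. The pixel values and georeferencing numbers are made up.

```python
# Sketch only, standard library: writes a tiny little-endian TIFF in memory
# and walks its IFD chain. build_tiff / read_ifds are hypothetical helpers.
import struct

# TIFF field types this sketch handles -> struct format characters.
TYPE_FORMATS = {3: "H", 4: "I", 12: "d"}  # SHORT, LONG, DOUBLE

TAG_NAMES = {
    256: "ImageWidth",
    257: "ImageLength",
    258: "BitsPerSample",
    259: "Compression",
    262: "PhotometricInterpretation",
    273: "StripOffsets",
    277: "SamplesPerPixel",
    278: "RowsPerStrip",
    279: "StripByteCounts",
    33550: "ModelPixelScaleTag",  # GeoTIFF
    33922: "ModelTiepointTag",    # GeoTIFF
    34735: "GeoKeyDirectoryTag",  # GeoTIFF (not written by this sketch)
}


def build_tiff(width, height, pixels, pixel_scale, tiepoint):
    """Return bytes for a single-IFD, uncompressed, 8-bit greyscale TIFF."""
    entries = [  # (tag, type, values), sorted by tag as the spec requires
        (256, 3, [width]),
        (257, 3, [height]),
        (258, 3, [8]),            # 8 bits per sample
        (259, 3, [1]),            # 1 = no compression
        (262, 3, [1]),            # 1 = BlackIsZero greyscale
        (273, 4, [8]),            # pixel strip starts right after the header
        (277, 3, [1]),            # one sample (band) per pixel
        (278, 3, [height]),       # a single strip holds every row
        (279, 4, [len(pixels)]),
        (33550, 12, pixel_scale),
        (33922, 12, tiepoint),
    ]
    ifd_offset = 8 + len(pixels)
    ifd_offset += ifd_offset % 2  # IFDs start on a word boundary
    overflow_offset = ifd_offset + 2 + 12 * len(entries) + 4
    ifd, overflow = struct.pack("<H", len(entries)), b""
    for tag, typ, values in entries:
        data = struct.pack("<" + TYPE_FORMATS[typ] * len(values), *values)
        if len(data) <= 4:
            field = data.ljust(4, b"\x00")  # small values are stored inline
        else:
            field = struct.pack("<I", overflow_offset + len(overflow))
            overflow += data  # larger values live after the IFD
        ifd += struct.pack("<HHI", tag, typ, len(values)) + field
    ifd += struct.pack("<I", 0)  # next-IFD offset; 0 marks the last page
    head = (b"II" + struct.pack("<HI", 42, ifd_offset) + pixels).ljust(ifd_offset, b"\x00")
    return head + ifd + overflow


def read_ifds(buf):
    """Decode every IFD into a {tag name: value(s)} dict."""
    order = {b"II": "<", b"MM": ">"}[buf[:2]]
    magic, offset = struct.unpack_from(order + "HI", buf, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    pages = []
    while offset:
        (count,) = struct.unpack_from(order + "H", buf, offset)
        tags = {}
        for i in range(count):
            entry = offset + 2 + 12 * i
            tag, typ, n = struct.unpack_from(order + "HHI", buf, entry)
            fmt = TYPE_FORMATS.get(typ)
            if fmt is None:
                continue  # this sketch skips other field types
            value_at = entry + 8
            if struct.calcsize(order + fmt) * n > 4:
                (value_at,) = struct.unpack_from(order + "I", buf, entry + 8)
            values = struct.unpack_from(order + fmt * n, buf, value_at)
            tags[TAG_NAMES.get(tag, tag)] = values[0] if n == 1 else values
        pages.append(tags)
        (offset,) = struct.unpack_from(order + "I", buf, offset + 2 + 12 * count)
    return pages


tiff = build_tiff(
    2, 2, bytes([0, 64, 128, 255]),
    pixel_scale=[30.0, 30.0, 0.0],                       # made-up pixel size (x, y, z)
    tiepoint=[0.0, 0.0, 0.0, 500000.0, 4200000.0, 0.0],  # pixel (0, 0) -> model (x, y)
)
pages = read_ifds(tiff)
first = pages[0]
start = first["StripOffsets"]
strip = tiff[start:start + first["StripByteCounts"]]
print(len(pages), first["ModelPixelScaleTag"], list(strip))
```

In practice rasterio/GDAL handle all of this; the point is only that a GeoTIFF is an ordinary TIFF whose georeferencing lives in extra tags on each IFD.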
@negin513 thanks for your comments - I think this is ready for review / merge.
Closes part of #321