Page on storage formats #325

Merged
merged 4 commits into from
Jul 8, 2025

TomNicholas
Member

Closes part of #321

@TomNicholas TomNicholas mentioned this pull request Jul 5, 2025
41 tasks
github-actions bot commented Jul 5, 2025

🎊 PR Preview 1868ad2 has been successfully built and deployed to https://xarray-contrib-xarray-tutorial-preview-pr-325.surge.sh

🕐 Build time: 0.011s

🤖 By surge-preview

@@ -0,0 +1,191 @@
{
Contributor

@negin513 negin513 Jul 7, 2025

I did a read-through of the new HDF5 / NetCDF-4 section and wrote a bit of updated text for HDF5, in case it is helpful (see block below).

`suggestion
## HDF5

HDF5 (Hierarchical Data Format, version 5) is a **general-purpose container** for large, heterogeneous, hierarchical data.  It includes these core components:

* **Groups**  

 *Nodes* in a directed graph that starts at the root `/`.  

 They behave like folders in a UNIX filesystem (absolute paths such as `/sub/group/dataset`), and *may* form cycles or self-links, although most scientific tools avoid that complexity.


* **Datasets**  

 Rectangular N-dimensional arrays stored inside groups.  

 Each dimension can optionally carry a **dimension scale**, an auxiliary dataset that describes the coordinate values along that axis.

* **Attributes**  

 Small pieces of metadata (strings, scalars, short arrays) attached to the file, any group, or any dataset.

* **Storage features**  

 Chunking, compression, checksums, parallel I/O via MPI-IO, and more.  

 These are orthogonal to the logical data model.

## NetCDF4

`
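The four components in the suggested text can be sketched with `h5py`. This is a minimal illustration, not part of the PR: the group path, dataset names, and values are made up, and the file lives only in memory (`driver="core"`, `backing_store=False`).

```python
import h5py
import numpy as np

# In-memory HDF5 file: nothing is written to disk, the name is arbitrary.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    # Group: a folder-like node addressed by an absolute path.
    # Intermediate groups ("/model") are created automatically.
    grp = f.create_group("/model/surface")

    # Dataset: a rectangular N-d array, here chunked, compressed and checksummed.
    temp = grp.create_dataset(
        "temperature",
        data=np.arange(12, dtype="f4").reshape(3, 4),
        chunks=(1, 4),
        compression="gzip",
        fletcher32=True,  # per-chunk checksum
    )

    # Dimension scale: an auxiliary dataset with coordinate values for axis 0.
    lat = grp.create_dataset("lat", data=np.array([10.0, 20.0, 30.0]))
    lat.make_scale("lat")
    temp.dims[0].attach_scale(lat)
    temp.dims[0].label = "lat"

    # Attributes: small metadata attached to the file and to a dataset.
    f.attrs["title"] = "toy example"
    temp.attrs["units"] = "K"

    # Collect a few properties while the file is still open.
    summary = {
        "path": temp.name,
        "chunks": temp.chunks,
        "compression": temp.compression,
        "units": temp.attrs["units"],
        "lat": temp.dims[0][0][...].tolist(),
    }

print(summary)
```

Note that chunking, compression, and the checksum are all set on `create_dataset` and do not change how the array is addressed or read, which is the sense in which the storage features are orthogonal to the data model.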

...


@@ -0,0 +1,191 @@
{
Contributor

@negin513 negin513 Jul 7, 2025

Here is at least part of a TIFF/GeoTIFF section to use.

I think we could also add something on rasterio/GDAL here, which I did not include:

https://docs.xarray.dev/en/stable/user-guide/io.html#rasterio

---

## TIFF & GeoTIFF

TIFF (Tag Image File Format) is a *flexible* raster container widely used in remote sensing and GIS.  

A **GeoTIFF** is simply a TIFF that stores additional georeferencing tags (CRS, affine transform, etc.) so software knows where each pixel sits on Earth.


### Core ideas

* **Images (“IFDs”)** – each “page” in a TIFF holds a 2-D array of pixels.  

 Multi-band rasters (e.g. RGB, multi-spectral) appear as *separate* IFDs or as extra samples within one IFD.


* **Tags** – key–value metadata pairs (datatype, compression, nodata value, CRS, resolution, etc.).  
GeoTIFF adds standardised tags like `ModelPixelScaleTag`, `ModelTiepointTag`, and `GeoKeyDirectoryTag`.

* **Compression / tiling** – DEFLATE, LZW, etc. Tiling lets software fetch small windows efficiently.
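
To make the georeferencing tags concrete, here is a rough sketch of how `ModelPixelScaleTag` and `ModelTiepointTag` together place a pixel on the ground for the common north-up case (no rotation). The function name `pixel_to_world` and the tag values are hypothetical; in practice rasterio/GDAL does this for you and exposes the result as an affine transform.

```python
def pixel_to_world(col, row, pixel_scale, tiepoint):
    """Map a raster (col, row) position to model (x, y) coordinates.

    pixel_scale: (scale_x, scale_y, scale_z), as in ModelPixelScaleTag.
    tiepoint:    (i, j, k, x, y, z), as in ModelTiepointTag, meaning raster
                 point (i, j, k) sits at model point (x, y, z).
    Assumes a north-up image, so y decreases as the row index increases.
    """
    scale_x, scale_y, _ = pixel_scale
    i, j, _, x0, y0, _ = tiepoint
    x = x0 + (col - i) * scale_x
    y = y0 - (row - j) * scale_y
    return x, y


# Hypothetical tag values: 30 m pixels, upper-left corner at (500000, 4100000).
pixel_scale = (30.0, 30.0, 0.0)
tiepoint = (0.0, 0.0, 0.0, 500000.0, 4100000.0, 0.0)

upper_left = pixel_to_world(0, 0, pixel_scale, tiepoint)
inner = pixel_to_world(10, 20, pixel_scale, tiepoint)
print(upper_left)  # (500000.0, 4100000.0)
print(inner)       # (500300.0, 4099400.0)
```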

### Practical notes for xarray users

* **Read** – use `rioxarray.open_rasterio()` (wraps rasterio) to get a lazily loaded DataArray; pass `chunks=` to back it with Dask.

* **Write** – `DataArray.rio.to_raster("out.tif")`; choose compression and tiling via extra keyword arguments (e.g. `compress="deflate"`, `tiled=True`), which are passed through to rasterio.

* **Dimensionality** – TIFF is inherently 2-D per band; no native time or vertical axis. If you need 4-D data, NetCDF or Zarr is usually a better fit.

* **Metadata depth** – single-level tags only (no nested groups). For rich hierarchies, stick to HDF5 / NetCDF-4.

* **Cloud-optimized GeoTIFF (COG)** – same format, arranged so HTTP range requests can stream windows efficiently; rioxarray can open one directly from a URL when the underlying GDAL build includes libcurl support.


@scottyhq scottyhq mentioned this pull request Jul 7, 2025
@TomNicholas TomNicholas marked this pull request as ready for review July 8, 2025 17:56
@TomNicholas
Member Author

@negin513 thanks for your comments - I think this is ready for review / merge.

@TomNicholas TomNicholas merged commit 2f86b37 into xarray-contrib:main Jul 8, 2025
1 of 3 checks passed
@TomNicholas TomNicholas deleted the storage_formats branch July 8, 2025 18:08

2 participants