Commit 1868ad2

update zarr section to match
1 parent 3ac66d0 commit 1868ad2

1 file changed

+35
-29
lines changed

intermediate/storage_formats.ipynb

Lines changed: 35 additions & 29 deletions
@@ -55,12 +55,13 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"- Tree of arbitrary groups\n",
-"- Each holds arbitrary data in the form of arrays + metadata\n",
-"- No relationship enforced between groups\n",
-"- No relationship enforced between arrays within a group\n",
-"- No concept of \"coordinates\" vs \"data\"\n",
-"- No references from one group to another"
+"* **Tree of groups** – Tree of arbitrary groups.\n",
+"\n",
+"* **Separate groups** – No relationship enforced between groups, and no references from one group to another.\n",
+"\n",
+"* **Separate arrays** – No relationship enforced between arrays within a group.\n",
+"\n",
+"* **Arbitrary JSON metadata** – Each holds arbitrary data in the form of arrays + metadata."
 ]
 },
 {
@@ -79,17 +80,21 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"How does zarr relate to `xarray`?\n",
+"### How does zarr relate to `xarray`?\n",
+"\n",
+"* **Arrays <-> `Variables`** - zarr arrays map well to `xarray.Variables`\n",
+"  - Especially as zarr v3 includes (optional) `dimension_names`\n",
+"\n",
+"* **Groups <-> `Datasets`** - zarr groups map reasonably well to `xarray.Dataset` objects\n",
+"  - Open a single zarr group in xarray via `xr.open_dataset(store, group='/path', engine='zarr')`\n",
 "\n",
-"- zarr arrays map well to `xarray.Variables`\n",
-"  - especially because zarr v3 includes (optional) `dimension_names`\n",
-"- zarr groups map reasonably well to `xarray.Dataset` objects\n",
-"  - `xr.open_dataset(store, group='/path', engine='zarr')`\n",
-"  - but `xarray.Dataset`s require that all arrays in the Dataset have aligned dimensions\n",
-"    - so it is possible to create a zarr group that is not a valid `xarray.Dataset`, if the group contains arrays with non-aligning dimensions\n",
-"  - Also zarr has no concept of \"coordinate\" vs \"data\" variables\n",
-"    - so xarray has to save this piece of information as an additional piece of metadata \n",
-"- zarr store has a tree of groups\n",
+"* **Groups must be alignable** - But `xarray.Dataset`s require that all arrays in the Dataset have aligned dimensions\n",
+"  - so it is possible to create a zarr group that is not a valid `xarray.Dataset`, if the group contains arrays with non-aligning dimensions\n",
+"\n",
+"* **No \"coordinates\"** – No arrays are special, so Zarr has no intrinsic concept of \"coordinate\" vs \"data\" variables.\n",
+"  - So xarray has to save this piece of information as an additional piece of zarr metadata.\n",
+"\n",
+"* **Tree of groups <-> `DataTree`** - zarr store has a tree of groups\n",
 "  - maps to either a set of independent `xarray.Datasets`\n",
 "    - `xr.open_groups(store)`\n",
 "  - or to a single `xarray.DataTree`\n",
@@ -192,7 +197,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"TIFF (Tag Image File Format) is a *flexible* raster container widely used in biosciences, remote sensing and GIS. \n",
+"TIFF (Tag Image File Format) is a raster container widely used in biosciences, remote sensing and GIS. \n",
 "\n",
 "A **GeoTIFF** is simply a TIFF that stores additional georeferencing information tags (CRS, affine transform, etc.) so geospatial software knows where each pixel sits on Earth. \n",
 "\n",
@@ -206,25 +211,26 @@
 "\n",
 "* **Compression / tiling** – DEFLATE, LZW, etc. Tiling lets software fetch small windows efficiently.\n",
 "\n",
-"### Practical notes for xarray users\n",
-"\n",
-"* **Read** – use `rioxarray.open_rasterio()` (wraps rasterio) to get an immediate, Dask-chunked DataArray.\n",
-"\n",
-"* **Write** – `DataArray.rio.to_raster(\"out.tif\")`; choose compression + tiling via driver_kwargs.\n",
-"\n",
-"* **Dimensionality** – TIFF is inherently 2-D per band; no native time or vertical axis. If you need 4-D data, NetCDF or Zarr is usually a better fit.\n",
-"\n",
-"* **Metadata depth** – single-level tags only (no nested groups). For rich hierarchies, stick to HDF5 / NetCDF-4.\n",
-"\n",
-"* **Cloud-optimized GeoTIFF (COG)** – same format, arranged so HTTP range requests can stream windows efficiently; xarray handles it transparently when rasterio is compiled with libcurl.\n"
+"* **Cloud-optimized GeoTIFF (COG)** – same format, arranged so HTTP range requests can stream windows efficiently; xarray handles it transparently when rasterio is compiled with libcurl."
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "### How does TIFF relate to xarray?\n",
-"\n"
+"\n",
+"* **Dimensionality** – Each raster image maps well to a single `xarray.Variable`, but TIFF is inherently 2-D per band; no native time or vertical axis. If you need 4-D data, NetCDF or Zarr is usually a better fit.\n",
+"\n",
+"* **No named dimensions** - TIFFs don't have named dimensions for the two axes of the raster.\n",
+"\n",
+"* **IFDs as groups** - IFDs can be mapped to groups, which may be useful for multi-resolution TIFFs (also known as \"overviews\") and multi-page TIFFs.\n",
+"\n",
+"* **Metadata depth** – single-level tags only (no nested groups). For rich hierarchies, stick to HDF5 / NetCDF-4.\n",
+"\n",
+"* **Read** – use `rioxarray.open_rasterio()` (wraps rasterio) to get an immediate, Dask-chunked DataArray. However `rioxarray` is for interacting with GeoTIFFs, not general TIFFs.\n",
+"\n",
+"* **Write** – `DataArray.rio.to_raster(\"out.tif\")`; choose compression + tiling via driver_kwargs."
 ]
 },
 {
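
The "Groups must be alignable" point in the zarr part of this diff can be illustrated without any library. The following is a minimal sketch, not xarray's actual validation logic. It assumes each array in a group is described by a hypothetical `(dimension_names, shape)` pair and reports every dimension name that is used with more than one length, which is the simplest way a zarr group can fail to form a valid `xarray.Dataset`. The helper name `conflicting_dimensions` and the example arrays are illustrative assumptions.

```python
from typing import Dict, Sequence, Set, Tuple

# Hypothetical description of one zarr array: (dimension_names, shape).
ArraySpec = Tuple[Sequence[str], Sequence[int]]


def conflicting_dimensions(arrays: Dict[str, ArraySpec]) -> Dict[str, Set[int]]:
    """Return the dimension names that appear with more than one length."""
    sizes: Dict[str, Set[int]] = {}
    for dims, shape in arrays.values():
        if len(dims) != len(shape):
            raise ValueError("each array needs one dimension name per axis")
        for dim, length in zip(dims, shape):
            sizes.setdefault(dim, set()).add(length)
    # A dimension is a problem only if two arrays disagree on its length.
    return {dim: lengths for dim, lengths in sizes.items() if len(lengths) > 1}


# Both arrays agree that "time" has length 10.
aligned = {
    "temperature": (("time", "x"), (10, 4)),
    "pressure": (("time",), (10,)),
}

# "time" has length 10 in one array and 12 in the other.
misaligned = {
    "temperature": (("time", "x"), (10, 4)),
    "pressure": (("time",), (12,)),
}

print(conflicting_dimensions(aligned))
print(conflicting_dimensions(misaligned))
```

An empty result means the group passes this simplified check. A real `xarray.Dataset` also has to reconcile coordinate indexes, which this sketch ignores.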

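
The GeoTIFF description in this diff mentions an affine transform among the georeferencing tags. As a rough sketch of what that transform does, the snippet below maps a pixel's column and row to map coordinates using six coefficients in the `(a, b, c, d, e, f)` order used by rasterio's `Affine` objects. The function name `pixel_to_world` and the sample values are assumptions for illustration and do not come from the notebook.

```python
from typing import Tuple

# Coefficients in (a, b, c, d, e, f) order:
#   x = a * col + b * row + c
#   y = d * col + e * row + f
Transform = Tuple[float, float, float, float, float, float]


def pixel_to_world(transform: Transform, col: float, row: float) -> Tuple[float, float]:
    """Map a (col, row) pixel position to (x, y) map coordinates."""
    a, b, c, d, e, f = transform
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Hypothetical north-up raster: 30-unit pixels, top-left corner at
# (500000, 4200000). The negative e term makes y decrease as row increases.
transform = (30.0, 0.0, 500000.0, 0.0, -30.0, 4200000.0)

print(pixel_to_world(transform, 0, 0))
print(pixel_to_world(transform, 2, 3))
```

Integer positions give the upper-left corner of a pixel; adding 0.5 to both `col` and `row` gives its center.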