Description
Zarr version
3.0.0
Numcodecs version
0.14.1
Python Version
3.12.8
Operating System
Linux - Ubuntu
Installation
pip into virtual environment
Description
Hello,
I bumped into a misleading error when reading a simple consolidated dataset (zarr_format=2) with the zarr 3 implementation:
Traceback (most recent call last):
File "/home/reis/debug-zarr3.py", line 3, in <module>
zarr.open('/home/reis/test.zarr',zarr_format=2, mode='r', use_consolidated=True)
File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/_compat.py", line 43, in inner_f
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/api/synchronous.py", line 190, in open
obj = sync(
^^^^^
File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/core/sync.py", line 142, in sync
raise return_result
File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/core/sync.py", line 98, in _runner
return await coro
^^^^^^^^^^
File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/api/asynchronous.py", line 332, in open
return await open_group(store=store_path, zarr_format=zarr_format, mode=mode, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/api/asynchronous.py", line 832, in open_group
raise FileNotFoundError(f"Unable to find group: {store_path}")
The problem arises when reading a consolidated store that was written and consolidated with zarr=2.18.4: its .zmetadata may not contain .zattrs keys, whereas doing the same thing with zarr 3.0.0 (zarr_format=2) creates an empty dict for .zattrs regardless. The zarr 3.0.0 reader pops the .zattrs key unconditionally, which breaks backwards compatibility with older datasets.
I was able to work around this by not raising an exception when .zattrs is missing from the .zmetadata file:
diff --git a/src/zarr/core/group.py b/src/zarr/core/group.py
index b1447a85..2a533272 100644
--- a/src/zarr/core/group.py
+++ b/src/zarr/core/group.py
@@ -574,8 +574,8 @@ class AsyncGroup:
v2_consolidated_metadata = v2_consolidated_metadata["metadata"]
# We already read zattrs and zgroup. Should we ignore these?
print(" DEBUG:", v2_consolidated_metadata)
- v2_consolidated_metadata.pop(".zattrs")
- v2_consolidated_metadata.pop(".zgroup")
+ v2_consolidated_metadata.pop(".zattrs", None)
+ v2_consolidated_metadata.pop(".zgroup", None)
consolidated_metadata: defaultdict[str, dict[str, Any]] = defaultdict(dict)
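To illustrate why the default argument matters, here is a minimal sketch that uses a plain dict as a hypothetical stand-in for the "metadata" mapping of an older .zmetadata file. The function names and the sample entries are assumptions made for illustration; this is not zarr's actual code path, only the dict behaviour the patch relies on.

```python
# Hypothetical stand-in for the "metadata" mapping of a .zmetadata file
# consolidated by zarr 2.18.4: ".zgroup" is present, ".zattrs" is not.
legacy_metadata = {
    ".zgroup": {"zarr_format": 2},
    "myvar/.zarray": {"shape": [2, 3], "dtype": "|u1"},
}


def drop_root_keys_strict(metadata):
    """Mimics the original lines: pop() without a default."""
    remaining = dict(metadata)  # copy so the input is left untouched
    remaining.pop(".zattrs")  # raises KeyError when the key is absent
    remaining.pop(".zgroup")
    return remaining


def drop_root_keys_tolerant(metadata):
    """Mimics the patched lines: pop() with a None default."""
    remaining = dict(metadata)
    remaining.pop(".zattrs", None)  # no error when the key is absent
    remaining.pop(".zgroup", None)
    return remaining


try:
    drop_root_keys_strict(legacy_metadata)
    strict_error = None
except KeyError as exc:
    strict_error = exc

print("strict:", repr(strict_error))
print("tolerant:", sorted(drop_root_keys_tolerant(legacy_metadata)))
```

The strict variant fails on the missing key, while the tolerant variant leaves only the array entry behind. Presumably the KeyError is what ends up reported as the "Unable to find group" FileNotFoundError, but that link is my assumption.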
Steps to reproduce
Here's how I produced the sample dataset with python = 3.10, zarr = 2.18.4:
import zarr
z=zarr.open('/tmp/test.zarr', mode='w')
z.create('myvar',shape=(2,3),dtype='uint8')
zarr.consolidate_metadata(z.store)
But creating the equivalent with python = 3.12.8, zarr = 3.0.0 produces different results:
import zarr
import numcodecs
z=zarr.open('/tmp/test-new.zarr', zarr_format=2,mode='w')
z.create(name='myvar',shape=(2,3),dtype='uint8',compressor=numcodecs.Blosc())
zarr.consolidate_metadata(z.store,zarr_format=2)
Here is the difference between the two cases:
$ diff <(jq --sort-keys < /tmp/test.zarr/.zmetadata) <(jq --sort-keys < /tmp/test-new.zarr/.zmetadata)
2a3
> ".zattrs": {},
17a19
> "dimension_separator": ".",
27c29,30
< }
---
> },
> "myvar/.zattrs": {}
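For anyone who wants to check whether an existing store is affected, the following is a rough sketch that lists the .zattrs keys a consolidated v2 document lacks. The helper name missing_zattrs and the inline sample documents are hypothetical; it uses only the json module and assumes the usual layout with a top-level "metadata" mapping, as seen in the diff above.

```python
import json


def missing_zattrs(consolidated):
    """Return the .zattrs keys absent from a consolidated v2 document."""
    metadata = consolidated["metadata"]
    missing = []
    for key in metadata:
        # Split "myvar/.zarray" into ("myvar", "/", ".zarray");
        # a root-level key such as ".zgroup" yields an empty prefix.
        prefix, _, leaf = key.rpartition("/")
        if leaf in (".zgroup", ".zarray"):
            attrs_key = f"{prefix}/.zattrs" if prefix else ".zattrs"
            if attrs_key not in metadata:
                missing.append(attrs_key)
    return sorted(missing)


# Hypothetical sample mirroring the zarr 2.18.4 output (no .zattrs entries).
old_style = json.loads("""
{
  "metadata": {
    ".zgroup": {"zarr_format": 2},
    "myvar/.zarray": {"shape": [2, 3], "dtype": "|u1"}
  },
  "zarr_consolidated_format": 1
}
""")

# Hypothetical sample mirroring the zarr 3.0.0 output (empty .zattrs entries).
new_style = json.loads("""
{
  "metadata": {
    ".zattrs": {},
    ".zgroup": {"zarr_format": 2},
    "myvar/.zarray": {"shape": [2, 3], "dtype": "|u1"},
    "myvar/.zattrs": {}
  },
  "zarr_consolidated_format": 1
}
""")

print(missing_zattrs(old_style))
print(missing_zattrs(new_style))
```

To run it against a real store, replace the inline samples with json.load() on the store's .zmetadata file.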
Additional output
No response