
Backwards compatibility reading "old" consolidated dataset without attributes #2694

@mannreis


Zarr version

3.0.0

Numcodecs version

0.14.1

Python Version

3.12.8

Operating System

Linux - Ubuntu

Installation

pip into virtual environment

Description

Hello,

I bumped into a misleading error when reading a simple consolidated dataset (zarr_format=2) with the zarr 3 implementation.

Traceback (most recent call last):
  File "/home/reis/debug-zarr3.py", line 3, in <module>
    zarr.open('/home/reis/test.zarr',zarr_format=2, mode='r', use_consolidated=True)
  File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/_compat.py", line 43, in inner_f
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/api/synchronous.py", line 190, in open
    obj = sync(
          ^^^^^
  File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/core/sync.py", line 142, in sync
    raise return_result
  File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/core/sync.py", line 98, in _runner
    return await coro
           ^^^^^^^^^^
  File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/api/asynchronous.py", line 332, in open
    return await open_group(store=store_path, zarr_format=zarr_format, mode=mode, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/reis/miniconda3/envs/zarr3/lib/python3.12/site-packages/zarr/api/asynchronous.py", line 832, in open_group
    raise FileNotFoundError(f"Unable to find group: {store_path}")

The problem is that the .zmetadata of a store written and consolidated with zarr 2.18.4 may not contain any .zattrs keys, whereas doing the same with zarr 3.0.0 (zarr_format=2) always writes an empty dict for .zattrs. Because zarr 3.0.0 expects that key when reading consolidated metadata, it fails on datasets written by older versions, which breaks backwards compatibility.

I was able to work around this by not raising an exception when .zattrs (or .zgroup) is missing from the .zmetadata file:

diff --git a/src/zarr/core/group.py b/src/zarr/core/group.py
index b1447a85..2a533272 100644
--- a/src/zarr/core/group.py
+++ b/src/zarr/core/group.py
@@ -574,8 +574,8 @@ class AsyncGroup:
             v2_consolidated_metadata = v2_consolidated_metadata["metadata"]
             # We already read zattrs and zgroup. Should we ignore these?
             print("   DEBUG:", v2_consolidated_metadata)
-            v2_consolidated_metadata.pop(".zattrs")
-            v2_consolidated_metadata.pop(".zgroup")
+            v2_consolidated_metadata.pop(".zattrs", None)
+            v2_consolidated_metadata.pop(".zgroup", None)
 
             consolidated_metadata: defaultdict[str, dict[str, Any]] = defaultdict(dict)
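The patch hinges on the fact that Python's dict.pop(key) raises KeyError when the key is absent, while dict.pop(key, default) returns the default instead. The sketch below reproduces just that step in isolation, without importing zarr. The two helper functions are hypothetical stand-ins for the patched lines in AsyncGroup, and the metadata dict is an abbreviated, assumed version of what zarr 2.18.4 writes for the reproducer (no .zattrs entries). It does not cover how that failure surfaces as the FileNotFoundError in the traceback.

```python
# Abbreviated (assumed) "metadata" mapping of a .zmetadata written by zarr 2.18.4:
# note that there is no ".zattrs" key at the root or for "myvar".
old_style_metadata = {
    ".zgroup": {"zarr_format": 2},
    "myvar/.zarray": {"shape": [2, 3], "dtype": "|u1", "zarr_format": 2},
}


def strip_root_keys_strict(metadata: dict) -> dict:
    """Hypothetical mirror of the original code: pop() without a default."""
    metadata = dict(metadata)  # shallow copy so the input is not mutated
    metadata.pop(".zattrs")  # raises KeyError when the key is absent
    metadata.pop(".zgroup")
    return metadata


def strip_root_keys_lenient(metadata: dict) -> dict:
    """Hypothetical mirror of the patched code: pop() with a default of None."""
    metadata = dict(metadata)
    metadata.pop(".zattrs", None)  # silently ignores a missing key
    metadata.pop(".zgroup", None)
    return metadata


try:
    strip_root_keys_strict(old_style_metadata)
except KeyError as exc:
    print("strict version fails:", repr(exc))  # strict version fails: KeyError('.zattrs')

# Only the array entry remains once the root keys are stripped.
print(strip_root_keys_lenient(old_style_metadata))
```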

Steps to reproduce

I produced the sample dataset with Python 3.10 and zarr 2.18.4 as follows:

import zarr
z = zarr.open('/tmp/test.zarr', mode='w')
z.create('myvar', shape=(2, 3), dtype='uint8')
zarr.consolidate_metadata(z.store)

Creating the equivalent dataset with Python 3.12.8 and zarr 3.0.0, however, produces a different .zmetadata:

import zarr
import numcodecs
z = zarr.open('/tmp/test-new.zarr', zarr_format=2, mode='w')
z.create(name='myvar', shape=(2, 3), dtype='uint8', compressor=numcodecs.Blosc())
zarr.consolidate_metadata(z.store, zarr_format=2)

Here is the difference between the two .zmetadata files:

$ diff <(jq --sort-keys < /tmp/test.zarr/.zmetadata) <(jq --sort-keys < /tmp/test-new.zarr/.zmetadata)
2a3
>     ".zattrs": {},
17a19
>       "dimension_separator": ".",
27c29,30
<     }
---
>     },
>     "myvar/.zattrs": {}
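To check whether an existing store is affected before opening it with zarr 3.0.0, one can scan its .zmetadata for group or array entries that have no matching .zattrs key. The helpers below are hypothetical (not part of zarr's API) and assume a local directory store whose .zmetadata has the usual zarr v2 layout: a top-level "metadata" mapping keyed by paths such as ".zgroup" and "myvar/.zarray".

```python
import json
from pathlib import Path


def load_zmetadata(store_dir: str) -> dict:
    """Read and parse the .zmetadata file of a local directory store."""
    return json.loads((Path(store_dir) / ".zmetadata").read_text())


def find_missing_zattrs(consolidated: dict) -> list[str]:
    """Return node paths ("" for the root) that have .zgroup/.zarray but no .zattrs."""
    metadata = consolidated["metadata"]
    missing = []
    for key in metadata:
        for suffix in (".zgroup", ".zarray"):
            # Match the root entry exactly, or a nested entry such as "myvar/.zarray".
            if key == suffix or key.endswith("/" + suffix):
                prefix = key[: -len(suffix)]  # "" for the root, "myvar/" for an array
                if prefix + ".zattrs" not in metadata:
                    missing.append(prefix.rstrip("/"))
    return sorted(missing)
```

Based on the diff above, find_missing_zattrs(load_zmetadata('/tmp/test.zarr')) should report the root ("") and "myvar", while the store written by zarr 3.0.0 should report nothing.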

Additional output

No response

Labels: bug (Potential issues with the zarr-python library)