Commit f3305d9

d-v-b and jhamman authored
groundwork for V3 group tests (#1743)
* feat: functional .children method for groups
* changes necessary for correctly generating list of children
* add stand-alone test for group.children
* give type hints a glow-up
* test: use separate assert statements to avoid platform-dependent ordering issues
* test: put fixtures in conftest, add MemoryStore fixture
* docs: release notes
* test: remove prematurely-added mock s3 fixture
* chore: move v3 tests into v3 folder
* chore: type hints
* test: add schema for group method tests
* chore: add type for zarr_formats
* chore: remove localstore for now
* test: add __init__.py to support imports from top-level conftest.py, and add some docstrings, and remove redundant def
* fix: return valid JSON from GroupMetadata.to_bytes for v2 metadata
* fix: don't use a type as a value
* test: add getitem test
* fix: replace reference to nonexistent method in with , which does exist
* test: declare v3ness via directory structure, not test file name
* add a docstring to _get, and pass auto_mkdir to _put
* fix: add docstring to LocalStore.get_partial_values; adjust body of LocalStore.get_partial_values to properly handle the byte_range parameter of LocalStore.get.
* test: add tests for localstore init, set, get, get_partial
* fix: Rename children to members; AsyncGroup.members yields tuples of (name, AsyncArray / AsyncGroup) pairs; Group.members repackages these into a dict.
* fix: make Group.members return a tuple of str, Array | Group pairs
* fix: revert changes to synchronization code; this is churn that we need to deal with
* chore: move v3 tests into v3 folder
* chore: type hints
* test: add schema for group method tests
* chore: add type for zarr_formats
* chore: remove localstore for now
* test: add __init__.py to support imports from top-level conftest.py, and add some docstrings, and remove redundant def
* fix: return valid JSON from GroupMetadata.to_bytes for v2 metadata
* fix: don't use a type as a value
* test: add getitem test
* fix: replace reference to nonexistent method in with , which does exist
* test: declare v3ness via directory structure, not test file name
* add a docstring to _get, and pass auto_mkdir to _put
* fix: add docstring to LocalStore.get_partial_values; adjust body of LocalStore.get_partial_values to properly handle the byte_range parameter of LocalStore.get.
* test: add tests for localstore init, set, get, get_partial
* fix: remove pre-emptive fetching from group.open
* fix: use removeprefix (removes a substring) instead of strip (removes any member of a set); comment out / avoid tests that cannot pass right now; don't consider implicit groups for v2; check if prefix is present in storage before opening for Group.getitem
* xfail v2 tests that are sure to fail; add delitem tests; partition xfailing tests into subtests
* fix: handle byte_range[0] being None
* fix: adjust test for localstore.get to check that get on nonexistent keys returns None; correctly create intermediate directories when preparing test data in test_local_store_get_partial
* fix: add zarr_format parameter to array creation routines (which raises if zarr_format is not 3), and xfail the tests that will hit this condition. add tests for create_group, create_array, and update_attributes methods of asyncgroup.
* test: add group init test
* feature(store): make list_* methods async generators (#110)
* feature(store): make list_* methods async generators
* Update src/zarr/v3/store/memory.py
* Apply suggestions from code review
  - simplify code comments
  - use `removeprefix` instead of `strip`

---------

Co-authored-by: Davis Bennett <[email protected]>

* fix: define utility for converting asyncarray to array, and similar for group, largely to appease mypy
* chore: remove checks that only existed because of implicit groups
* chore: clean up docstring and modernize some type hints
* chore: move imports to top-level
* remove fixture files
* remove commented imports
* remove explicit asyncio marks; use __eq__ method of LocalStore for test
* rename test_storage to test_store
* modern type hints

---------

Co-authored-by: Joe Hamman <[email protected]>
1 parent c31a785 commit f3305d9
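
The commit message mentions turning the store `list_*` methods into async generators. As a rough illustration of that pattern only (not the zarr implementation), the sketch below uses a hypothetical `TinyStore` class and a hypothetical `collect` helper; both names are assumptions made for this example.

```python
import asyncio
from collections.abc import AsyncGenerator


class TinyStore:
    """Hypothetical in-memory store used only for this sketch."""

    def __init__(self, data: dict[str, bytes]) -> None:
        self._data = data

    async def list_prefix(self, prefix: str) -> AsyncGenerator[str, None]:
        # Yield matching keys one at a time instead of building a list first.
        for key in self._data:
            if key.startswith(prefix):
                yield key


async def collect(gen: AsyncGenerator[str, None]) -> list[str]:
    # Callers consume an async generator with `async for`.
    return [key async for key in gen]


store = TinyStore({"a/zarr.json": b"{}", "a/b/zarr.json": b"{}", "c/zarr.json": b"{}"})
keys = asyncio.run(collect(store.list_prefix("a/")))
print(keys)  # ['a/zarr.json', 'a/b/zarr.json']
```

Because the method contains `yield`, calling it returns an async generator immediately; no work happens until the caller iterates.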

13 files changed (+1241, -107 lines changed)

src/zarr/abc/store.py

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 from abc import abstractmethod, ABC
-
 from collections.abc import AsyncGenerator
+
 from typing import List, Tuple, Optional


src/zarr/array.py

Lines changed: 2 additions & 0 deletions
@@ -28,6 +28,7 @@
     ChunkCoords,
     Selection,
     SliceSelection,
+    ZarrFormat,
     concurrent_map,
 )
 from zarr.config import config
@@ -89,6 +90,7 @@ async def create(
         dimension_names: Optional[Iterable[str]] = None,
         attributes: Optional[Dict[str, Any]] = None,
         exists_ok: bool = False,
+        zarr_format: ZarrFormat = 3,
     ) -> AsyncArray:
         store_path = make_store_path(store)
         if not exists_ok:

src/zarr/common.py

Lines changed: 14 additions & 15 deletions
@@ -5,8 +5,6 @@
     Union,
     Tuple,
     Iterable,
-    Dict,
-    List,
     TypeVar,
     overload,
     Any,
@@ -18,7 +16,7 @@
 import functools

 if TYPE_CHECKING:
-    from typing import Any, Awaitable, Callable, Iterator, Optional, Type
+    from typing import Awaitable, Callable, Iterator, Optional, Type

 import numpy as np

@@ -27,25 +25,26 @@
 ZGROUP_JSON = ".zgroup"
 ZATTRS_JSON = ".zattrs"

-BytesLike = Union[bytes, bytearray, memoryview]
-ChunkCoords = Tuple[int, ...]
+BytesLike = bytes | bytearray | memoryview
+ChunkCoords = tuple[int, ...]
 ChunkCoordsLike = Iterable[int]
-SliceSelection = Tuple[slice, ...]
-Selection = Union[slice, SliceSelection]
-JSON = Union[str, None, int, float, Enum, Dict[str, "JSON"], List["JSON"], Tuple["JSON", ...]]
+SliceSelection = tuple[slice, ...]
+Selection = slice | SliceSelection
+ZarrFormat = Literal[2, 3]
+JSON = Union[str, None, int, float, Enum, dict[str, "JSON"], list["JSON"], tuple["JSON", ...]]


 def product(tup: ChunkCoords) -> int:
     return functools.reduce(lambda x, y: x * y, tup, 1)


-T = TypeVar("T", bound=Tuple[Any, ...])
+T = TypeVar("T", bound=tuple[Any, ...])
 V = TypeVar("V")


 async def concurrent_map(
-    items: List[T], func: Callable[..., Awaitable[V]], limit: Optional[int] = None
-) -> List[V]:
+    items: list[T], func: Callable[..., Awaitable[V]], limit: Optional[int] = None
+) -> list[V]:
     if limit is None:
         return await asyncio.gather(*[func(*item) for item in items])

@@ -127,18 +126,18 @@ def parse_configuration(data: JSON) -> JSON:
 @overload
 def parse_named_configuration(
     data: JSON, expected_name: Optional[str] = None
-) -> Tuple[str, Dict[str, JSON]]: ...
+) -> tuple[str, dict[str, JSON]]: ...


 @overload
 def parse_named_configuration(
     data: JSON, expected_name: Optional[str] = None, *, require_configuration: bool = True
-) -> Tuple[str, Optional[Dict[str, JSON]]]: ...
+) -> tuple[str, Optional[dict[str, JSON]]]: ...


 def parse_named_configuration(
     data: JSON, expected_name: Optional[str] = None, *, require_configuration: bool = True
-) -> Tuple[str, Optional[JSON]]:
+) -> tuple[str, Optional[JSON]]:
     if not isinstance(data, dict):
         raise TypeError(f"Expected dict, got {type(data)}")
     if "name" not in data:
@@ -153,7 +152,7 @@ def parse_named_configuration(
     return name_parsed, configuration_parsed


-def parse_shapelike(data: Any) -> Tuple[int, ...]:
+def parse_shapelike(data: Any) -> tuple[int, ...]:
     if not isinstance(data, Iterable):
         raise TypeError(f"Expected an iterable. Got {data} instead.")
     data_tuple = tuple(data)
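
The new `ZarrFormat = Literal[2, 3]` alias only constrains values for static type checkers; nothing is enforced at runtime. A minimal sketch of a runtime check built from the same alias follows. The `parse_zarr_format` helper is hypothetical and is not part of this commit.

```python
from typing import Literal, cast, get_args

ZarrFormat = Literal[2, 3]


def parse_zarr_format(value: object) -> ZarrFormat:
    """Hypothetical validator: accept only the values named in the Literal."""
    allowed = get_args(ZarrFormat)  # (2, 3)
    # Require a real int so that True, 2.0, or "2" are rejected.
    if isinstance(value, int) and not isinstance(value, bool) and value in allowed:
        return cast(ZarrFormat, value)
    raise ValueError(f"Invalid zarr_format: {value!r}. Expected one of {allowed}.")


print(parse_zarr_format(3))
```

Deriving the allowed values with `get_args` keeps the runtime check and the type alias from drifting apart.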

src/zarr/group.py

Lines changed: 65 additions & 30 deletions
@@ -5,21 +5,19 @@
 import asyncio
 import json
 import logging
+import numpy.typing as npt

 if TYPE_CHECKING:
-    from typing import (
-        Any,
-        AsyncGenerator,
-        Literal,
-        AsyncIterator,
-    )
+    from typing import Any, AsyncGenerator, Literal, Iterable
+from zarr.abc.codec import Codec
 from zarr.abc.metadata import Metadata

 from zarr.array import AsyncArray, Array
 from zarr.attributes import Attributes
-from zarr.common import ZARR_JSON, ZARRAY_JSON, ZATTRS_JSON, ZGROUP_JSON
+from zarr.common import ZARR_JSON, ZARRAY_JSON, ZATTRS_JSON, ZGROUP_JSON, ChunkCoords
 from zarr.store import StoreLike, StorePath, make_store_path
 from zarr.sync import SyncMixin, sync
+from typing import overload

 logger = logging.getLogger("zarr.group")

@@ -41,6 +39,26 @@ def parse_attributes(data: Any) -> dict[str, Any]:
     raise TypeError(msg)


+@overload
+def _parse_async_node(node: AsyncArray) -> Array: ...
+
+
+@overload
+def _parse_async_node(node: AsyncGroup) -> Group: ...
+
+
+def _parse_async_node(node: AsyncArray | AsyncGroup) -> Array | Group:
+    """
+    Wrap an AsyncArray in an Array, or an AsyncGroup in a Group.
+    """
+    if isinstance(node, AsyncArray):
+        return Array(node)
+    elif isinstance(node, AsyncGroup):
+        return Group(node)
+    else:
+        assert False
+
+
 @dataclass(frozen=True)
 class GroupMetadata(Metadata):
     attributes: dict[str, Any] = field(default_factory=dict)
@@ -53,7 +71,7 @@ def to_bytes(self) -> dict[str, bytes]:
             return {ZARR_JSON: json.dumps(self.to_dict()).encode()}
         else:
             return {
-                ZGROUP_JSON: json.dumps({"zarr_format": 2}).encode(),
+                ZGROUP_JSON: json.dumps({"zarr_format": self.zarr_format}).encode(),
                 ZATTRS_JSON: json.dumps(self.attributes).encode(),
             }

@@ -113,11 +131,11 @@ async def open(
                 (store_path / ZGROUP_JSON).get(), (store_path / ZATTRS_JSON).get()
             )
             if zgroup_bytes is None:
-                raise KeyError(store_path)  # filenotfounderror?
+                raise FileNotFoundError(store_path)
         elif zarr_format == 3:
             zarr_json_bytes = await (store_path / ZARR_JSON).get()
             if zarr_json_bytes is None:
-                raise KeyError(store_path)  # filenotfounderror?
+                raise FileNotFoundError(store_path)
         elif zarr_format is None:
             zarr_json_bytes, zgroup_bytes, zattrs_bytes = await asyncio.gather(
                 (store_path / ZARR_JSON).get(),
@@ -168,17 +186,14 @@ async def getitem(
         key: str,
     ) -> AsyncArray | AsyncGroup:
         store_path = self.store_path / key
+        logger.warning("key=%s, store_path=%s", key, store_path)

         # Note:
         # in zarr-python v2, we first check if `key` references an Array, else if `key` references
         # a group,using standalone `contains_array` and `contains_group` functions. These functions
         # are reusable, but for v3 they would perform redundant I/O operations.
         # Not clear how much of that strategy we want to keep here.

-        # if `key` names an object in storage, it cannot be an array or group
-        if await store_path.exists():
-            raise KeyError(key)
-
         if self.metadata.zarr_format == 3:
             zarr_json_bytes = await (store_path / ZARR_JSON).get()
             if zarr_json_bytes is None:
@@ -248,16 +263,42 @@ def attrs(self):
     def info(self):
         return self.metadata.info

-    async def create_group(self, path: str, **kwargs) -> AsyncGroup:
+    async def create_group(
+        self, path: str, exists_ok: bool = False, attributes: dict[str, Any] = {}
+    ) -> AsyncGroup:
         return await type(self).create(
             self.store_path / path,
-            **kwargs,
+            attributes=attributes,
+            exists_ok=exists_ok,
+            zarr_format=self.metadata.zarr_format,
         )

-    async def create_array(self, path: str, **kwargs) -> AsyncArray:
+    async def create_array(
+        self,
+        path: str,
+        shape: ChunkCoords,
+        dtype: npt.DTypeLike,
+        chunk_shape: ChunkCoords,
+        fill_value: Any | None = None,
+        chunk_key_encoding: tuple[Literal["default"], Literal[".", "/"]]
+        | tuple[Literal["v2"], Literal[".", "/"]] = ("default", "/"),
+        codecs: Iterable[Codec | dict[str, Any]] | None = None,
+        dimension_names: Iterable[str] | None = None,
+        attributes: dict[str, Any] | None = None,
+        exists_ok: bool = False,
+    ) -> AsyncArray:
         return await AsyncArray.create(
             self.store_path / path,
-            **kwargs,
+            shape=shape,
+            dtype=dtype,
+            chunk_shape=chunk_shape,
+            fill_value=fill_value,
+            chunk_key_encoding=chunk_key_encoding,
+            codecs=codecs,
+            dimension_names=dimension_names,
+            attributes=attributes,
+            exists_ok=exists_ok,
+            zarr_format=self.metadata.zarr_format,
         )

     async def update_attributes(self, new_attributes: dict[str, Any]):
@@ -348,7 +389,7 @@ async def array_keys(self) -> AsyncGenerator[str, None]:
                 yield key

     # todo: decide if this method should be separate from `array_keys`
-    async def arrays(self) -> AsyncIterator[AsyncArray]:
+    async def arrays(self) -> AsyncGenerator[AsyncArray, None]:
         async for key, value in self.members():
             if isinstance(value, AsyncArray):
                 yield value
@@ -472,19 +513,13 @@ def nmembers(self) -> int:
     @property
     def members(self) -> tuple[tuple[str, Array | Group], ...]:
         """
-        Return the sub-arrays and sub-groups of this group as a `tuple` of (name, array | group)
+        Return the sub-arrays and sub-groups of this group as a tuple of (name, array | group)
         pairs
         """
-        _members: list[tuple[str, AsyncArray | AsyncGroup]] = self._sync_iter(
-            self._async_group.members()
-        )
-        ret: list[tuple[str, Array | Group]] = []
-        for key, value in _members:
-            if isinstance(value, AsyncArray):
-                ret.append((key, Array(value)))
-            else:
-                ret.append((key, Group(value)))
-        return tuple(ret)
+        _members = self._sync_iter(self._async_group.members())
+
+        result = tuple(map(lambda kv: (kv[0], _parse_async_node(kv[1])), _members))
+        return result

     def __contains__(self, member) -> bool:
         return self._sync(self._async_group.contains(member))
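
The `_parse_async_node` helper added above pairs `@overload` signatures with a single runtime implementation, so a type checker can map each async type to its synchronous wrapper. The sketch below shows the same pattern with stand-in classes; `FakeAsyncArray`, `FakeAsyncGroup`, `FakeArray`, `FakeGroup`, and `wrap_node` are hypothetical names, not zarr APIs. It also raises `TypeError` in the fallback branch rather than using `assert False`, which is a design choice of this sketch and not what the commit does (assertions are skipped when Python runs with `-O`).

```python
from dataclasses import dataclass
from typing import Union, overload


@dataclass
class FakeAsyncArray:  # stand-in for an async array type
    name: str


@dataclass
class FakeAsyncGroup:  # stand-in for an async group type
    name: str


@dataclass
class FakeArray:  # synchronous wrapper around FakeAsyncArray
    inner: FakeAsyncArray


@dataclass
class FakeGroup:  # synchronous wrapper around FakeAsyncGroup
    inner: FakeAsyncGroup


@overload
def wrap_node(node: FakeAsyncArray) -> FakeArray: ...


@overload
def wrap_node(node: FakeAsyncGroup) -> FakeGroup: ...


def wrap_node(
    node: Union[FakeAsyncArray, FakeAsyncGroup],
) -> Union[FakeArray, FakeGroup]:
    # Only this implementation runs; the overloads exist for the type checker.
    if isinstance(node, FakeAsyncArray):
        return FakeArray(node)
    if isinstance(node, FakeAsyncGroup):
        return FakeGroup(node)
    raise TypeError(f"Unsupported node type: {type(node).__name__}")


members = (("a", FakeAsyncArray("a")), ("g", FakeAsyncGroup("g")))
wrapped = tuple((name, wrap_node(node)) for name, node in members)
print(wrapped)
```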

src/zarr/store/local.py

Lines changed: 12 additions & 11 deletions
@@ -4,20 +4,20 @@
 import shutil
 from collections.abc import AsyncGenerator
 from pathlib import Path
-from typing import Union, Optional, List, Tuple

 from zarr.abc.store import Store
 from zarr.common import BytesLike, concurrent_map, to_thread


-def _get(path: Path, byte_range: Optional[Tuple[int, Optional[int]]] = None) -> bytes:
+def _get(path: Path, byte_range: tuple[int, int | None] | None) -> bytes:
     """
     Fetch a contiguous region of bytes from a file.
+
     Parameters
     ----------
     path: Path
         The file to read bytes from.
-    byte_range: Optional[Tuple[int, Optional[int]]] = None
+    byte_range: tuple[int, int | None] | None = None
         The range of bytes to read. If `byte_range` is `None`, then the entire file will be read.
         If `byte_range` is a tuple, the first value specifies the index of the first byte to read,
         and the second value specifies the total number of bytes to read. If the total value is
@@ -49,7 +49,7 @@ def _get(path: Path, byte_range: Optional[Tuple[int, Optional[int]]] = None) ->
 def _put(
     path: Path,
     value: BytesLike,
-    start: Optional[int] = None,
+    start: int | None = None,
     auto_mkdir: bool = True,
 ) -> int | None:
     if auto_mkdir:
@@ -71,7 +71,7 @@ class LocalStore(Store):
     root: Path
     auto_mkdir: bool

-    def __init__(self, root: Union[Path, str], auto_mkdir: bool = True):
+    def __init__(self, root: Path | str, auto_mkdir: bool = True):
         if isinstance(root, str):
             root = Path(root)
         assert isinstance(root, Path)
@@ -88,9 +88,7 @@ def __repr__(self) -> str:
     def __eq__(self, other: object) -> bool:
         return isinstance(other, type(self)) and self.root == other.root

-    async def get(
-        self, key: str, byte_range: Optional[Tuple[int, Optional[int]]] = None
-    ) -> Optional[bytes]:
+    async def get(self, key: str, byte_range: tuple[int, int | None] | None = None) -> bytes | None:
         assert isinstance(key, str)
         path = self.root / key

@@ -100,8 +98,8 @@ async def get(
             return None

     async def get_partial_values(
-        self, key_ranges: List[Tuple[str, Tuple[int, int]]]
-    ) -> List[Optional[bytes]]:
+        self, key_ranges: list[tuple[str, tuple[int, int]]]
+    ) -> list[bytes | None]:
         """
         Read byte ranges from multiple keys.
         Parameters
@@ -124,7 +122,7 @@ async def set(self, key: str, value: BytesLike) -> None:
         path = self.root / key
         await to_thread(_put, path, value, auto_mkdir=self.auto_mkdir)

-    async def set_partial_values(self, key_start_values: List[Tuple[str, int, bytes]]) -> None:
+    async def set_partial_values(self, key_start_values: list[tuple[str, int, bytes]]) -> None:
         args = []
         for key, start, value in key_start_values:
             assert isinstance(key, str)
@@ -169,6 +167,9 @@ async def list_prefix(self, prefix: str) -> AsyncGenerator[str, None]:
         -------
         AsyncGenerator[str, None]
         """
+        for p in (self.root / prefix).rglob("*"):
+            if p.is_file():
+                yield str(p)

         to_strip = str(self.root) + "/"
         for p in (self.root / prefix).rglob("*"):
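
The `_get` docstring above describes `byte_range` as a `(start, length)` pair, with `None` meaning the whole file. The sketch below illustrates those semantics with a hypothetical `read_range` function; it is not the body of `_get`, and treating a `None` length as "read to the end of the file" is an assumption, since the docstring is cut off at that point in this diff.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_range(
    path: Path, byte_range: Optional[tuple[int, Optional[int]]] = None
) -> bytes:
    """Hypothetical reader: byte_range is (start index, number of bytes)."""
    if byte_range is None:
        return path.read_bytes()
    start, length = byte_range
    with path.open("rb") as f:
        f.seek(start)
        # Assumption: a length of None means "read from start to end of file".
        return f.read() if length is None else f.read(length)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "chunk"
    target.write_bytes(b"0123456789")
    whole = read_range(target)
    middle = read_range(target, (2, 3))
    tail = read_range(target, (7, None))

print(whole, middle, tail)  # b'0123456789' b'234' b'789'
```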

src/zarr/store/memory.py

Lines changed: 1 addition & 1 deletion
@@ -88,4 +88,4 @@ async def list_dir(self, prefix: str) -> AsyncGenerator[str, None]:
         else:
             for key in self._store_dict:
                 if key.startswith(prefix + "/") and key != prefix:
-                    yield key.strip(prefix + "/").split("/")[0]
+                    yield key.removeprefix(prefix + "/").split("/")[0]
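
The one-line change above matters because `str.strip` treats its argument as a set of characters to remove from both ends, while `str.removeprefix` (Python 3.9+) removes one exact leading substring. A small sketch with made-up keys shows how the old call could return the wrong child name:

```python
prefix = "foo"
key = "foo/of/chunk"

# strip() removes any leading/trailing run of the characters 'f', 'o', '/'.
stripped = key.strip(prefix + "/")
# removeprefix() removes only the exact leading substring "foo/".
removed = key.removeprefix(prefix + "/")

print(stripped)  # chunk
print(removed)   # of/chunk

# The first path segment is what list_dir yields as the child name.
child_via_strip = stripped.split("/")[0]
child_via_removeprefix = removed.split("/")[0]
print(child_via_strip, child_via_removeprefix)  # chunk of
```

With `strip`, the child directory `of` is consumed entirely because all of its characters belong to the stripped set.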

tests/v2/conftest.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
-import pathlib
 import pytest
+import pathlib


 @pytest.fixture(params=[str, pathlib.Path])

tests/v3/__init__.py

Whitespace-only changes.
