Help with Parquet storage #345

Hi there Martin,

I'm trying to learn how to use the parquet storage in the hope of creating a section in the Pythia cookbook to demonstrate this. I'm going off of [this test](https://github.com/fsspec/kerchunk/blob/main/kerchunk/tests/test_combine.py#L307) and the [code block in the Kerchunk docs](https://fsspec.github.io/kerchunk/advanced.html?highlight=dfreferencefilesystem#parquet-storage). 

My initial attempt is below, but when I try to read the parquet I get a `FileNotFoundError`:

```
FileNotFoundError: [Errno 2] No such file or directory: '/Users/nrhagen/Documents/carbonplan/pythia/kerchunk-cookbook/notebooks/foundations/combined.parq/TSLB/refs.0.parq'
```

The only file in the `combined.parq` directory is `['.zmetadata']`. So it seems like I'm not writing the combined reference correctly. 
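For anyone who wants to run the same check, here is a minimal sketch of how the contents of a reference directory can be listed recursively. The helper name `list_reference_files` is hypothetical (it is not part of kerchunk or fsspec), and the throwaway directory only mimics the state described above, with `.zmetadata` present and no `.parq` files.

```python
import os
import tempfile


def list_reference_files(root):
    """Return every file under `root` as a sorted list of paths relative to `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, root))
    return sorted(found)


# Build a temporary directory that mimics the situation described above:
# only a `.zmetadata` file, and no per-variable Parquet files.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "combined.parq")
    os.makedirs(root)
    open(os.path.join(root, ".zmetadata"), "w").close()

    files = list_reference_files(root)
    # True only if at least one Parquet reference file was written
    has_parquet = any(name.endswith(".parq") for name in files)
    print(files, has_parquet)
```

Pointing the helper at the real `combined.parq` directory should show whether files such as `TSLB/refs.0.parq` were ever written.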

Also, I'm not clear on this line from the test: `LazyReferenceMapper.create(10, temp_dir, fs)`. Is the first argument of `LazyReferenceMapper.create` the number of input files, or something else?

Thanks again for the help! It would be great to figure out how to use the `parquet` functionality. 

```python
import os
from tempfile import TemporaryDirectory

import fsspec
import xarray as xr
from fsspec.implementations.reference import LazyReferenceMapper, ReferenceFileSystem
from kerchunk import df, hdf
from kerchunk.combine import MultiZarrToZarr

file_pattern = [
    "s3://wrf-se-ak-ar5/ccsm/rcp85/daily/2060/WRFDS_2060-01-01.nc",
    "s3://wrf-se-ak-ar5/ccsm/rcp85/daily/2060/WRFDS_2060-01-02.nc",
]

# Build one reference set per input file
single_ref_sets = [hdf.SingleHdf5ToZarr(url).translate() for url in file_pattern]

# Create a lazy reference mapper backed by a temporary local directory
fs = fsspec.filesystem("file")
td = TemporaryDirectory()
temp_dir = td.name
out = LazyReferenceMapper.create(10, temp_dir, fs)

# Combine the per-file references along the Time dimension
mzz = MultiZarrToZarr(
    single_ref_sets,
    remote_protocol="memory",
    concat_dims=["Time"],
    identical_dims=["south_north", "west_east", "interp_levels", "soil_layers_stag"],
    out=out,
).translate()

# Attempt to write the combined references to a local Parquet directory
if not os.path.exists("combined.parq"):
    os.makedirs("combined.parq")
df.refs_to_dataframe(mzz, "combined.parq")

# Attempt to read the Parquet references back (this raises the FileNotFoundError)
fs = ReferenceFileSystem("combined.parq", lazy=True)
ds = xr.open_dataset(
    fs.get_mapper(),
    engine="zarr",
    backend_kwargs={"consolidated": False},
)
```


cc @rsignell-usgs if you've already figured this out!



