gcsfs doesn't properly handle gzipped files, ignoring content-encoding #461

@jimmywan

I have a file "foo.txt.gz" that has been uploaded with the following metadata:

Content-Type: text/plain
Content-Encoding: gzip

I'm trying to copy its contents to a new, uncompressed file in cloud storage to work around a bug where my tooling (gcloud) can't properly handle gzip input.

If I try to pass the compression flag on read, it complains about the file not being a gzip file, implying that transcoding is occurring:

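>>> import shutil
>>> import gcsfs
>>> fs = gcsfs.GCSFileSystem()  # setup assumed; not shown in the original report
>>> with fs.open('gcs://jw-sandbox/uploads.txt.gz', 'rb', compression='gzip') as read_file:
...     with fs.open('gcs://jw-sandbox/uploads.txt', 'wb') as write_file:
...         shutil.copyfileobj(read_file, write_file)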

gzip.BadGzipFile: Not a gzipped file (b'gs')
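
The `b'gs'` in the message is the first two bytes the gzip reader saw. That suggests the reader received already-decompressed text (possibly lines beginning with `gs://`) rather than a gzip stream. Below is a minimal local sketch of that failure mode. It uses only the standard library and made-up sample data (`example-bucket` and the object names are hypothetical), and it does not contact Cloud Storage:

```python
import gzip
import io

# Made-up sample data standing in for the object's decompressed contents.
plain = b"".join(b"gs://example-bucket/object-%04d\n" % i for i in range(200))


def gunzip(payload: bytes) -> bytes:
    """Decompress a gzip payload held in memory."""
    with gzip.GzipFile(fileobj=io.BytesIO(payload)) as fh:
        return fh.read()


# A genuine gzip stream round-trips cleanly.
round_trip = gunzip(gzip.compress(plain))

# Bytes that were already decompressed upstream fail the gzip magic-number check.
error_message = None
try:
    gunzip(plain)
except gzip.BadGzipFile as exc:
    error_message = str(exc)

print(error_message)
```

If the same error appears against the real object, it is consistent with the data having been decompressed before `compression='gzip'` was applied.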

If I try to read the file without the compression flag and just dump contents to stdout, I only get the first N bytes of the decompressed contents where N is the compressed size:

>>> with fs.open('gcs://jw-sandbox/uploads.txt.gz', 'rb') as read_file:
...     for f in read_file:
...         print(f)

>>> print(gcsfs.__version__)
2022.02.0
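
One possible explanation for the truncation, offered as an assumption rather than a confirmed diagnosis: the reader takes its length from the object's stored (compressed) size while the bytes that arrive are decompressed, so it stops early. The sketch below simulates only that arithmetic with made-up data. `read_with_stale_size` is a hypothetical helper, not a gcsfs function:

```python
import gzip

# Made-up sample data standing in for the object's decompressed contents.
plain = b"".join(b"gs://example-bucket/object-%04d\n" % i for i in range(200))
compressed = gzip.compress(plain)


def read_with_stale_size(decoded: bytes, reported_size: int) -> bytes:
    """Hypothetical reader that stops after reported_size bytes of decoded data."""
    return decoded[:reported_size]


# The reader believes the object is len(compressed) bytes long,
# but the stream it consumes is the decompressed text.
received = read_with_stale_size(plain, len(compressed))

print(len(plain), len(compressed), len(received))
```

Comparing the number of bytes actually printed with the object's stored size would confirm or rule out this explanation.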
