Support for parquet encoder and decoder #127

@lorenzwalthert

Description

Describe the feature you'd like
Support for the Parquet MIME type in the SageMaker inference toolkit. E.g., the README of this repo shows an example default_input_fn():

    def default_input_fn(self, input_data, content_type, context=None):
        """A default input_fn that can handle JSON, CSV and NPZ formats.

        Args:
            input_data: the request payload serialized in the content_type format
            content_type: the request content_type
            context (obj): the request context (default: None).

        Returns: input_data deserialized into torch.FloatTensor or torch.cuda.FloatTensor, depending on whether CUDA is available.
        """
        return decoder.decode(input_data, content_type)

Looking into decoder.decode, I see the following MIME types are supported:

    _decoder_map = {
        content_types.NPY: _npy_to_numpy,
        content_types.CSV: _csv_to_numpy,
        content_types.JSON: _json_to_numpy,
        content_types.NPZ: _npz_to_sparse,
    }

Should not be too hard to add Parquet here. Parquet is a columnar data format commonly used with large datasets and already supported in other SageMaker services, for example Autopilot.
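A rough sketch of what such a decoder/encoder pair could look like, assuming pyarrow as the Parquet backend; the PARQUET constant and the "application/x-parquet" MIME string are assumptions here, not existing toolkit API:

    import io

    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Assumed new constant in content_types; the exact MIME string is a guess.
    PARQUET = "application/x-parquet"

    def _parquet_to_numpy(string_like):
        """Deserialize a Parquet payload (bytes) into a numpy array."""
        table = pq.read_table(io.BytesIO(string_like))
        return table.to_pandas().to_numpy()

    def _array_to_parquet(array_like):
        """Serialize an array-like into Parquet-encoded bytes."""
        buffer = io.BytesIO()
        pq.write_table(pa.Table.from_pandas(pd.DataFrame(array_like)), buffer)
        return buffer.getvalue()

The decoder would then be registered as content_types.PARQUET: _parquet_to_numpy in _decoder_map, with the encoder added to the corresponding _encoder_map.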

How would this feature be used? Please describe.
It would reduce storage costs and data I/O costs, and speed up processing.

Describe alternatives you've considered

CSV is the standard, but it's a much less efficient way to store, read and write column-oriented data.
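As a quick way to see the efficiency gap for yourself (pandas and pyarrow are assumed to be available; they are not dependencies of this toolkit):

    import io

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.rand(100_000, 10))

    csv_bytes = df.to_csv(index=False).encode("utf-8")

    parquet_buffer = io.BytesIO()
    df.to_parquet(parquet_buffer)  # pandas delegates to pyarrow here

    print(f"CSV payload:     {len(csv_bytes):,} bytes")
    print(f"Parquet payload: {len(parquet_buffer.getvalue()):,} bytes")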

Additional context
