Support for multidimensional dtypes #3443

Closed
@alexbw

With 0.11 out, Pandas supports more dtypes than before, which is very useful to us science folks. However, some data is intrinsically multi-dimensional, and high-dimensional enough that labeling individual columns is impractical (images, for instance).

I understand DataFrames or Panels are usually the recommended panacea for this problem. That works as long as the multi-dimensional data doesn't carry any accompanying annotation. Mine does: for each frame of a video, I also have electrophysiology traces, timestamps, and measured environmental variables.

I have a working solution where I explicitly separate out the non-scalar data from the scalar data. I use Pandas exclusively for the scalar data, and then a dictionary of multi-D arrays for the array data.
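A minimal sketch of what such a split might look like. The column names, array shapes, and the 30 fps frame rate are all made up for illustration; the only idea taken from the description above is that scalar values go in Pandas, array values go in a dictionary, and the two stay aligned through a shared frame index.

```python
import numpy as np
import pandas as pd

n_frames = 4  # hypothetical number of video frames

# Scalar, per-frame annotations live in a DataFrame indexed by frame number.
scalars = pd.DataFrame(
    {
        "timestamp": np.arange(n_frames) / 30.0,  # assumed 30 fps
        "temperature": np.full(n_frames, 21.5),   # made-up environmental variable
    },
    index=pd.RangeIndex(n_frames, name="frame"),
)

# Non-scalar data lives in a plain dict of arrays whose first axis is the frame.
arrays = {
    "image": np.zeros((n_frames, 3, 3), dtype="f4"),
    "ephys": np.zeros((n_frames, 100), dtype="f4"),
}

# Select on the scalar table, then use the surviving frame labels
# to slice the matching rows out of the array data.
selected = scalars[scalars["timestamp"] > 0.05].index.to_numpy()
selected_images = arrays["image"][selected]
```

The cost of this layout is that every filter, sort, or join on the scalar table has to be replayed by hand on each array, which is the bookkeeping a multi-dimensional dtype would remove.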

What is the work and overhead involved in supporting multi-D data types? I would love to keep my entire ecosystem in Pandas, as it's much faster and richer than just NumPy data wrangling.

Below is the code that I hope could be made to run, given the right fixes.

If you can point me to a place in the codebase where I can tinker, that would also be much appreciated.

import numpy as np
import pandas as pd

# Subarray dtype: each element is meant to be a 3x3 block of float32
mydtype = np.dtype('(3,3)f4')
pd.Series(np.zeros(3,), dtype=mydtype)
# Exception: Data must be 1-dimensional
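
For context, here is how NumPy alone treats the same dtype. This is only a sketch of NumPy's behavior, not a claim about where the Pandas exception originates: when an array is created with a subarray dtype, NumPy appends the subarray dimensions to the array's shape and stores the base type, so the result is an ordinary 3-D float32 array rather than a 1-D array of 3x3 elements.

```python
import numpy as np

mydtype = np.dtype('(3,3)f4')

# The dtype object itself remembers the subarray shape and base type...
print(mydtype.shape)  # (3, 3)
print(mydtype.base)   # float32

# ...but an array created with it absorbs that shape into its own dimensions.
arr = np.zeros(3, dtype=mydtype)
print(arr.shape)      # (3, 3, 3)
print(arr.dtype)      # float32
```

If Pandas performs a similar conversion internally (an assumption, not verified here), it would end up holding 3-D data, which would be consistent with the "must be 1-dimensional" message.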
