Description
This is for informing a scikit-learn design decision; I briefly talked with @jorisvandenbossche about this a while ago.
The question is whether we can rely on zero-copy wrapping and unwrapping of numpy arrays into pandas DataFrames, i.e. is it future-proof to assume that something like

```python
X = np.array(...)
X_df = pd.DataFrame(X)
X_again = np.asarray(X_df)
```

doesn't copy the data, so that `X_again` shares the memory of `X`?
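For reference, one way to check this empirically (a sketch, not from the issue itself; whether memory is actually shared depends on the pandas version and its copy settings) is NumPy's `np.shares_memory`:

```python
import numpy as np
import pandas as pd

# Hypothetical round-trip check: wrap a homogeneous 2D array in a
# DataFrame and unwrap it again.
X = np.arange(12, dtype=np.float64).reshape(4, 3)
X_df = pd.DataFrame(X)
X_again = np.asarray(X_df)

# The values always round-trip; whether the memory is shared (i.e. the
# round-trip was zero-copy) depends on the pandas version/configuration.
assert np.array_equal(X, X_again)
print("zero-copy round-trip:", np.shares_memory(X, X_again))
```

On classic (pre-copy-on-write) pandas with a single-dtype array this has historically printed `True`, which is exactly the behavior the question asks about guaranteeing going forward.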
Context: we want to attach some metadata to our numpy arrays; in particular, I'm interested in column names. Pandas is an obvious candidate for that, but core scikit-learn works on numpy arrays. So if we want to use pandas, we need to make sure there's no overhead in wrapping and unwrapping. This is also a design decision that would be very hard to undo, so I want to make sure it's reasonably future-proof.
@jorisvandenbossche had mentioned that there have been thoughts about turning pandas into a column store, which sounds like it would break the zero-copy requirement.
Thanks!