
stinodego
Contributor

@stinodego stinodego commented Aug 29, 2023

I believe there is an issue with the way data buffer dtypes are represented across various implementations of the protocol (ours, pyarrow, pandas, ...).

The issue is that the data buffer is assigned the column dtype. This is only correct for integers and floats. Categoricals, strings, and datetime types have an integer as their physical representation, so the data buffer should carry that physical data type rather than the column's logical dtype.
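
To make the distinction concrete, here is a minimal sketch in plain Python. It is not the code from this PR. It models the protocol's dtype as the `(kind, bit_width, format_string, endianness)` tuple with Arrow C format strings. The helper name `physical_dtype` is hypothetical, and the choice of `uint32` for categorical codes is an assumption; other implementations may use a different integer width.

```python
from enum import IntEnum


class DtypeKind(IntEnum):
    """Dtype kinds as enumerated by the dataframe interchange protocol."""

    INT = 0
    UINT = 1
    FLOAT = 2
    BOOL = 20
    STRING = 21
    DATETIME = 22
    CATEGORICAL = 23


def physical_dtype(column_dtype):
    """Return the dtype a data buffer should report for a given column dtype.

    Hypothetical helper for illustration only.
    """
    kind, _bit_width, _fmt, endianness = column_dtype
    if kind == DtypeKind.STRING:
        # String data is a buffer of UTF-8 encoded bytes.
        return (DtypeKind.UINT, 8, "C", endianness)
    if kind == DtypeKind.DATETIME:
        # Timestamps are stored as 64-bit integers since the epoch.
        return (DtypeKind.INT, 64, "l", endianness)
    if kind == DtypeKind.CATEGORICAL:
        # Assumption: categorical codes are stored as uint32.
        return (DtypeKind.UINT, 32, "I", endianness)
    # Integers and floats: the column dtype already is the physical dtype.
    return column_dtype


# Column dtype of a microsecond-precision datetime column (no time zone).
datetime_column = (DtypeKind.DATETIME, 64, "tsu:", "=")
print(physical_dtype(datetime_column))
```

With this mapping, a datetime column still reports `DATETIME` at the column level, while its data buffer reports a plain 64-bit integer. A consumer's `from_dataframe` then has to read the logical type from the column and the storage type from the buffer.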

The fix in this PR is simple enough, but this cannot be merged until other libraries fix their from_dataframe implementation. I opened some issues:

When those libraries update their from_dataframe implementations per the issues above, the roundtrip tests should pass and this can be merged.

@github-actions github-actions bot added fix Bug fix python Related to Python Polars labels Aug 29, 2023
@stinodego stinodego added the A-interchange Area: Python dataframe interchange protocol label Sep 7, 2023
@stinodego stinodego force-pushed the fix-protocol-data-buffer branch from b669abc to 6974371 Compare September 7, 2023 08:17
@stinodego stinodego added the blocked Cannot be worked on due to external dependencies, or significant new internal features needed first label Sep 7, 2023
@stinodego stinodego added this to the 1.0.0 milestone Nov 16, 2023
@stinodego stinodego force-pushed the fix-protocol-data-buffer branch from 6974371 to 177fdc2 Compare December 5, 2023 11:36
@stinodego stinodego removed the blocked Cannot be worked on due to external dependencies, or significant new internal features needed first label Dec 8, 2023
@stinodego
Contributor Author

PyArrow and pandas have updated their implementations. This can now be merged. But let's give it a bit of time for integration's sake. I will merge this when we start merging breaking changes for 1.0.0.

@stinodego stinodego marked this pull request as ready for review December 8, 2023 14:04
@stinodego stinodego added the do not merge This pull requests should not be merged right now label Dec 8, 2023
@stinodego stinodego removed the do not merge This pull requests should not be merged right now label Jan 9, 2024
@stinodego stinodego force-pushed the fix-protocol-data-buffer branch from 59df4db to 6ce2293 Compare January 9, 2024 10:44
@stinodego stinodego requested a review from c-peters as a code owner January 9, 2024 10:44
@stinodego
Contributor Author

I'm going ahead with this one as well. I think it makes sense to make this change together with the new from_dataframe implementation. That way, any integration issues can be addressed within the same Polars update.

@stinodego stinodego merged commit 26da007 into main Jan 9, 2024
@stinodego stinodego deleted the fix-protocol-data-buffer branch January 9, 2024 11:06
@c-peters c-peters added the accepted Ready for implementation label Jan 14, 2024
Labels
A-interchange Area: Python dataframe interchange protocol accepted Ready for implementation fix Bug fix python Related to Python Polars