
pandas.Categorical not preserved during DataFrame conversion and jl.convert fails #630

Open
@mArc0v0mag1c


Hi and thanks for the great package!

While working with juliacall + PythonCall.jl, I ran into two issues related to pandas.Categorical handling.

(In all examples below, jl refers to Main from juliacall, i.e., from juliacall import Main as jl.)

1. DataFrame conversion ignores Categorical columns
When passing a pandas.DataFrame with categorical columns (i.e., dtype='category'), those columns are silently converted to Int64 vectors in Julia (presumably the .codes). As a result, CategoricalArray semantics are lost, so interaction terms in Julia formulas such as id & η1 are treated as numeric instead of generating dummy variables.

2. jl.convert() can't convert pandas.Categorical to any Julia type
I tried using jl.convert(CategoricalArray, col) directly on a pandas.Series with categorical dtype, but got a MethodError. It appears PythonCall doesn't yet support converting pandas.Categorical to any Julia-native type.
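
For context on what gets lost, here is a small pandas-only sketch (no Julia involved) of the two pieces a categorical column carries: the integer codes and the category labels they index into. The data is made up for illustration; only the column name id is borrowed from the formula above.

```python
import pandas as pd

# Made-up data; only the column name "id" comes from the example above.
df = pd.DataFrame({"id": pd.Categorical(["b", "a", "b", "c"])})

# Integer position of each value in the category table.
codes = df["id"].cat.codes.tolist()

# The labels those integers index into.
categories = df["id"].cat.categories.tolist()

# Rebuilding the labels needs both pieces; the codes alone are plain integers.
labels = [categories[c] for c in codes]
```

If only the codes cross over to Julia, nothing on the Julia side marks the column as categorical, which would be consistent with the numeric treatment described above.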

To work around this, I convert the column to str in Python (so it arrives as a Vector{String}), then manually wrap it in categorical(...) on the Julia side. This works, but it's not ideal for type fidelity or automatic translation.
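
Roughly, the workaround looks like the sketch below. The helper names stringify_categoricals and to_julia_categorical are hypothetical, not part of PythonCall or juliacall. The Julia half is an untested assumption: it presumes DataFrames.jl and CategoricalArrays.jl are installed in the Julia environment and that the pandas DataFrame arrives as something DataFrame(...) accepts.

```python
import pandas as pd


def stringify_categoricals(df):
    """Return a copy of df with category columns cast to str, plus their names."""
    cat_cols = list(df.select_dtypes(include="category").columns)
    out = df.copy()
    for col in cat_cols:
        out[col] = out[col].astype(str)
    return out, cat_cols


def to_julia_categorical(df):
    """Hypothetical helper: send df to Julia and re-wrap the former category columns."""
    # Imported lazily so the pandas half above can be used without Julia.
    from juliacall import Main as jl

    jl.seval("using DataFrames, CategoricalArrays")
    # Assumption: the table converts via DataFrame(...) and the string columns
    # can be passed to categorical(...) as described in the text.
    rewrap = jl.seval("""
    function (tbl, cols)
        df = DataFrame(tbl)
        for c in cols
            df[!, String(c)] = categorical(df[!, String(c)])
        end
        return df
    end
    """)
    df_str, cat_cols = stringify_categoricals(df)
    return rewrap(df_str, cat_cols)
```

Going through strings does not carry over the original category order or any unused categories, so those would need to be checked or passed separately.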

Let me know if there's a cleaner workaround — or if you'd be open to a PR to improve automatic CategoricalArray support.

Thanks again!
