Description
Hi and thanks for the great package!
While working with juliacall + PythonCall.jl, I ran into two issues related to pandas.Categorical handling.
(In all examples below, jl refers to Main from juliacall, i.e., from juliacall import Main as jl.)
DataFrame conversion ignores Categorical columns
When passing a pandas.DataFrame with categorical columns (i.e., dtype='category'), those columns are silently converted to Int64 vectors in Julia (presumably the .codes). As a result, CategoricalArray semantics are lost, so interactions in Julia formulas like id & η1 are treated as numeric instead of generating dummy variables.
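For reference, here is a minimal pandas-only sketch of what a categorical column holds on the Python side. It does not reproduce the Julia conversion itself; it only shows why receiving the integer codes alone loses the level labels. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: "id" is categorical, "η1" is numeric.
df = pd.DataFrame(
    {
        "id": pd.Categorical(["a", "b", "a", "c"]),
        "η1": [0.1, 0.2, 0.3, 0.4],
    }
)

# A categorical column stores integer codes plus a separate table of levels.
codes = df["id"].cat.codes.tolist()         # integer position of each value's level
levels = df["id"].cat.categories.tolist()   # the level labels themselves

print(codes)   # [0, 1, 0, 2]
print(levels)  # ['a', 'b', 'c']
```

If only the codes reach Julia, the column is indistinguishable from an ordinary integer vector, which would explain the numeric treatment in the formula.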
jl.convert() can’t convert pandas.Categorical to any Julia type
I tried using jl.convert(CategoricalArray, col) directly on a pandas.Series with categorical dtype, but got a MethodError. It appears PythonCall doesn’t yet support converting pandas.Categorical to any Julia-native type.
To work around this, I convert the column to str in Python (so it arrives as a Vector{String}), then manually wrap it in categorical(...) on the Julia side. This works, but it's not ideal for type fidelity or automatic translation: the column's categorical dtype is discarded on the Python side and has to be rebuilt by hand in Julia.
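A rough sketch of that workaround, in case it helps anyone else. The helper names stringify_categoricals and to_julia_categorical are my own, not part of any package. The Julia half assumes juliacall is installed and CategoricalArrays is available in the Julia environment, and it uses juliacall.convert to build the Vector{String} explicitly, which may differ from how the column arrives when a whole DataFrame is passed.

```python
import pandas as pd


def stringify_categoricals(df):
    """Return a copy of df with every categorical column cast to plain str."""
    out = df.copy()
    for name in out.columns:
        if isinstance(out[name].dtype, pd.CategoricalDtype):
            out[name] = out[name].astype(str)
    return out


def to_julia_categorical(series):
    """Send a string column to Julia and wrap it with categorical(...).

    Assumption: juliacall is installed and CategoricalArrays is in the
    active Julia environment. Imported lazily so the pandas half of this
    sketch can be used on its own.
    """
    import juliacall
    from juliacall import Main as jl

    jl.seval("using CategoricalArrays")
    vector_of_strings = jl.seval("Vector{String}")
    strings = juliacall.convert(vector_of_strings, list(series))
    return jl.categorical(strings)


# Hypothetical data for illustration.
df = pd.DataFrame({"id": pd.Categorical(["a", "b", "a", "c"]), "x": [1, 2, 3, 4]})
plain = stringify_categoricals(df)
```

After this, to_julia_categorical(plain["id"]) should give a CategoricalArray on the Julia side. Note that any ordering of the original levels is lost, and missing values become the literal string "nan".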
Let me know if there's a cleaner workaround, or if you'd be open to a PR to improve automatic CategoricalArray support.
Thanks again!