Per #37, scipy.sparse.hstack is called whenever a sparse matrix is among the extracted features. However, scipy.sparse.hstack cannot upcast dtype=object, so even if sparse=False for the mapper object, the hstack will fail whenever a np.ndarray of dtype=object is involved.
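The behaviour can be reproduced without sklearn_pandas. The following is a minimal sketch, assuming only numpy and scipy are installed; the variable names are illustrative, and the identity matrix merely stands in for encoder output.

```python
import numpy as np
from scipy import sparse

# A float64 sparse block, standing in for OneHotEncoder output.
encoded = sparse.csr_matrix(np.eye(3))
# An object-dtype column, standing in for a passthrough string column.
labels = np.array([["r"], ["w"], ["b"]], dtype=object)

try:
    sparse.hstack([encoded, labels])
    sparse_failed = False
except (TypeError, ValueError) as exc:
    # scipy has no supported sparse dtype that float64 and object can share.
    sparse_failed = True
    print("sparse.hstack failed:", exc)

# Dense stacking falls back to dtype=object instead of raising.
dense = np.hstack([encoded.toarray(), labels])
print(dense.dtype, dense.shape)
```

The dense path succeeds because numpy can promote the float64 and object blocks to a common object array, which is the behaviour one would expect from the mapper when sparse=False.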
Passing example; note that int64/float64 are upcast to float64.
In [432]:
df = pd.DataFrame({'int': [1, 2, 3],
                   'flt': [2., 3, 4],
                   'obj': ['r', 'w', 'b']})
mapper = sklearn_pandas.DataFrameMapper([
    (['int'], [sklearn.preprocessing.OneHotEncoder()]),
    (['flt'], [sklearn.preprocessing.OneHotEncoder()])
], sparse=True)
mapper.fit_transform(df)
Out[432]:
<3x6 sparse matrix of type '<type 'numpy.float64'>'
    with 6 stored elements in Compressed Sparse Row format>
Failing example: float64 and object cannot be upcast to a common dtype. See scipy\sparse\sputils.pyc for the upcast code.
In [434]:
mapper = sklearn_pandas.DataFrameMapper([
    (['int'], [sklearn.preprocessing.OneHotEncoder()]),
    ('obj', None)])
mapper.fit_transform(df)
TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))
I think it's ok if an error is thrown when sparse=True and an array of type object is involved, but not if sparse=False.
I'll submit a pull request with a recommended fix.
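One possible shape for such a fix, as a rough sketch rather than the actual patch: only use scipy.sparse.hstack when sparse output is requested, and otherwise densify any sparse blocks and stack with np.hstack. The helper name stack_features and its sparse_output parameter are hypothetical and not part of sklearn_pandas.

```python
import numpy as np
from scipy import sparse


def stack_features(blocks, sparse_output=False):
    """Hypothetical helper: combine 2-D feature blocks column-wise."""
    if sparse_output:
        # May still raise if any block has dtype=object, which seems acceptable
        # when the caller explicitly asked for sparse output.
        return sparse.hstack(blocks).tocsr()
    # Densify sparse blocks so numpy can pick a common dtype (object if needed).
    dense_blocks = [
        b.toarray() if sparse.issparse(b) else np.asarray(b) for b in blocks
    ]
    return np.hstack(dense_blocks)


encoded = sparse.csr_matrix(np.eye(3))                  # stand-in for encoder output
labels = np.array([["r"], ["w"], ["b"]], dtype=object)  # passthrough string column
stacked = stack_features([encoded, labels], sparse_output=False)
print(stacked.dtype, stacked.shape)
```

With this split, an error is only possible on the sparse=True path, matching the behaviour proposed above.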