Closed
Description
I run a pipeline to extract text features as follows.
pipeline = Pipeline([
    ('text', DataFrameMapper([
        ('description', CountVectorizer())
    ]))
])
This is working fine and is nicer than the approach described in [1]:
pipeline = Pipeline([
    ('text', Pipeline([
        ('selector', ItemSelector(key='description')),
        ('bow', CountVectorizer()),
    ]))
])
However, the DataFrameMapper version returns a dense array, which is intractable for text features, whereas CountVectorizer on its own returns a sparse matrix. Are you planning to support sparse output?
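To illustrate why a dense encoding is a problem for text, here is a rough sketch in plain Python. It is only an approximation: the whitespace tokenizer, the toy documents, and the helper name bag_of_words_sizes are made up for illustration and are not how CountVectorizer works internally. It compares the number of cells a dense bag-of-words array would hold with the number of entries a sparse matrix would actually store.

```python
from collections import Counter


def bag_of_words_sizes(docs):
    """Return (dense_cells, stored_nonzeros) for a bag-of-words matrix over docs."""
    # Naive whitespace tokenization (an assumption; CountVectorizer's tokenizer differs).
    tokenised = [doc.lower().split() for doc in docs]
    vocabulary = {tok for toks in tokenised for tok in toks}
    # A dense array holds one cell per (document, vocabulary term) pair.
    dense_cells = len(docs) * len(vocabulary)
    # A sparse matrix stores only the distinct terms that occur in each document.
    stored_nonzeros = sum(len(Counter(toks)) for toks in tokenised)
    return dense_cells, stored_nonzeros


# Hypothetical toy corpus, standing in for the 'description' column.
docs = [
    "red apple pie",
    "green apple tart",
    "blue cheese sauce",
]
dense_cells, stored_nonzeros = bag_of_words_sizes(docs)
print(dense_cells, stored_nonzeros)
```

With a realistic corpus the vocabulary grows to tens of thousands of terms while each description still contains only a handful of them, so the gap between the two numbers widens quickly.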
[1] http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html