Sparse vs. Dense Encoding

I run a pipeline to extract text features as follows.

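``` python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn_pandas import DataFrameMapper

# Map the 'description' column of the input DataFrame to bag-of-words counts.
pipeline = Pipeline([
    ('text', DataFrameMapper([
        ('description', CountVectorizer())
    ]))
])
```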

This is working fine and is nicer than the approach described in [1]:

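``` python
# ItemSelector is the custom transformer defined in the example at [1];
# it picks a single column (here 'description') out of the input.
pipeline = Pipeline([
    ('text', Pipeline([
        ('selector', ItemSelector(key='description')),
        ('bow', CountVectorizer()),
    ]))
])
```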

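However, the former (the DataFrameMapper version) produces a dense array, which is intractable for text because a bag-of-words vocabulary can easily span tens of thousands of columns. Are you planning to change that?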
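To put a rough number on why the dense encoding is a problem, here is a back-of-the-envelope sketch. The corpus size, vocabulary size, and average number of distinct terms per document are hypothetical figures I picked for illustration, and the byte sizes assume 8-byte counts with 4-byte indices in a CSR layout; real numbers will depend on the data and dtypes.

```python
def dense_bytes(n_docs, vocab_size, itemsize=8):
    """Memory for a dense (n_docs x vocab_size) count matrix."""
    return n_docs * vocab_size * itemsize


def csr_bytes(n_docs, nnz, itemsize=8, index_itemsize=4):
    """Memory for a CSR matrix: values + column indices + row pointers."""
    return nnz * itemsize + nnz * index_itemsize + (n_docs + 1) * index_itemsize


# Hypothetical corpus, chosen only for illustration.
n_docs = 100_000       # number of descriptions
vocab_size = 50_000    # distinct tokens found by the vectorizer
nnz_per_doc = 100      # average distinct tokens per description

nnz = n_docs * nnz_per_doc
dense = dense_bytes(n_docs, vocab_size)
sparse = csr_bytes(n_docs, nnz)

print(f"dense:  {dense / 1e9:.1f} GB")
print(f"sparse: {sparse / 1e9:.2f} GB")
```

Under these assumptions the dense matrix is a few hundred times larger than the sparse one, which is why I would like the mapper to keep the vectorizer's sparse output.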

[1] http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html

