AttributeError when using dask_ml.model_selection.KFold object #849

**What happened**:
I'm building a model-training pipeline that uses `LogisticRegression` with nested cross-validation. Running the pipeline raises an unexpected `AttributeError`:
```shell
Exception: AttributeError("'numpy.ndarray' object has no attribute 'chunks'")
```
**What you expected to happen**:
I didn't expect this error, since I double-checked that all of the objects are `dask.array` instances. The following MWE shows what my pipeline looks like.

**Minimal Complete Verifiable Example**:

```python
from typing import Tuple, Any
from dask.array.core import Array
from sklearn.pipeline import Pipeline
from dask_ml.model_selection import GridSearchCV, KFold
from dask_ml.linear_model import LogisticRegression
from dask_ml.metrics import accuracy_score
from sklearn.metrics import roc_auc_score
from sklearn.base import is_classifier
from dask.distributed import Client, progress
import dask.array as da

import numpy as np
import joblib
from dask_ml.datasets import make_classification as dask_make_classification

# import warnings filter
from warnings import simplefilter

# ignore all future warnings
simplefilter(action="ignore", category=FutureWarning)


def fake_dataset() -> Tuple[Array, Array]:
    X, y = dask_make_classification(
        n_samples=1000,
        n_features=20,
        random_state=1,
        n_informative=10,
        n_redundant=10,
        chunks=1000 // 20,
    )
    return X, y


def train_model(X: Array, y: Array) -> None:
    n_outer_splits = 2
    n_inner_splits = 2
    param_grid = [
        {
            "classifier": [LogisticRegression()],
            "classifier__penalty": ["l1", "l2"],
            "classifier__C": np.logspace(-4, 4, 20),
            "classifier__solver": ["liblinear"],
        },
    ]
    # define the model
    pipeline = Pipeline([("classifier", LogisticRegression())])
    # XXX: check that is a proper model
    try:
        if not is_classifier(pipeline["classifier"]):
            raise Exception("Not valid classification algorithm")
    except Exception as e:
        print(f"Be aware of: {e}")
    finally:
        pass
    # set-up the nested cross-validation procedure
    cv_outer = KFold(n_splits=n_outer_splits, shuffle=True, random_state=1)
    # enumerate splits
    outer_results = list()
    for kth_fold, (train_ix, test_ix) in enumerate(cv_outer.split(X)):
        print(f"Running {kth_fold} Fold")
        # split data
        X_train, X_test = X[train_ix, :], X[test_ix, :]
        y_train, y_test = y[train_ix], y[test_ix]
        # setup inner cross-validation procedure
        cv_inner = KFold(n_splits=n_inner_splits, shuffle=True, random_state=1)

        # define search
        search = GridSearchCV(
            estimator=pipeline,
            param_grid=param_grid,
            scoring="accuracy",
            cv=cv_inner,
            refit=True,
        )
        with joblib.parallel_backend("dask"):
            result = search.fit(X_train, y_train)

    return None


if __name__ == "__main__":
    client = Client(
        processes=False, threads_per_worker=1, n_workers=4, memory_limit="10GB"
    )

    X, y = fake_dataset()
    train_model(X, y)
```

**Anything else we need to know?**:
Further debugging showed that the error is raised during the `fit` call; however, none of the objects I pass in are `np.ndarray` objects.
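
To narrow down where a NumPy array enters the picture, a small type check could be logged right before each `split` and `fit` call. The sketch below is only an illustration: `describe_array` is a hypothetical helper, not part of dask or dask-ml, and it relies only on the fact that dask arrays expose a `chunks` attribute while NumPy arrays do not (which is what the error message reports).

```python
from typing import Any


def describe_array(name: str, obj: Any) -> str:
    """Return a one-line summary of an object's concrete type and chunking.

    Hypothetical debugging helper; it works on any Python object.
    """
    type_name = f"{type(obj).__module__}.{type(obj).__qualname__}"
    # Objects without `.chunks` are the ones that would trigger the
    # AttributeError from the report when code accesses `obj.chunks`.
    if not hasattr(obj, "chunks"):
        return f"{name}: {type_name} (no 'chunks' attribute)"
    return f"{name}: {type_name}, chunks={obj.chunks}"


if __name__ == "__main__":
    # Stand-in object for illustration; in the MWE this would be X_train, y_train, etc.
    print(describe_array("example", [1, 2, 3]))
```

In the MWE above, this could be called as `print(describe_array("X_train", X_train))` inside the outer loop just before `search.fit(X_train, y_train)`, to confirm what type each input has at that point.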

**Environment**:

- Dask version: 1.9.0
- Python version: 3.8
- Operating System: Ubuntu 20.04 LTS
- Install method (conda, pip, source): conda

