Open
Labels
Backlog: This is in the MAPIE team development backlog, yet to be prioritised.
Regression: Related to regression (excluding time series)
Description
Describe the bug
When using the new MondrianCP class, I'm unable to fit my estimator with a pandas DataFrame, whereas the same data works fine with the standard MapieRegressor. My sklearn pipeline contains column transformers that select columns by their pandas column names, so I can't convert my data to a NumPy array first: sklearn then raises an error when fitting the estimator.
To Reproduce
The code below reproduces the problem.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.compose import TransformedTargetRegressor, ColumnTransformer
from sklearn.preprocessing import RobustScaler, OneHotEncoder
from mapie.regression import MapieRegressor
from mapie.mondrian import MondrianCP
from lightgbm import LGBMRegressor
import pandas as pd
from sklearn.model_selection import train_test_split
# Create some dummy data
data = pd.DataFrame(np.random.rand(100, 5), columns=['feature1', 'feature2', 'feature3', 'feature4', 'feature5'])
data['categorical_feature'] = np.random.choice(['A', 'B', 'C'], size=100)
y = pd.Series(np.random.rand(100))
# Create bins for the partition
data['BIN'] = pd.cut(y, bins=3, labels=[1, 2, 3])
# Split the data into a train and calibration set
data_train, data_calib, y_train, y_calib = train_test_split(data, y, test_size=0.2, random_state=42)
model = LGBMRegressor(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=5,
    min_child_samples=10,
    num_leaves=31,
    random_state=42
)
# Column transformers select columns by their pandas column names
ct = ColumnTransformer([
    ("site", OneHotEncoder(), ['categorical_feature']),
    ("features", RobustScaler(), ['feature1', 'feature2', 'feature3', 'feature4', 'feature5']),
])
estimators = [('transformers', ct), ('model', model)]
pre_pipe = Pipeline(estimators)
pipe = TransformedTargetRegressor(regressor=pre_pipe, transformer=RobustScaler())
pipe.fit(data_train, y_train)
strategy = "mondrian"
if strategy == "mondrian":
    # MondrianCP with a pandas DataFrame: this is the case that fails
    mapie_regressor = MondrianCP(MapieRegressor(pipe, cv='prefit'))
    mapie_regressor.fit(data_calib, y_calib, partition=data_calib['BIN'])
elif strategy == "mondrian_numpy":
    # MondrianCP with a NumPy array: triggers the sklearn error instead
    mapie_regressor = MondrianCP(MapieRegressor(pipe, cv='prefit'))
    mapie_regressor.fit(data_calib.to_numpy(), y_calib, partition=data_calib['BIN'])
else:
    # Plain MapieRegressor with a pandas DataFrame: works fine
    mapie_regressor = MapieRegressor(estimator=pipe, cv='prefit')
    mapie_regressor = mapie_regressor.fit(data_calib, y_calib)
Setting strategy to "mondrian_numpy" reproduces the sklearn error I get when I pass a NumPy array instead.
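To illustrate why converting to NumPy is not an option here, the following is a minimal sketch, independent of MAPIE, of a ColumnTransformer that selects columns by name. It only shows the fit-time failure on an array without column names; the exact message, and the point at which the error surfaces inside the pipeline from the report, may differ by scikit-learn version, and the issue does not quote the traceback.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"categorical_feature": ["A", "B", "C", "A"]})


def make_ct():
    # Columns are selected by their pandas column name, as in the report
    return ColumnTransformer([("site", OneHotEncoder(), ["categorical_feature"])])


# Works: the DataFrame carries the column names the transformer asks for
make_ct().fit(df)

# Fails: a NumPy array has no column names to look up
try:
    make_ct().fit(df.to_numpy())
    numpy_error = None
except ValueError as exc:
    numpy_error = exc

print(type(numpy_error).__name__, numpy_error)
```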
Expected behavior
MondrianCP should accept a pandas DataFrame as input data, just as MapieRegressor does.
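Until DataFrame input is supported, one possible stopgap is to calibrate per group by hand. The sketch below is not MAPIE's implementation and does not use MondrianCP; it is a plain split-conformal calibration with absolute residuals, done separately per partition group. The helper names `mondrian_quantiles` and `mondrian_intervals` are hypothetical, and a LinearRegression stands in for the LightGBM model. The point is only that the DataFrame is passed straight to `predict`, so transformers that rely on column names keep working.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler


def mondrian_quantiles(model, X_calib, y_calib, partition, alpha=0.1):
    """Hypothetical helper: per-group quantile of absolute residuals."""
    residuals = np.abs(np.asarray(y_calib) - model.predict(X_calib))
    groups = np.asarray(partition)
    quantiles = {}
    for g in np.unique(groups):
        r = residuals[groups == g]
        n = len(r)
        # Finite-sample corrected level, capped at 1.0 (the cap falls back
        # to the largest residual, an approximation for very small groups)
        level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
        quantiles[g] = np.quantile(r, level, method="higher")
    return quantiles


def mondrian_intervals(model, X, partition, quantiles):
    """Hypothetical helper: prediction +/- the quantile of each row's group."""
    preds = model.predict(X)
    q = np.array([quantiles[g] for g in np.asarray(partition)])
    return preds - q, preds + q


# Dummy data, kept as a DataFrame throughout
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "feature1": rng.random(200),
    "categorical_feature": rng.choice(["A", "B", "C"], size=200),
})
y = pd.Series(2.0 * X["feature1"] + rng.normal(scale=0.1, size=200))

ct = ColumnTransformer([
    ("site", OneHotEncoder(handle_unknown="ignore"), ["categorical_feature"]),
    ("features", RobustScaler(), ["feature1"]),
])
pipe = Pipeline([("transformers", ct), ("model", LinearRegression())])
pipe.fit(X.iloc[:100], y.iloc[:100])

# Calibrate on the held-out half, partitioned by the categorical column
X_calib, y_calib = X.iloc[100:], y.iloc[100:]
partition = X_calib["categorical_feature"]
quantiles = mondrian_quantiles(pipe, X_calib, y_calib, partition)
lower, upper = mondrian_intervals(pipe, X_calib, partition, quantiles)
```

This trades MAPIE's conformity scores and methods for a single fixed choice (absolute residuals, one split), so it is a workaround sketch rather than a replacement for MondrianCP.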