Support text types in sklearn predictors

#### Description
Khiops 11 supports `Text` columns, which receive a specialized AutoML treatment as opposed to normal strings (`Categorical`). The sklearn predictors should also support this type.

#### Questions/Ideas
- Do pandas and/or numpy have a specialized `Text` type?
- <s>Option 1: Implement it as a `Dataset` property
  - The table specification tuple should have an optional field `text_columns` with the names of the text fields **or**
  - Add another field to the spec `table_text_columns` indexed by the table name and whose values are the names of the text columns (I prefer this one)
  - When creating the dictionary the `Dataset` object will have all the necessary info to add the specified columns as `Text` </s> The Dataset API is not exposed.
- <s>Option 2: Implement it as a `fit` parameter
  - As above add `table_text_columns` but as a `fit` optional parameter
    - It works, but the fact that a column is a `Text` is part of the description of the dataset
    - This parameter should be fed to the dictionary creation routine</s> The Dataset API is not exposed.
- [later edit: 2025/08/01] Option 3: Add 2 extra parameters `n_text_features` and `text_feature_type` to:
  - Option 3.1: the `KhiopsPredictor` estimator initializer (`__init__` method)
  - Option 3.2: the `KhiopsPredictor`'s `fit` method
  - *Note*: The `text_columns` parameter needs to be passed as well:
    - either to the `KhiopsPredictor` initializer
    - or to the estimator's `fit` method.
  - *Note 2*:
     - The pandas `StringDtype` should be used for columns whose Khiops type is `Text`; see https://pandas.pydata.org/docs/user_guide/text.html#working-with-text-data.
     - Lists of pandas `StringDtype` values should be used for `TextList` (**to clarify**).
- <s>Expose the Dataset API; it will have two init patterns:
  - Big Constructor 
  - Builder pattern
- The big constructor uses the builder 
- And the `dict` interface should be maintained</s> The Dataset API is not exposed.
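
On the pandas side of the first question and *Note 2*: pandas provides an opt-in `StringDtype` extension type (alias `"string"`), distinct from the generic `object` dtype. A minimal sketch of how it could mark `Text` columns is given below. The helper `infer_khiops_types` is hypothetical and not part of khiops-python, and treating every other non-numeric column as `Categorical` is an assumption made only for illustration. Recent pandas versions may infer a string dtype by default, so the "explicit opt-in" assumption would need checking there.

```python
import pandas as pd


def infer_khiops_types(df: pd.DataFrame) -> dict:
    """Hypothetical helper: map each column name to a Khiops type name.

    Assumption: a column explicitly stored with pandas' StringDtype is
    meant to be `Text`; other non-numeric columns are `Categorical`.
    """
    khiops_types = {}
    for name, dtype in df.dtypes.items():
        if isinstance(dtype, pd.StringDtype):
            khiops_types[name] = "Text"
        elif pd.api.types.is_numeric_dtype(dtype):
            khiops_types[name] = "Numerical"
        else:
            khiops_types[name] = "Categorical"
    return khiops_types


df = pd.DataFrame(
    {
        "age": [31, 45],
        # Kept as plain Python objects: an ordinary categorical string.
        "city": pd.Series(["Paris", "Lyon"], dtype=object),
        # Explicitly tagged as free text through the "string" dtype.
        "review": pd.array(
            ["Great product, fast delivery", "Stopped working after a week"],
            dtype="string",
        ),
    }
)

types = infer_khiops_types(df)
print(types)
```

With this convention the `Text` information travels with the dataframe itself, so a `text_columns` parameter might only be needed for inputs that cannot carry a dtype per column.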
