Failed to use TunedModel with precomputed-SVM #1141

@KeishiS

First of all, thank you for the great work you're doing in maintaining this project. I encountered what seems to be a bug when using a support vector classifier with a precomputed Gram matrix while tuning hyperparameters with TunedModel. I would like to submit a pull request to address the issue, but I'm unsure which part of the codebase needs modification. Any advice would be greatly appreciated.

Describe the bug
When TunedModel runs a parameter search over an SVM with a precomputed kernel, the Gram matrix is split by rows only, so the training and test folds are not sliced the way a precomputed kernel requires.

To Reproduce

#%%
using MLJ, MLJBase
using MLJScikitLearnInterface
using LinearAlgebra
SVMClassifier = @load SVMClassifier pkg = MLJScikitLearnInterface

#%% Create toy data
using Random, Distributions
θ₀ = rand(Uniform(0, 2π), 100)
X₀ = 0.5 .* [cos.(θ₀) sin.(θ₀)] .+ (randn(100, 2) .* 0.12)
y₀ = zeros(Int, 100)

θ₁ = rand(Uniform(0, 2π), 100)
X₁ = [cos.(θ₁) sin.(θ₁)] .+ (randn(100, 2) .* 0.12)
y₁ = ones(Int, 100)

n = 200
X = vcat(X₀, X₁)
y = MLJBase.categorical(vcat(y₀, y₁))
gmat = [
    exp(-norm(X[i, :] - X[j, :]) * 0.1)
    for i in 1:n, j in 1:n
]

#%%
model = SVMClassifier(kernel="precomputed")
tuning_model = TunedModel(
    model=model,
    range=range(model, :C; lower=0.01, upper=1000, scale=:log),
    measure=accuracy
)
mach = machine(tuning_model, gmat, y)
fit!(mach)

Expected behavior

While searching for the best parameters, the Gram matrix gmat is divided into training data and test data. For a precomputed kernel, I expect gmat[train_idx, train_idx] to be created for training and gmat[test_idx, train_idx] for evaluation. However, the current code splits it into gmat[train_idx, :] and gmat[test_idx, :], i.e. it subsets the rows only. This operation is executed in the fit_and_extract_on_fold function in MLJBase.jl/src/resampling.jl.
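
To make the difference concrete, here is a minimal sketch on a small toy Gram matrix. It uses only LinearAlgebra and Random and does not call MLJ at all; the sizes and the index ranges train_idx and test_idx are made up for illustration and are not what MLJBase generates internally.

```julia
using LinearAlgebra, Random

Random.seed!(0)

# Toy data: 10 points in 2D, with the same kernel formula as in the reproduction above
n = 10
X = randn(n, 2)
gmat = [exp(-norm(X[i, :] - X[j, :]) * 0.1) for i in 1:n, j in 1:n]

# Hypothetical hold-out split (illustrative indices only)
train_idx = 1:7
test_idx = 8:10

# Row-only slicing, as described for the current behaviour:
# the columns still range over ALL n samples
rows_only_train = gmat[train_idx, :]   # 7×10, not a square kernel matrix
rows_only_test = gmat[test_idx, :]     # 3×10

# Slicing expected for a precomputed kernel:
K_train = gmat[train_idx, train_idx]   # 7×7, train vs. train
K_test = gmat[test_idx, train_idx]     # 3×7, test vs. train

@show size(rows_only_train) size(K_train) size(K_test)
```

With row-only slicing the training block is not square, and its columns include the test samples, whereas the expected slicing keeps only the columns that correspond to the training samples.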

Versions

  • julia 1.10.5
  • MLJ v0.20.0
  • MLJBase v1.7.0
  • MLJScikitLearnInterface v0.7.0

I would be grateful for any advice on how to approach solving this issue. Thank you for taking the time to read and consider this matter!
