Description
First of all, thank you for the great work you're doing in maintaining this project. I encountered what seems to be a bug when using a support vector classifier with a precomputed Gram matrix while performing hyperparameter tuning with TunedModel. I would like to submit a pull request to address the issue, but I'm unsure which part of the codebase needs modification. Any advice would be greatly appreciated.
Describe the bug
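When performing a parameter search with TunedModel on an SVM with a precomputed kernel, the data splitting is not carried out properly: the Gram matrix is subset by rows only, so its columns are not restricted to the training indices.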
To Reproduce
#%%
using MLJ, MLJBase
using MLJScikitLearnInterface
using LinearAlgebra
SVMClassifier = @load SVMClassifier pkg = MLJScikitLearnInterface
#%% Create toy data
using Random, Distributions
θ₀ = rand(Uniform(0, 2π), 100)
X₀ = 0.5 .* [cos.(θ₀) sin.(θ₀)] .+ (randn(100, 2) .* 0.12)
y₀ = zeros(Int, 100)
θ₁ = rand(Uniform(0, 2π), 100)
X₁ = [cos.(θ₁) sin.(θ₁)] .+ (randn(100, 2) .* 0.12)
y₁ = ones(Int, 100)
n = 200
X = vcat(X₀, X₁)
y = MLJBase.categorical(vcat(y₀, y₁))
gmat = [
    exp(-norm(X[i, :] - X[j, :]) * 0.1)
    for i in 1:n, j in 1:n
]
#%%
model = SVMClassifier(kernel="precomputed")
tuning_model = TunedModel(
    model=model,
    range=range(model, :C; lower=0.01, upper=1000, scale=:log),
    measure=accuracy
)
mach = machine(tuning_model, gmat, y)
fit!(mach)
Expected behavior
During the search for the best parameters, the Gram matrix gmat is divided into training data and test data. We expect gmat[train_idx, train_idx] and gmat[test_idx, train_idx] to be created. However, the current code splits it into gmat[train_idx, :] and gmat[test_idx, :]. This operation is executed in the fit_and_extract_on_fold function in MLJBase.jl/src/resampling.jl.
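To make the difference concrete, here is a minimal sketch of the two ways of slicing, using only LinearAlgebra and Random. The fold indices train_idx and test_idx are hypothetical stand-ins for whatever the resampling strategy produces, and the small matrix is built the same way as gmat in the reproduction above. It does not call any MLJBase internals.

```julia
using LinearAlgebra, Random

# Toy Gram matrix, built like `gmat` in the reproduction but smaller.
Random.seed!(0)
n = 10
X = randn(n, 2)
gmat = [exp(-norm(X[i, :] - X[j, :]) * 0.1) for i in 1:n, j in 1:n]

# Hypothetical fold indices (assumption: one train/test split of the rows).
train_idx = 1:7
test_idx = 8:10

# Slicing that a precomputed-kernel SVM expects:
K_train = gmat[train_idx, train_idx]  # n_train × n_train, used for fitting
K_test = gmat[test_idx, train_idx]    # n_test × n_train, used for prediction

# Row-only slicing, as described in this report:
K_train_rows = gmat[train_idx, :]     # n_train × n, no longer square
K_test_rows = gmat[test_idx, :]       # n_test × n

println("expected train size: ", size(K_train))
println("row-only train size: ", size(K_train_rows))
```

With row-only slicing the training block is not square, and its columns still include the test observations, which is why the columns also need to be restricted to train_idx.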
Versions
- julia 1.10.5
- MLJ v0.20.0
- MLJBase v1.7.0
- MLJScikitLearnInterface v0.7.0
I would be grateful for any advice on how to approach solving this issue. Thank you for taking the time to read and consider this matter!