
BUG & FEAT Fix convergence issues with Pinball loss on large datasets (Issue #276) #306

Conversation

@floriankozikowski (Contributor) commented May 6, 2025

Context of the PR

As reported in Issue #276, the PDCD_WS solver exhibits convergence problems with the Pinball loss on larger datasets, where it appears to stall at a saddle point. The issue can be reproduced with:

import numpy as np
from skglm import GeneralizedLinearEstimator
from skglm.experimental.pdcd_ws import PDCD_WS
from skglm.experimental.quantile_regression import Pinball
from skglm.penalties import L1
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler

# Generate data with 1000 samples (issue happens at n > 1000)
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X = StandardScaler().fit_transform(X)

# Set up problem with Pinball loss and PDCD_WS solver
datafit = Pinball(0.5)  # Median regression
penalty = L1(alpha=0.1)
solver = PDCD_WS(max_iter=500, max_epochs=500, tol=1e-2, verbose=True)

# This fit will not converge well
estimator = GeneralizedLinearEstimator(datafit=datafit, penalty=penalty, solver=solver)
estimator.fit(X, y)

Contributions of the PR

  1. Created a new QuantileHuber class - a smoothed version of the Pinball loss that replaces the non-differentiable point with a quadratic region, similar to how the Huber loss smooths the absolute loss (a sketch of the smoothed loss follows this list).
  2. Implemented SmoothQuantileRegressor - a progressive smoothing solver that:
  • Gradually decreases the smoothing parameter (delta)
  • Solves each smoothed problem with FISTA
  • Tracks the best solution (lowest quantile error) across all smoothing levels
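
For concreteness, here is a minimal sketch of the smoothed per-residual loss described in point 1, following the usual Huberization of the pinball loss; the actual QuantileHuber class in this PR may differ in interface and details:

import numpy as np

def quantile_huber_value(r, quantile=0.5, delta=1.0):
    """Smoothed pinball loss of residuals r = y - X @ w.

    Outside |r| <= delta the loss is linear, like the pinball loss
    (up to a constant shift of delta / 2); inside, the kink at r = 0
    is replaced by a quadratic piece, as in the Huber loss.
    """
    r = np.asarray(r, dtype=float)
    tau = quantile
    linear_pos = tau * (r - delta / 2)           # branch for r > delta
    linear_neg = (1 - tau) * (-r - delta / 2)    # branch for r < -delta
    quadratic = np.where(r >= 0, tau, 1 - tau) * r ** 2 / (2 * delta)
    loss = np.where(r > delta, linear_pos,
                    np.where(r < -delta, linear_neg, quadratic))
    return np.mean(loss)

# example: median regression (tau = 0.5) on a few residuals
print(quantile_huber_value(np.array([-2.0, -0.2, 0.3, 1.5]), 0.5, 0.5))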

Users can now solve quantile regression problems on larger datasets with:

from skglm.experimental.smooth_quantile_regressor import SmoothQuantileRegressor
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_regression

# Generate data
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1)
X = StandardScaler().fit_transform(X)

# Create and fit regressor
sqr = SmoothQuantileRegressor(
    quantile=0.5,  # for median regression (0.5) or any other quantile
    alpha=0.1,     # L1 regularization strength
    verbose=False
)
sqr.fit(X, y)

# Make predictions
y_pred = sqr.predict(X)

Limitations and Need for Further Refinement:

Currently, this is an approximation approach and NOT an exact solution.
Speed has not been benchmarked, and the unit test currently only covers the scenario from Issue #276.
The initial idea was a dual-solver approach with PDCD_WS as the final solver, but this did not work.

Potential next steps are path-following methods, or reviewing the code from earlier commits for mistakes (e.g. warm start or intercept handling).

Checks before merging PR

  • added documentation for any new feature
  • added unit tests
  • edited the what's new (if applicable)

@@ -0,0 +1,156 @@
"""
@floriankozikowski (Contributor Author):

Some tests so far:

  • τ ≠ 0.5 (e.g. 0.8): SmoothQuantileRegressor reduces loss by >50% vs QuantileRegressor.
  • Large n (≥10 000): SmoothQuantileRegressor is 1.3×–2× faster and more accurate.
  • Median τ = 0.5 & n ≈ 1 000: scikit-learn's QuantileRegressor remains the best choice.

These are anecdotal results; your mileage may vary. Tune the smoothing sequence and inner-solver settings accordingly. There is still room for improvement; a rough comparison setup is sketched below.
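
For reference, a comparison along these lines can be set up roughly as follows; this is only a sketch using the SmoothQuantileRegressor interface from the usage example above, and the figures quoted in the list came from separate, informal runs:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import QuantileRegressor
from sklearn.metrics import mean_pinball_loss
from skglm.experimental.smooth_quantile_regressor import SmoothQuantileRegressor

tau = 0.8
X, y = make_regression(n_samples=10_000, n_features=10, noise=0.1, random_state=0)

# scikit-learn's exact, LP-based quantile regressor
skl = QuantileRegressor(quantile=tau, alpha=0.1).fit(X, y)

# progressive-smoothing estimator from this PR
sqr = SmoothQuantileRegressor(quantile=tau, alpha=0.1)
sqr.fit(X, y)

print("sklearn pinball loss :", mean_pinball_loss(y, skl.predict(X), alpha=tau))
print("smoothed pinball loss:", mean_pinball_loss(y, sqr.predict(X), alpha=tau))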

Collaborator:

@floriankozikowski to avoid having too many files, this should be merged with smooth_quantile.py since they contain very related code

from skglm.experimental.solver_strategies import StageBasedSolverStrategy


@jit(nopython=True)
Collaborator:

a priori there is no for loop, everything is vectorized so no need to JIT compile it



@jit(nopython=True, cache=True)
def max_subgrad_gap(residuals, delta, quantile):
Collaborator:

what is the purpose of this compared to existing skglm code?

self.quantile = float(quantile)
if not 0 < self.quantile < 1:
    raise ValueError("quantile must be between 0 and 1")
self.alpha = float(alpha)
Collaborator:

look at other parts of the code, we do not cast stuff as float (a priori it's not needed, Python will do it itself if need be)

self.solver_strategy = StageBasedSolverStrategy(self.solver_params)

from skglm.experimental.quantile_huber import QuantileHuber
self._quantile_huber_cls = QuantileHuber
Collaborator:

what's the point of this if it has no attribute?

if not hasattr(self.smooth_solver, 'warm_start'):
    self.smooth_solver.warm_start = False

def _initialize_pdcd(self, X, y):
Collaborator:

for a v0, use only smooth problems, so use AndersonCDWS solver instead. It should not have any params


return w, dual

def _get_solver_for_stage(self, delta, stage, n_features):
Collaborator:

this seems too complex, not needed at this stage. Just solve a sequence of smoothed problems with decreasing smoothing parameters.
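
For illustration, a minimal sketch of such a continuation loop, using the AndersonCD solver suggested above; the QuantileHuber constructor signature and the delta schedule are assumptions here, not the PR's actual code:

import numpy as np
from skglm import GeneralizedLinearEstimator
from skglm.solvers import AndersonCD
from skglm.penalties import L1
from skglm.experimental.quantile_huber import QuantileHuber  # class added in this PR

def fit_by_continuation(X, y, quantile=0.5, alpha=0.1,
                        deltas=(1.0, 0.3, 0.1, 0.03)):
    """Solve a sequence of smoothed quantile problems with decreasing delta."""
    est = GeneralizedLinearEstimator(
        datafit=QuantileHuber(quantile, deltas[0]),   # assumed signature
        penalty=L1(alpha=alpha),
        solver=AndersonCD(warm_start=True, fit_intercept=False),
    )
    for delta in deltas:
        # tighten the smoothing and refit, warm-started from the previous stage
        est.datafit = QuantileHuber(quantile, delta)  # assumed signature
        est.fit(X, y)
    return est.coef_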


# Center data to handle intercept manually
y_mean = np.mean(y)
if is_sparse:
Collaborator:

this is to be avoided at all cost; all our solvers support sparse X! Also, no need to center.

else:
    deltas = [self.initial_delta]

# Build L1‐continuation schedule
Collaborator:

why is there a sequence of alpha being used?

@floriankozikowski (Contributor Author):

Too large; a new, shorter version is in PR #312.
