BUG & FEAT Fix convergence issues with Pinball loss on large datasets (Issue #276) #306
Conversation
…te unnecessary draft files
@@ -0,0 +1,156 @@
"""
Some tests so far:
- τ ≠ 0.5 (e.g. 0.8): SmoothQuantileRegressor reduces loss by >50% vs QuantileRegressor.
- Large n (≥10 000): SmoothQuantileRegressor is 1.3×–2× faster and more accurate.
- Median τ=0.5 & n≈1 000: scikit-learn’s QuantileRegressor remains the best choice.
These are anecdotal results; your mileage may vary. Tune the smoothing sequence and inner-solver settings accordingly. There is still room for improvement.
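For reference, a minimal sketch of the kind of comparison described above. The SmoothQuantileRegressor import path and constructor arguments are assumptions based on this PR's file layout, not a confirmed API; only the scikit-learn side is standard:

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor
from sklearn.metrics import mean_pinball_loss

# hypothetical import path/signature, adjust to this PR's actual code
from skglm.experimental.smooth_quantile import SmoothQuantileRegressor

tau = 0.8
rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 50))
y = X @ rng.standard_normal(50) + rng.standard_normal(10_000)

sk_est = QuantileRegressor(quantile=tau, alpha=0.1, solver="highs").fit(X, y)
sq_est = SmoothQuantileRegressor(quantile=tau, alpha=0.1).fit(X, y)  # assumed API

for name, est in (("sklearn", sk_est), ("smooth", sq_est)):
    print(name, mean_pinball_loss(y, est.predict(X), alpha=tau))
```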
…examples (still not ideal speed, precision and sparsity)
@floriankozikowski to avoid having too many files, this should be merged with smooth_quantile.py since they contain closely related code.
from skglm.experimental.solver_strategies import StageBasedSolverStrategy

@jit(nopython=True)
a priori there is no for loop here: everything is vectorized, so there is no need to JIT-compile it
@jit(nopython=True, cache=True)
def max_subgrad_gap(residuals, delta, quantile):
what is the purpose of this compared to existing skglm code?
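For context, one plausible reading of such a helper (a hypothetical reconstruction from the signature alone, not the PR's actual code): it would measure how far the smoothed-loss gradient is from the pinball subdifferential, usable as a stopping criterion for the continuation loop.

```python
import numpy as np

def max_subgrad_gap_sketch(residuals, delta, quantile):
    """Hypothetical sketch: largest distance between the quantile-Huber
    gradient and the pinball subdifferential over all residuals."""
    # quantile-Huber gradient w.r.t. residual r: quantile for r >= delta,
    # -(1 - quantile) for r <= -delta, linear interpolation in between
    grad = np.where(
        residuals >= 0,
        quantile * np.minimum(residuals / delta, 1.0),
        (1 - quantile) * np.maximum(residuals / delta, -1.0),
    )
    # pinball subdifferential: {quantile} for r > 0, {quantile - 1} for
    # r < 0, and the whole interval [quantile - 1, quantile] at r == 0
    gap = np.where(
        residuals > 0, np.abs(grad - quantile),
        np.where(residuals < 0, np.abs(grad - (quantile - 1)), 0.0),
    )
    return gap.max()
```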
self.quantile = float(quantile)
if not 0 < self.quantile < 1:
    raise ValueError("quantile must be between 0 and 1")
self.alpha = float(alpha)
look at other parts of the code, we do not cast stuff as float (a priori it's not needed, Python will do it itself if need be)
self.solver_strategy = StageBasedSolverStrategy(self.solver_params)

from skglm.experimental.quantile_huber import QuantileHuber
self._quantile_huber_cls = QuantileHuber
what's the point of this if it has no attribute?
if not hasattr(self.smooth_solver, 'warm_start'):
    self.smooth_solver.warm_start = False

def _initialize_pdcd(self, X, y):
for a v0, use only smooth problems, so use AndersonCDWS solver instead. It should not have any params
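A minimal sketch of that suggestion, solving a single smoothed (quantile-Huber) problem with skglm's plain CD solver; QuantileHuber's constructor arguments are assumed here, not confirmed:

```python
import numpy as np
from skglm import GeneralizedLinearEstimator
from skglm.penalties import L1
from skglm.solvers import AndersonCD
from skglm.experimental.quantile_huber import QuantileHuber

rng = np.random.default_rng(0)
X = rng.standard_normal((1_000, 20))
y = X @ rng.standard_normal(20) + rng.standard_normal(1_000)

# one smoothed problem; QuantileHuber(quantile, delta) is assumed here
est = GeneralizedLinearEstimator(
    datafit=QuantileHuber(quantile=0.8, delta=0.1),
    penalty=L1(alpha=0.1),
    solver=AndersonCD(fit_intercept=False),
)
est.fit(X, y)
```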
return w, dual

def _get_solver_for_stage(self, delta, stage, n_features):
this seems too complex, not needed at this stage. Just solve a sequence of smoothed problems with decreasing smoothing parameters.
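A sketch of that simpler continuation scheme (a hypothetical helper; it assumes QuantileHuber takes (quantile, delta) and that the solver's warm_start reuses the previous stage's coefficients, neither of which is confirmed here):

```python
import numpy as np
from skglm import GeneralizedLinearEstimator
from skglm.penalties import L1
from skglm.solvers import AndersonCD
from skglm.experimental.quantile_huber import QuantileHuber

def fit_smoothed_sequence(X, y, quantile=0.8, alpha=0.1,
                          delta_init=1.0, delta_final=1e-3, n_stages=8):
    """Hypothetical sketch: solve smoothed problems with geometrically
    decreasing delta, warm-starting each stage from the previous one."""
    solver = AndersonCD(fit_intercept=False, warm_start=True)
    est = GeneralizedLinearEstimator(penalty=L1(alpha=alpha), solver=solver)
    for delta in np.geomspace(delta_init, delta_final, n_stages):
        est.datafit = QuantileHuber(quantile=quantile, delta=delta)
        est.fit(X, y)  # warm start assumed to reuse previous coefficients
    return est
```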
# Center data to handle intercept manually
y_mean = np.mean(y)
if is_sparse:
this is to be avoided at all costs; all our solvers support sparse X! Also, there is no need to center.
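For instance, a minimal sketch passing a sparse design matrix straight to a skglm solver (quadratic datafit used only for illustration):

```python
import numpy as np
from scipy import sparse
from skglm import GeneralizedLinearEstimator
from skglm.datafits import Quadratic
from skglm.penalties import L1
from skglm.solvers import AndersonCD

# no densifying and no manual centering: sparse X is supported directly
X = sparse.random(1_000, 200, density=0.05, format="csc", random_state=0)
y = np.asarray(X @ np.ones(200)).ravel()

est = GeneralizedLinearEstimator(
    datafit=Quadratic(), penalty=L1(alpha=0.1), solver=AndersonCD()
).fit(X, y)
```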
else:
    deltas = [self.initial_delta]

# Build L1-continuation schedule
why is there a sequence of alpha being used?
too large, new shorter version in PR #312
Context of the PR
As reported in Issue #276, the PDCD_WS solver exhibits convergence problems with the Pinball loss on larger datasets, appearing to stall at a saddle point. The issue can be reproduced with a script along the following lines:
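A hedged sketch of a reproduction, built from skglm's experimental Pinball datafit and PDCD_WS solver; the PR's exact original script is not included here and the problem sizes are illustrative:

```python
import numpy as np
from skglm import GeneralizedLinearEstimator
from skglm.penalties import L1
from skglm.experimental.pdcd_ws import PDCD_WS
from skglm.experimental.quantile_regression import Pinball

rng = np.random.default_rng(0)
n_samples, n_features = 10_000, 100  # issues appear as n grows
X = rng.standard_normal((n_samples, n_features))
y = X @ rng.standard_normal(n_features) + rng.standard_normal(n_samples)

est = GeneralizedLinearEstimator(
    datafit=Pinball(0.8),
    penalty=L1(alpha=0.1),
    solver=PDCD_WS(max_iter=1000, verbose=True),
)
est.fit(X, y)  # stalls / oscillates instead of converging
```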
Contributions of the PR
Users can now solve quantile regression problems on larger datasets with the new SmoothQuantileRegressor, which replaces the Pinball loss by a sequence of progressively sharper quantile-Huber approximations. A usage sketch follows.
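A minimal usage sketch (the import path and constructor arguments are assumptions based on this PR's files, not a confirmed API):

```python
import numpy as np
# hypothetical import path, adjust to this PR's actual module layout
from skglm.experimental.smooth_quantile import SmoothQuantileRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((20_000, 100))
y = X @ rng.standard_normal(100) + rng.standard_normal(20_000)

# quantile: target level tau; alpha: L1 strength (assumed parameter names)
est = SmoothQuantileRegressor(quantile=0.8, alpha=0.1).fit(X, y)
y_pred = est.predict(X)
```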
Limitations and Need for Further Refinement:
Currently, this is an approximation approach and NOT an exact solution.
Speed has not been benchmarked, and for now the unit test only checks the reproduction from Issue #276.
The initial idea was to use a dual-solver approach, with PDCD_WS as the final solver. This did not work.
Potential next steps could be path-following methods, or reviewing the code again (from earlier commits) to look for mistakes (e.g. warm start, intercept handling, etc.).
Checks before merging PR