Penalty on shap calculation for higher variance #218
Conversation
Btw, a lot more pre-commit hooks have been added recently to improve code quality. I see this indirectly affects code you touched. I’d like to know whether you find this bothersome or not.
I ran a simulation comparing when the variance penalty is applied vs. when it is not. My goal is to stress test whether this change should be merged or not. The code to run the simulation is saved in this gist. Some observations:
Running for EarlyStoppingShapRFECV I get the results below. Observations:
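For intuition, here is a minimal standalone sketch (made-up numbers, not the gist’s data) of the effect the simulation probes: two features with similar plain mean |SHAP| can swap ranks once the variance penalty is applied, because the unstable feature is penalized by its cross-fold standard deviation.

```python
import numpy as np

# Hypothetical per-fold mean |SHAP| values across 5 CV folds (made-up numbers):
# feature B has a slightly higher plain mean but is far less stable across
# folds than feature A.
feat_a = np.array([0.30, 0.31, 0.29, 0.30, 0.30])
feat_b = np.array([0.10, 0.55, 0.05, 0.60, 0.28])

factor = 1.0  # plays the role of shap_variance_penalty_factor
for name, vals in [("A", feat_a), ("B", feat_b)]:
    plain = vals.mean()
    penalized = plain - vals.std() * factor  # mean minus std * factor
    print(f"feature {name}: plain={plain:.3f} penalized={penalized:.3f}")
```

With these numbers, B wins on the plain mean but A wins after the penalty, which is exactly the "prefer features that are coherent across CV folds" behaviour the PR aims for.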
@ReinierKoops - I didn't notice any pre-commit hooks being triggered at all, honestly. Personally, I like checks that force code consistency, as it helps me, the submitter, align with the repo owner's code style preferences before the PR review happens.
Great work. Please have a look at the output of the GitHub Actions. In the meantime I’ll try to review it today or tomorrow. I’d suggest you have a look at the contribution guide, which shows how to contribute (install pre-commit and run it locally together with pytest to speed up the workflow).
Thanks for the pointers. I worked through all the errors related to my changes. There are two remaining errors that seem unrelated to my PR. See below:
Would you be willing to turn the gist into a short tutorial? You can place it here.
The code looks great! Also, it would be nice if you could add a (small) test in which you compare the two different approaches (or only the newly added one). This can be added here.
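A hedged sketch of what such a test could look like. The real probatus entry point is shap_helpers.calculate_shap_importance; the small helper below is a stand-in with the penalty semantics assumed from this PR's description, so the assertions, not the helper, are the point:

```python
import numpy as np

def penalized_mean_abs_shap(shap_values, factor):
    # Stand-in for the penalized importance; assumed semantics from the PR:
    # penalized = mean(|shap|) - std(|shap|) * factor, no penalty for None/0.
    abs_shap = np.abs(shap_values)
    if factor is None or factor == 0:
        return abs_shap.mean(axis=0)
    return abs_shap.mean(axis=0) - abs_shap.std(axis=0) * factor

def test_no_penalty_matches_plain_mean():
    # factor=None and factor=0 should reproduce the existing behaviour.
    shap_values = np.arange(12.0).reshape(4, 3) - 5.0
    expected = np.abs(shap_values).mean(axis=0)
    for factor in (None, 0):
        np.testing.assert_allclose(
            penalized_mean_abs_shap(shap_values, factor), expected
        )

def test_penalty_never_increases_importance():
    # std is non-negative, so the penalized importance can only go down.
    rng = np.random.default_rng(42)
    shap_values = rng.normal(size=(50, 4))
    assert np.all(
        penalized_mean_abs_shap(shap_values, 1.0)
        <= penalized_mean_abs_shap(shap_values, None)
    )
```

The same two properties (None/0 equivalence, penalty never increasing importance) could be asserted against the real calculate_shap_importance output.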
Will review sometime today or tomorrow. Nice work, thanks!
A small typo: docs/tutorials/nb_shap_variance_penalty_and_results_comparrison.ipynb should be docs/tutorials/nb_shap_variance_penalty_and_results_comparison.ipynb
Made those changes, @ReinierKoops.
Maybe it’s running on a different version of scikit-learn? Would it be possible to output all the parameters of the algorithm you use for the test, including the implicit ones?
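One way to dump every parameter, including the implicit defaults, is scikit-learn's get_params. The RandomForestClassifier here is only a stand-in for whatever model the test actually uses:

```python
import sklearn
from sklearn.ensemble import RandomForestClassifier

# Stand-in model; substitute the estimator used in the failing test.
clf = RandomForestClassifier(n_estimators=50)

# get_params() returns explicit *and* default parameter values, so printing
# it on both machines makes a cross-version diff easy to spot.
print("scikit-learn", sklearn.__version__)
for name, value in sorted(clf.get_params().items()):
    print(f"{name} = {value}")
```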
The old version of scikit-learn was the issue. Tests are passing on my side now.
Awesome, happy it’s confirmed where the problem lies!
Thanks again, your PRs are much appreciated!
Very welcome. Thank you and the team.
PR addresses #216
Overall objective: Add a penalty to features that have high variance in their underlying shap values when computing feature importance. This will (in theory) encourage selection of features that are more coherent across CV folds.
API design:
- shap_variance_penalty_factor is used to enable/disable the penalty and to control the amount of penalty subtracted in the mean(shap) computation: penalized_shap_abs_mean = mean_shap - (std_shap * shap_variance_penalty_factor)
- shap_variance_penalty_factor = None (default, per the docstring)
- shap_variance_penalty_factor = 0 (same as None, no penalty)
- shap_variance_penalty_factor = 1 (penalty of one standard deviation of the mean)

Work tasks:
- Add the shap_variance_penalty_factor parameter to ShapRFECV.fit, compute & fit_compute
- Add it to EarlyStoppingShapRFECV.fit, compute & fit_compute
- Add shap_variance_penalty_factor to shap_helpers.calculate_shap_importance
- Add it to ShapModelInterpreter.fit, compute & fit_compute
- shap_variance_penalty_factor feature

Reviewers: LMK any changes / improvements that can be made.
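The penalized_shap_abs_mean formula in the API design above can be sketched in numpy. This is a minimal sketch of the described formula, not the merged probatus code; the None/0 handling follows the API design bullets:

```python
import numpy as np

def penalized_shap_importance(shap_values, shap_variance_penalty_factor=None):
    # shap_values: (n_samples, n_features) array of raw SHAP values.
    abs_shap = np.abs(shap_values)
    mean_shap = abs_shap.mean(axis=0)
    if shap_variance_penalty_factor is None or shap_variance_penalty_factor == 0:
        # None and 0 behave the same: no penalty applied.
        return mean_shap
    std_shap = abs_shap.std(axis=0)
    # penalized_shap_abs_mean = mean_shap - (std_shap * factor)
    return mean_shap - std_shap * shap_variance_penalty_factor
```

With factor=1, each feature's importance is docked by one standard deviation of its |SHAP| values, so high-variance features lose rank relative to stable ones.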