Conversation

@gbeane (Collaborator) commented Oct 17, 2025

Overview: Probability Calibration via CalibratedClassifierCV

What it is
CalibratedClassifierCV wraps any classifier (RF/GBT/XGBoost, etc.) and learns a mapping from raw model scores to well-calibrated probabilities. It does this with an internal cross-validation loop: in each fold, it fits the base model on train-fold data and learns a calibration function on the fold’s held-out data (a minimal sketch follows the list below). Two calibration methods are supported:

  • isotonic (non-parametric, flexible; best with enough data)
  • sigmoid (Platt scaling; smoother, works better on smaller data)
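
For context, here is a minimal scikit-learn sketch of the wrapper on synthetic data (not the JABS training code; X, y, and the forest settings are placeholders):

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # toy feature matrix and binary labels
    X, y = make_classification(n_samples=500, random_state=0)

    # each of the 3 internal folds fits the forest on the fold's training
    # portion and learns an isotonic mapping on the held-out portion
    clf = CalibratedClassifierCV(
        RandomForestClassifier(random_state=0), method="isotonic", cv=3
    )
    clf.fit(X, y)
    calibrated_probs = clf.predict_proba(X)[:, 1]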

Why calibration matters
Tree models often output overconfident probabilities (lots of ~0.0 or ~1.0). Calibration fixes this so that predicted probabilities reflect reality (e.g., among samples with p≈0.7, ~70% are positive). Better calibration improves:

  • Thresholding: a chosen cutoff corresponds to a real event rate rather than an arbitrary score.
  • Loss-based metrics: log-loss / Brier score reflect actual probability quality (see the sketch after this list).
  • User trust: fewer “certain-but-wrong” predictions in JABS.
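
A quick way to see this is to score predicted probabilities directly (a sketch using scikit-learn's metrics; truth and probs are toy arrays, not JABS data):

    import numpy as np
    from sklearn.calibration import calibration_curve
    from sklearn.metrics import brier_score_loss, log_loss

    truth = np.array([0, 0, 1, 1, 1, 0, 1, 0])
    probs = np.array([0.1, 0.4, 0.8, 0.9, 0.6, 0.2, 0.7, 0.3])

    print("Brier score:", brier_score_loss(truth, probs))  # lower is better
    print("log-loss:", log_loss(truth, probs))             # lower is better

    # fraction of positives per probability bin; a well-calibrated model
    # tracks the diagonal (frac_pos close to mean_pred in each bin)
    frac_pos, mean_pred = calibration_curve(truth, probs, n_bins=4)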

How we use it in JABS

  • Added optional settings, which are saved in the project.json file: calibrate_probabilities: bool, calibration_method: "isotonic"|"sigmoid", calibration_cv: int.
  • During training (including LOGO cross-validation), probability calibration is fit separately inside each fold. The calibrator is trained only on that fold’s training data and never sees the validation data, which prevents data leakage and keeps validation metrics honest (see the sketch after this list).
  • For feature importance, when calibration is enabled we aggregate importances across the calibrated folds’ base estimators.
  • UI: a JABS Settings dialog lets users toggle calibration and choose the method/CV. The settings are persisted in the JABS project.json file.
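
A hedged sketch of how per-fold calibration and importance aggregation fit together (variable names and the .estimator attribute access are illustrative, not the JABS implementation; the attribute is .base_estimator in scikit-learn releases before 1.2):

    import numpy as np
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import LeaveOneGroupOut

    X, y = make_classification(n_samples=300, random_state=0)
    groups = np.repeat(np.arange(6), 50)  # e.g., one group per identity/video

    importances = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        # the calibrator's internal CV runs entirely within the training split,
        # so the held-out group never influences the calibration mapping
        model = CalibratedClassifierCV(
            RandomForestClassifier(random_state=0), method="isotonic", cv=3
        )
        model.fit(X[train_idx], y[train_idx])
        probs = model.predict_proba(X[test_idx])[:, 1]

        # aggregate importances across the calibrated folds' base estimators
        for cal in model.calibrated_classifiers_:
            importances.append(cal.estimator.feature_importances_)

    mean_importance = np.mean(importances, axis=0)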

Practical guidance

  • Reasonable default: calibrate_probabilities=True, method="isotonic", calibration_cv=3. (This PR does not change current behavior, though, so calibrate_probabilities defaults to False; see the settings sketch after this list.)
  • Use "sigmoid" if folds are small; isotonic needs more data.
  • Avoid very high calibration_cv—typically 3–5 is enough.
  • Always calibrate during validation if you’ll deploy a calibrated final model (metrics should reflect deployed classifier).
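
For reference, the three settings above might be persisted like this (a sketch; only the key names and defaults come from this PR, and the surrounding project.json layout may differ):

    calibration_settings = {
        "calibrate_probabilities": True,   # PR default is False (no behavior change)
        "calibration_method": "isotonic",  # or "sigmoid" for small folds
        "calibration_cv": 3,
    }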

Trade-offs

  • Extra compute (fits base model multiple times).
  • Slight variance increase; mitigated by sensible CV (3–5).
  • For extremely imbalanced or tiny folds, prefer "sigmoid" or reduce CV.

Net impact for JABS

  • More honest probabilities -> cleaner threshold selection for behaviors / improved search for low-confidence predictions.
  • Better UX and trust in “confidence” displays.
  • Fewer brittle 0/1 outputs; improved stability across datasets and sessions.

See Also

User Guide

I'm going to hold off on updating the user guide until I'm sure we're going to merge these changes.

Settings Dialog Screenshots

[Two screenshots of the JABS settings dialog showing the calibration options]

@gbeane gbeane marked this pull request as draft October 17, 2025 21:42
@gbeane gbeane self-assigned this Nov 7, 2025
@gbeane gbeane requested review from bergsalex and ptuan5 November 7, 2025 14:58
@ptuan5 left a comment


Changes in the scripts overall look good to me.
Cannot comment on the changes to the UI part because it's beyond my comprehension.

    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        self._classifier = self._fit_xgboost(features, labels, random_seed=random_seed)
    self._classifier.fit(self._clean_features_for_training(features), labels)


You use clean_features here for the calibrated branch, but not for the old, original branch?

    Args:
        truth: Binary ground truth labels (0 or 1).
        probabilities: Predicted probabilities (2D array where second column is positive class).


For brier_score you accept both 1D and 2D arrays, but here you only accept 2D arrays. Is there a reason for this? Should we enforce 2D altogether?
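
To illustrate the question, a hypothetical normalization that would let both call sites accept either shape (the helper name is mine, not from the PR):

    import numpy as np

    def _positive_class_probs(probabilities: np.ndarray) -> np.ndarray:
        """Accept a 1D positive-class vector or a 2D (n, 2) array."""
        probs = np.asarray(probabilities)
        return probs[:, 1] if probs.ndim == 2 else probs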

    if pos == 0 or neg == 0:
        warnings.warn(
            "plot_reliability: need both positive and negative labels.", stacklevel=2
        )


If we need both positive and negative labels for this, should we either raise an error or return early?
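
For example, the early-return option might look like this (a sketch, not the PR's code; the surrounding function body is assumed):

    import warnings
    import numpy as np

    def plot_reliability(truth: np.ndarray, probabilities: np.ndarray) -> None:
        pos = int(np.sum(truth == 1))
        neg = int(np.sum(truth == 0))
        if pos == 0 or neg == 0:
            warnings.warn(
                "plot_reliability: need both positive and negative labels.", stacklevel=2
            )
            return  # bail out early rather than plot a degenerate curve
        ...  # plotting proceeds here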
