
Base Score - Wrong scale #2

@jmonteroers

First of all, thank you for providing such an exciting utility.

In the code, the base score is assumed to be on the logit scale (for example, when defining tree_leafs in XGBScorecardConstructor.construct_scorecard). However, the following MRE (minimal reproducible example) suggests that the base score is instead on the probability scale:

import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from xgboost import DMatrix
from xbooster.constructor import XGBScorecardConstructor

import numpy as np
from scipy.special import expit


data = load_breast_cancer()
X, y = data.data, data.target

# Build and fit classifier with only two trees
model = xgb.XGBClassifier(n_estimators=2, eval_metric='logloss')
model.fit(X, y)


# Retrieve base score using constructor class
scorecard_constructor = XGBScorecardConstructor(
    model, X, y
)
bscore = scorecard_constructor.base_score

# Raw-margin predictions of each tree taken on its own; each one includes the base score once
individual_preds = [tree.predict(DMatrix(X), output_margin=True) for tree in model.get_booster()]
# Hacky way to isolate the base score (all terms are on the logit scale):
# (base_score + score_T1) + (base_score + score_T2) - (base_score + score_T1 + score_T2) = base_score
base_score_matrix = sum(individual_preds) - model.predict(X, output_margin=True)
# Transform into the probability scale
base_score_matrix = expit(base_score_matrix)

# Assertion
check_arr = np.isclose(base_score_matrix, bscore)
assert np.all(check_arr), "Base score not in probability scale"

The trick is to fit only two trees, so that the base score appears twice in the sum of their individual predictions. Subtracting the full model's prediction on the raw score (logit) scale then leaves exactly one copy of the base score, still on the logit scale. I then compare it against the base score used in the XGBScorecardConstructor class. The two only match after transforming the former to the probability scale, which is why the assertion does not fail. I would suggest adding this check to the test suite at some point.
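To make the arithmetic behind the two-tree trick explicit, here is a minimal standard-library sketch. All numbers are made up for illustration (they do not come from a fitted model), and the helper names `expit` and `logit` are defined locally rather than taken from any library. It assumes, as the MRE suggests, that the stored base score is a probability and that its logit is what gets added to the raw margin.

```python
import math


def expit(z: float) -> float:
    """Logistic sigmoid: maps a logit (raw margin) to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Inverse of expit: maps a probability in (0, 1) to a logit."""
    return math.log(p / (1.0 - p))


# Hypothetical values, for illustration only.
base_score_prob = 0.63                 # base score as stored (probability scale)
base_margin = logit(base_score_prob)   # assumed contribution to the raw margin
tree_outputs = [0.41, -0.17]           # made-up leaf values of two trees, one sample

# Each single-tree margin includes the base margin once.
single_tree_margins = [base_margin + t for t in tree_outputs]
# The full model's margin includes the base margin once, plus every tree.
full_margin = base_margin + sum(tree_outputs)

# Two copies of the base margin minus one copy leaves exactly one.
recovered_margin = sum(single_tree_margins) - full_margin
recovered_prob = expit(recovered_margin)

assert math.isclose(recovered_margin, base_margin)
assert math.isclose(recovered_prob, base_score_prob)

# Treating the stored probability as if it were a logit shifts raw scores by this much.
scale_error = base_score_prob - base_margin
```

With k trees the same subtraction would leave (k - 1) copies of the base margin, which is why the check is simplest with exactly two trees.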

Happy to help if this indeed needs some fixing 😄

Labels: question, wontfix
