Releases · oegedijk/explainerdashboard
v0.3.5: skorch support, simplified dashboard and feature descriptions
New Features
- adds support for PyTorch neural networks (as long as they are wrapped by `skorch`)
- adds `SimplifiedClassifierComposite` and `SimplifiedRegressionComposite` to `explainerdashboard.custom`
- adds flag `simple=True` to load these simplified one-page dashboards: `ExplainerDashboard(explainer, simple=True)`
- adds support for visualizing trees of `ExtraTreesClassifier` and `ExtraTreesRegressor`
- adds `FeatureDescriptionsComponent` to `explainerdashboard.custom` and to the Importances tab
- adds the possibility to dynamically add new dashboards to a running ExplainerHub using the `/add_dashboard` route with `add_dashboard_route=True` (will only work if you're running the Hub as a single worker/node though!)
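A minimal sketch of the skorch wrapping and the new `simple=True` flag (the small network, the training settings and the `titanic_survive()` demo loader are illustrative assumptions, not part of this release):

```python
# Minimal sketch: a PyTorch network wrapped by skorch and explained with the new
# simple=True one-page dashboard. The network, training settings and the
# titanic_survive() demo loader are illustrative assumptions.
import torch.nn as nn
from skorch import NeuralNetClassifier
from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from explainerdashboard.datasets import titanic_survive

X_train, y_train, X_test, y_test = titanic_survive()

class SimpleNet(nn.Module):
    def __init__(self, n_features=len(X_train.columns)):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, 16),
            nn.ReLU(),
            nn.Linear(16, 2),
            nn.Softmax(dim=-1),
        )

    def forward(self, X):
        return self.layers(X)

# skorch makes the torch module behave like a scikit-learn classifier
model = NeuralNetClassifier(SimpleNet, max_epochs=10, lr=0.1)
model.fit(X_train.values.astype("float32"), y_train.values.astype("int64"))

explainer = ClassifierExplainer(model, X_test.astype("float32"), y_test)
ExplainerDashboard(explainer, simple=True).run()   # simplified one-page dashboard
```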
Improvements
- `ExplainerDashboard.to_yaml("dashboards/dashboard.yaml", dump_explainer=True)` will now dump the explainer into the correct subdirectory (and the dumped explainer filename now defaults to explainer.joblib)
- Interactions tab is automatically excluded for linear models
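For example, assuming an already-built `explainer` (the paths are illustrative):

```python
# Sketch of the to_yaml / from_config round trip, assuming `explainer` has
# already been built; the paths are illustrative.
from explainerdashboard import ExplainerDashboard

db = ExplainerDashboard(explainer)
# writes dashboard.yaml plus the explainer itself (defaulting to explainer.joblib)
# into the dashboards/ subdirectory:
db.to_yaml("dashboards/dashboard.yaml", dump_explainer=True)

# later: reload both the dashboard config and the dumped explainer
db2 = ExplainerDashboard.from_config("dashboards/dashboard.yaml")
db2.run()
```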
v0.3.4.1: fixes detailed shap plots bug when cats=None
Fixes dtreeviz 1.3 breaking change bug
Release Notes
Version 0.3.4:
Bug Fixes
- Fixes incompatibility bug with dtreeviz >= 1.3
- Fixes ExplainerHub dbc.Jumbotron style bug
Improvements
- raises a ValueError when passing `shap='deep'`, as it is not yet correctly supported
v0.3.3.1: minor bugfix with outliers and nan
Fixes a bug with removing outliers when NaNs are present.
v0.3.3: better pipeline support and thread safety
Version 0.3.3:
Highlights:
- Adding support for cross-validated metrics
- Better support for pipelines by using the kernel explainer
- Making the explainer thread-safe by adding locks
- Removing outliers from shap dependence plots
Breaking Changes
- parameter `permutation_cv` has been deprecated and replaced by parameter `cv`, which now also works to calculate cross-validated metrics besides cross-validated permutation importances.
New Features
- metrics now get calculated with cross validation over `X` when you pass the `cv` parameter to the explainer; this is useful when for some reason you want to pass the training set to the explainer
- adds winsorization to shap dependence and shap interaction plots
- If `shap='guess'` fails (unable to guess the right type of shap explainer), then default to the model-agnostic `shap='kernel'`
- Better support for sklearn `Pipelines`: if not able to extract the transformer+model, then default to `shap.KernelExplainer` to explain the entire pipeline
- you can now remove outliers from shap dependence/interaction plots with `remove_outliers=True`: filters all outliers beyond 1.5*IQR
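A sketch of how the new `cv` parameter and outlier filtering fit together (the model and the `titanic_survive()` demo data are illustrative, and the dependence-plot method name is an assumption; only `cv=` and `remove_outliers=True` come from this release):

```python
# Sketch of the new cv= parameter and remove_outliers=True. The model and the
# titanic_survive() demo data are illustrative, and the dependence-plot method
# name below is an assumption (check your installed version's API).
from sklearn.ensemble import RandomForestClassifier
from explainerdashboard import ClassifierExplainer
from explainerdashboard.datasets import titanic_survive

X_train, y_train, X_test, y_test = titanic_survive()
model = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)

# here the *training* set is passed to the explainer, so metrics and permutation
# importances get cross-validated with cv=5 instead of computed in-sample:
explainer = ClassifierExplainer(model, X_train, y_train, cv=5)
explainer.metrics()

# drop outliers beyond 1.5*IQR from a shap dependence plot:
explainer.plot_dependence("Age", remove_outliers=True)
```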
Bug Fixes
- Sets proper `threading.Lock`s before making calls to the shap explainer, to prevent race conditions with dashboards calling for shap values in multiple threads (shap is unfortunately not thread-safe)
Improvements
- single shap row KernelExplainer calculations now go without tqdm progress bar
- added cutoff tpr and fpr to roc auc plot
- added cutoff precision and recall to pr auc plot
- put a loading spinner on shap contrib table
v0.3.2.2: more bugfixes
Version 0.3.2.2:
`index_dropdown=False` now works for indexes not listed by `set_index_list_func()`, as long as they can be found by `set_index_exists_func()`
New Features
- adds `set_index_exists_func` to set a function that checks whether an index exists, besides those listed by `set_index_list_func()`
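A small sketch, assuming an already-built `explainer` (the hook names come from these notes; the lookup logic and index labels are purely illustrative):

```python
# Assumes `explainer` is an already-built Classifier/RegressionExplainer; the
# "external store" and index labels are illustrative stand-ins for e.g. a
# database lookup.
KNOWN_EXTRA_INDEXES = {"passenger_900", "passenger_901"}   # hypothetical ids

# indexes offered in the dropdowns and random-index buttons:
explainer.set_index_list_func(lambda: ["passenger_1", "passenger_2"])
# extra existence check for indexes typed in when index_dropdown=False:
explainer.set_index_exists_func(lambda index: index in KNOWN_EXTRA_INDEXES)
```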
Bug Fixes
- bug fix to make `shap.KernelExplainer` (used with explainer parameter `shap='kernel'`) work with `RegressionExplainer`
- bug fix when no explicit `labels` are passed with the index selector
- components only update if `explainer.index_exists()`: no more `IndexNotFoundError`s
- fixed bug where the regression index selector title was labeled 'Custom'
- `get_y()` now returns `.item()` when necessary
- removed ticks from confusion matrix plot when no `labels` param is passed (this bug got reintroduced in a recent plotly release)
Improvements
- new helper function `get_shap_row(index)` to calculate or look up a single row of shap values
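For example, assuming an existing `explainer` and a known index label:

```python
# Assuming `explainer` is already built and "passenger_1" is a known index label,
# this returns the shap values for just that row (calculated or looked up):
shap_row = explainer.get_shap_row("passenger_1")
```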
v0.3.2.1: add index_dropdown=False to regression dashboard
Bugfix: the new `index_dropdown=False` feature was not working correctly for regression dashboards
v0.3.2: custom metrics
Version 0.3.2:
Highlights:
- Control what metrics to show, or use your own custom metrics, with `show_metrics`
- Set the naming for onehot features with all `0`s with `cats_notencoded`
- Speed up plots by displaying only a random sample of markers in scatter plots with `plot_sample`
- Make index selection a free text field with `index_dropdown=False`
New Features
- new parameter `show_metrics` for `explainer.metrics()`, `ClassifierModelSummaryComponent` and `RegressionModelSummaryComponent`:
    - pass a list of metrics and only display those metrics in that order
    - you can also pass custom scoring functions, as long as they are of the form `metric_func(y_true, y_pred)`: `show_metrics=[metric_func]`
        - For `ClassifierExplainer`, what is passed to the custom metric function depends on whether the function takes additional parameters `cutoff` and `pos_label`. If these are not arguments, then `y_true=self.y_binary(pos_label)` and `y_pred=np.where(self.pred_probas(pos_label)>cutoff, 1, 0)`. Otherwise the raw `self.y` and `self.pred_probas` are passed for the custom metric function to do something with.
        - custom functions are also stored to `dashboard.yaml` and imported upon loading `ExplainerDashboard.from_config()`
- new parameter `cats_notencoded`: a dict to indicate how to name the value of a onehot-encoded feature when all onehot columns equal 0. Defaults to `'NOT_ENCODED'`, but can be adjusted with this parameter, e.g. `cats_notencoded=dict(Deck="Deck not known")`.
- new parameter `plot_sample` to only plot a random sample in the various scatter plots. When you have a large dataset, this may significantly speed up various plots without sacrificing much in expressiveness: `ExplainerDashboard(explainer, plot_sample=1000).run()`
- new parameter `index_dropdown=False` will replace the index dropdowns with a free text field. This can be useful when you have a lot of potential indexes and the user is expected to know the index string. Input will be checked for validity with `explainer.index_exists(index)`, and the field indicates when the input index does not exist. If the index does not exist, it will not be forwarded to other components, unless you also set `index_check=False`.
- adds mean absolute percentage error to the regression metrics. If it is too large, a warning will be printed. It can be excluded with the new `show_metrics` parameter.
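A combined sketch of these parameters (the model, the `titanic_survive()` demo data, the `cats` groups and the custom metric are illustrative assumptions; the parameter names come from this release):

```python
# Combined sketch of show_metrics, cats_notencoded, plot_sample and index_dropdown.
# Model, demo data, cats groups and the custom metric are illustrative assumptions.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from explainerdashboard import ClassifierExplainer, ExplainerDashboard
from explainerdashboard.datasets import titanic_survive

X_train, y_train, X_test, y_test = titanic_survive()
model = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)

def false_negative_rate(y_true, y_pred):
    # custom metric of the form metric_func(y_true, y_pred); since it takes no
    # cutoff/pos_label arguments it receives y_binary and binarized predictions
    return ((y_true == 1) & (y_pred == 0)).sum() / max((y_true == 1).sum(), 1)

explainer = ClassifierExplainer(
    model, X_test, y_test,
    cats=["Deck", "Embarked"],                    # assumed onehot groups in the demo data
    cats_notencoded=dict(Deck="Deck not known"),  # label rows where all Deck_* columns are 0
)

# only show these metrics, in this order (custom function included):
explainer.metrics(show_metrics=[f1_score, false_negative_rate])

ExplainerDashboard(
    explainer,
    plot_sample=1000,       # scatter plots only draw a random sample of 1000 markers
    index_dropdown=False,   # free text index field, validated with explainer.index_exists()
).run()
```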
Bug Fixes
- `get_classification_df` added to `ClassificationComponent` dependencies
Improvements
- accepting a single-column `pd.DataFrame` for `y`, and automatically converting it to a `pd.Series`
- if the WhatIf `FeatureInputComponent` detects the presence of missing onehot features (i.e. rows where all columns of the onehot-encoded feature equal 0), then it adds `'NOT_ENCODED'` or the matching value from `cats_notencoded` to the dropdown options
- generating the `name` parameter for `ExplainerComponents` for which no name is given is now done with a deterministic process instead of a random `uuid`. This should help with scaling custom dashboards across cluster deployments. Also drops the `shortuuid` dependency.
- `ExplainerDashboard` now prints out the local ip address when starting the dashboard
- `get_index_list()` is only called once upon starting the dashboard
v0.3.1: responsive classifier components
Version 0.3.1:
This version is mostly about pre-calculating and optimizing the classifier statistics
components. Those components should now be much more responsive with large datasets.
New Features
- new methods `roc_auc_curve(pos_label)` and `pr_auc_curve(pos_label)`
- new method `get_classification_df(...)` to get a dataframe with the number of labels above and below a given cutoff
    - this now gets used by `plot_classification(..)`
- new method `confusion_matrix(cutoff, binary, pos_label)`
- added parameter `sort_features` to `FeatureInputComponent`:
    - defaults to `'shap'`: order features by mean absolute shap
    - if set to `'alphabet'`, features are sorted alphabetically
- added parameter `fill_row_first` to `FeatureInputComponent`:
    - defaults to `True`: fill the first row first, then the next row, etc.
    - if `False`: fill the first column first, then the second column, etc.
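These can also be used directly in custom layouts, for example (assuming an existing `explainer`; the cutoff values and the exact `get_classification_df()` signature are assumptions):

```python
# Sketch of the new 0.3.1 methods and FeatureInputComponent parameters, assuming
# an already-built ClassifierExplainer `explainer`. Cutoff values and the exact
# get_classification_df() signature are assumptions.
from explainerdashboard.custom import FeatureInputComponent

explainer.roc_auc_curve(pos_label=1)
explainer.pr_auc_curve(pos_label=1)
explainer.confusion_matrix(cutoff=0.75, binary=True, pos_label=1)
explainer.get_classification_df(cutoff=0.5)   # label counts above/below the cutoff

# feature input block for a custom what-if layout, ordered by mean absolute shap
# and filled column by column:
feature_input = FeatureInputComponent(
    explainer,
    sort_features='shap',
    fill_row_first=False,
)
```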
Bug Fixes
- categorical mappings now updateable with pandas<=1.2 and python==3.6
- title now overridable for `RegressionRandomIndexComponent`
- added assert check on `summary_type` for `ShapSummaryComponent`
Improvements
- pre-calculating `lift_curve_df` only once and then storing it for each `pos_label`
    - plus: storing only 100 evenly spaced rows of `lift_curve_df`
    - dashboard should be more responsive for large datasets
- pre-calculating `roc_auc_curve` and `pr_auc_curve`
    - dashboard should be more responsive for large datasets
- pre-calculating confusion matrices
    - dashboard should be more responsive for large datasets
- pre-calculating `classification_dfs`
    - dashboard should be more responsive for large datasets
- confusion matrix: added axis titles, moved predicted labels to the bottom of the graph
- precision plot component: when only adjusting the cutoff, simply update the cutoff line without recalculating the plot
v0.3.0.1: dependency fixes
Version 0.3.0.1:
Some of the new features of version 0.3 only work with `pandas>=1.2`, which is not available for python 3.6.
Breaking Changes
- new dependency requirement `pandas>=1.2` also implies `python>=3.7`
Bug Fixes
- updates `pandas` version to be compatible with categorical feature operations
- updates dtreeviz version to make the `xgboost` and `pyspark` dependencies optional