Merge pull request #34 from UChicagoSUPERgroup/counterfactual_changes
Counterfactual Changes
cmdkev authored Aug 18, 2022
2 parents f4fc866 + ff8f573 commit 091a54b
Showing 6 changed files with 248 additions and 69 deletions.
@@ -11,6 +11,7 @@ export class ProtectedColumnNote extends PopupNotification {

private _data: ProtectedData[];
private _dfs: string[];
private _notices: { [key: string]: any };

////////////////////////////////////////////////////////////
// Constructor
@@ -24,11 +25,17 @@ export class ProtectedColumnNote extends PopupNotification {
super("protected", true, "Protected Columns", originalMessage);
this._data = [];
this._dfs = [];
this._notices = originalMessage;
notices.reverse();
console.log(notices);
this._notices.sort(
(dfA: { [key: string]: any }, dfB: { [key: string]: any }) =>
this._countNoticeSensitivityLength(dfB) -
this._countNoticeSensitivityLength(dfA)
);
for (var x = 0; x < notices.length; x++) {
this._data.push(new ProtectedData(notices[x], kernel_id));
this._dfs.push(notices[x]["df"]);
this._notices[notices[x]["df"]] = notices[x];
}
super.addRawHtmlElement(this._generateBaseNote(this._dfs));
}
@@ -38,6 +45,15 @@ export class ProtectedColumnNote extends PopupNotification {
// Low-level functions that tend to be repeated often
////////////////////////////////////////////////////////////

private _countNoticeSensitivityLength(df: {
[key: string]: { [key: string]: any };
}): number {
return Object.entries(df.columns).reduce(
(sum, [key, entry]) => (sum += entry.sensitive ? 1 : 0),
0
);
}

private _generateBaseNote(dfs: any[]): HTMLElement {
var dfString = "";
console.log(dfs);
@@ -58,15 +74,20 @@ export class ProtectedColumnNote extends PopupNotification {
</div>
</div>`;
}
const joinedColumnNames = dfs.join(", ");
const joinedColumnNames = dfs
.filter(
(dfName) =>
this._countNoticeSensitivityLength(this._notices[dfName]) >= 1
)
.join(", ");
var elem = $.parseHTML(`
<div class="promptMl protectedColumns">
<h1>Protected Columns</h1>
<div class="intro">
<p>Some of the columns in the <strong>${joinedColumnNames}</strong> dataframe feature protected classes of data.
A protected class is a group of people sharing a common trait who are legally
protected from being discriminated against on the basis of that trait.
Some examples include race, gender, and pregnancy status (please refer to the Welcome tab for the full list of protected columns).
Some examples include race, gender, and pregnancy status.
<br /><br /><strong>Why should I be concerned?</strong>
<br />When you are building machine learning models off of data that includes
information from protected classes, you may be inadvertently replicating power
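The ordering and filtering introduced in this file can be sketched in isolation. This is a minimal sketch, not the plugin's code: the `Notice` and `ColumnInfo` shapes and the function names are assumptions made for illustration, and the real notice objects may carry more fields.

```typescript
// Hypothetical shapes for illustration only.
interface ColumnInfo {
  sensitive: boolean;
}
interface Notice {
  df: string;
  columns: { [name: string]: ColumnInfo };
}

// Count how many columns of one dataframe notice are marked sensitive.
function countSensitiveColumns(notice: Notice): number {
  return Object.keys(notice.columns).reduce(
    (sum, name) => sum + (notice.columns[name].sensitive ? 1 : 0),
    0
  );
}

// Order notices so dataframes with the most sensitive columns come first.
// slice() copies the array so the caller's input is left untouched.
function sortBySensitivity(notices: Notice[]): Notice[] {
  return notices
    .slice()
    .sort((a, b) => countSensitiveColumns(b) - countSensitiveColumns(a));
}

// Join the names of dataframes that have at least one sensitive column.
function joinSensitiveDataframes(notices: Notice[]): string {
  return notices
    .filter((notice) => countSensitiveColumns(notice) >= 1)
    .map((notice) => notice.df)
    .join(", ");
}

const exampleNotices: Notice[] = [
  { df: "loans", columns: { race: { sensitive: true }, income: { sensitive: false } } },
  { df: "weather", columns: { temperature: { sensitive: false } } },
  { df: "applicants", columns: { gender: { sensitive: true }, age: { sensitive: true } } },
];
const sortedNames = sortBySensitivity(exampleNotices).map((n) => n.df);
const joinedNames = joinSensitiveDataframes(sortBySensitivity(exampleNotices));
```

With this input, `weather` is sorted last and dropped from the joined string because none of its columns is marked sensitive.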
103 changes: 67 additions & 36 deletions jupyter_note_book_plugin/prompt-ml/src/components/UncertaintyNote.ts
@@ -33,8 +33,8 @@ export class UncertaintyNote extends PopupNotification {
<h1>Counterfactual</h1>
<p>
A counterfactual is a conditional statement used to reason about what could
have been true under different circumstances. "If Brenda was allergic to apples
she would not eat an apple every day", is an example of a counterfactual
have been true under different circumstances. "If Brenda were allergic to apples
she would not eat an apple every day" is an example of a counterfactual
statement. In the machine learning setting, a counterfactual means perturbing
each data point and observing how this impacts the predictions made by a model.
</p>
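The machine learning sense of "counterfactual" described in this note (perturb each data point, then observe how predictions move) can be illustrated with a toy example. This is a sketch under stated assumptions: `toyModel`, `perturbColumn`, and `countChangedPredictions` are hypothetical names, and the threshold model stands in for whatever model Retrograde actually inspects.

```typescript
type Row = { [column: string]: number };
type Predict = (row: Row) => boolean;

// Toy stand-in model (an assumption): predict True when income exceeds 50.
const toyModel: Predict = (row) => row["income"] > 50;

// Return perturbed copies of the rows with `column` replaced by perturb(value).
function perturbColumn(
  rows: Row[],
  column: string,
  perturb: (value: number) => number
): Row[] {
  return rows.map((row) => {
    const copy: Row = {};
    for (const key of Object.keys(row)) {
      copy[key] = key === column ? perturb(row[key]) : row[key];
    }
    return copy;
  });
}

// Count how many predictions differ between the original and perturbed rows.
function countChangedPredictions(
  model: Predict,
  original: Row[],
  perturbed: Row[]
): number {
  let changed = 0;
  for (let i = 0; i < original.length; i++) {
    if (model(original[i]) !== model(perturbed[i])) {
      changed++;
    }
  }
  return changed;
}

const originalRows: Row[] = [{ income: 40 }, { income: 55 }, { income: 90 }];
const perturbedRows = perturbColumn(originalRows, "income", (v) => v - 10);
const changedCount = countChangedPredictions(toyModel, originalRows, perturbedRows);
```

Only the middle row crosses the threshold after the perturbation, so one prediction changes.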
@@ -50,21 +50,23 @@ export class UncertaintyNote extends PopupNotification {
<p>
<b>Why should I be concerned?</b> <br />
When you are building a machine learning model, it is sometimes difficult to
know how a model will perform on out-of-distribution data (i.e. data instances
know how a model will perform on out-of-distribution data (i.e., data instances
it has not previously seen). Consequently, it may be difficult to solely use
the accuracy of a model as a measure. This uncertainty is a result of the
inability to know the degree to which your model has "generalized" and
"learned" from the data.
</p>
<br />
<b>What can I do about it?</b> <br />
Retrograde offers brief statistics about the number of predictions affected by
the perturbations as well as how the predictions were affected (i.e.
quantifying the changes from True to False or False to True). The statistics
from this modification are summarized above the table. It is up to you to
interpret and evaluate the ramifications of the counterfactual predictions
presented to see if there exist systemic issues or outliers with the results of
the counterfactual predictions.
Retrograde provides brief statistics about the number of predictions affected by
the perturbations as well as how the predictions were affected (i.e.,
quantifying the changes from True to False or False to True). Use the dropdown
menu to select which modified column or combination of columns to view in the
table. Once a selection is made, the table below the summary will show the
original data side-by-side with the modified column(s) in orange.
It is up to you to interpret and evaluate the ramifications of the counterfactual
predictions presented to determine if there exist systemic issues or outliers with
the results of the counterfactual predictions.
</p>
<div class="models">
</div>
@@ -81,32 +83,35 @@ export class UncertaintyNote extends PopupNotification {
model.columns[colName].sort();
}
var elem = $.parseHTML(`
<div class="noselect model shadowDefault" prompt-ml-tracking-enabled prompt-ml-tracker-interaction-description="Toggled model tab (${model["model_name"]})">
<h2><span class="prefix"> - </span>Within <span class="code-snippet">${model["model_name"]}</span> Base Accuracy: ${model["original_accuracy"]}}</h2>
<div class="noselect model shadowDefault" prompt-ml-tracking-enabled prompt-ml-tracker-interaction-description="Toggled model tab (${
model["model_name"]
})">
<h2><span class="prefix"> - </span>Within <span class="code-snippet">${
model["model_name"]
}</span> Original Accuracy: ${UncertaintyNote._r(
model["original_accuracy"] * 100,
1
)}%</h2>
<div class="toggleable"></div>
</div>
`);
// handle expanded / condensed views
$(elem)
.find("h2")
.on("mouseup", (e: Event) => {
.find("h2")[0]
.onmouseup = (e: Event) => {
var parentElem = $($(e.currentTarget).parent());
// Toggle basic visibility effects
parentElem.toggleClass("condensed");
// Change prefix to "+" or "-" depending on if the note is condensed
parentElem
.find("h2 .prefix")
.text(parentElem.hasClass("condensed") ? " + " : " - ");
});
};
$(elem).find(".toggleable").append(this._generateSummary(model));
// generate table
$(elem)
.find(".toggleable")
.append(
$.parseHTML(
`<div class="tableContainer"><h4>Modifications Table</h4></div>`
)
);
.append($.parseHTML(`<div class="tableContainer"></div>`));
$(elem)
.find(".toggleable .tableContainer")
.append(this._generateTable(model));
@@ -121,7 +126,13 @@ export class UncertaintyNote extends PopupNotification {
.append(this._generateTable(model));
};
// generate interactivity
$(elem).find(".toggleable").prepend(this._generateSelector(model, onPress));
$(elem)
.find(".toggleable .tableContainer")
.prepend(this._generateSelector(model, onPress));
// generate title
const tableHeader = document.createElement("h4");
tableHeader.innerText = "Modifications Table";
$(elem).find(".toggleable .tableContainer").prepend(tableHeader);
return elem[1] as HTMLElement;
}

@@ -136,9 +147,9 @@ export class UncertaintyNote extends PopupNotification {
for (var columnName of Object.keys(model.modified_values)
.sort()
.filter((columnName) => model.ctf_statistics[columnName].raw_diff > 0)) {
const displayName = model.ctf_statistics[columnName].info
.flat()
.join(", ");
const displayName = Array.isArray(model.ctf_statistics[columnName].info)
? model.ctf_statistics[columnName].info.flat().join(", ")
: model.ctf_statistics[columnName].info;
$(elem)
.find(".options")
.append(
@@ -147,13 +158,13 @@ export class UncertaintyNote extends PopupNotification {
)
);
}
$(elem).on("mouseup", (e: Event) => {
(elem[0] as HTMLElement).onmouseup = (e: Event) => {
if ($(e.target).prop("id")) {
$(elem).find(".selected p").text($(e.target).attr("displayname"));
onSelect($(e.target).prop("id"));
}
$(elem).toggleClass("active");
});
};
return elem[0] as any as HTMLElement;
}

@@ -173,21 +184,34 @@ export class UncertaintyNote extends PopupNotification {
$.parseHTML(
`<li><strong>${UncertaintyNote._r(
colStats["accuracy"][0] * 100,
3
)}</strong>% of predictions were accurate</li>`
1
)}%</strong> of predictions were accurate <strong>(${UncertaintyNote._r(
model["original_accuracy"] * 100,
1
)}% in original model)</strong></li>`
)
);
$(colSummary[1]).append(
$.parseHTML(
`<li><strong>${colStats["raw_diff"]}</strong> predictions changed</li><ul class="predictionsSublist"></ul>`
`<li><strong>${colStats["raw_diff"]} (${UncertaintyNote._r(
(colStats["raw_diff"] / colStats["total"]) * 100,
1
)}%)</strong> predictions changed</li>
<ul class="predictionsSublist"></ul>`
)
);
$(colSummary[1])
.find(".predictionsSublist")
.append(
$.parseHTML(
`<li><strong>${colStats["true_to_False"]}</strong> changed from <strong>True</strong> to <strong>False</strong></li>
<li><strong>${colStats["false_to_True"]}</strong> changed from <strong>False</strong> to <strong>True</strong></li>`
`<li><strong>${colStats["true_to_False"]} (${UncertaintyNote._r(
(colStats["true_to_False"] / colStats["total"]) * 100,
1
)}%)</strong> changed from <strong>True</strong> to <strong>False</strong></li>
<li><strong>${colStats["false_to_True"]} (${UncertaintyNote._r(
(colStats["false_to_True"] / colStats["total"]) * 100,
1
)}%)</strong> changed from <strong>False</strong> to <strong>True</strong></li>`
)
);
$(elem).find("ul.modificationSummaries").append(colSummary);
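The summary lines in this hunk pair each raw count with a percentage of `total`, rounded to one decimal place. A self-contained sketch of that arithmetic follows. The field names `raw_diff`, `true_to_False`, `false_to_True`, and `total` come from the diff; `roundTo`, `summarizeChanges`, and `asPercent` are hypothetical helpers, and `roundTo` is only assumed to behave like `UncertaintyNote._r`.

```typescript
interface CtfStats {
  total: number;
  raw_diff: number;
  true_to_False: number;
  false_to_True: number;
}

// Hypothetical rounding helper; assumed to resemble UncertaintyNote._r.
function roundTo(value: number, places: number): number {
  const factor = Math.pow(10, places);
  return Math.round(value * factor) / factor;
}

// Compare original and counterfactual predictions and tally the changes.
function summarizeChanges(original: boolean[], counterfactual: boolean[]): CtfStats {
  const stats: CtfStats = {
    total: original.length,
    raw_diff: 0,
    true_to_False: 0,
    false_to_True: 0,
  };
  for (let i = 0; i < original.length; i++) {
    if (original[i] === counterfactual[i]) {
      continue;
    }
    stats.raw_diff++;
    if (original[i]) {
      stats.true_to_False++;
    } else {
      stats.false_to_True++;
    }
  }
  return stats;
}

// Format a count as a percentage of the total, guarding against division by zero.
function asPercent(count: number, total: number): string {
  return total === 0 ? "0%" : `${roundTo((count / total) * 100, 1)}%`;
}

const ctfStats = summarizeChanges(
  [true, true, false, false, true, false, true, false],
  [false, true, false, true, true, false, false, false]
);
const changedLabel = `${ctfStats.raw_diff} (${asPercent(ctfStats.raw_diff, ctfStats.total)}) predictions changed`;
```

The zero-total guard in `asPercent` is a design choice of this sketch; the diff itself divides by `colStats["total"]` directly.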
@@ -232,7 +256,7 @@ export class UncertaintyNote extends PopupNotification {
)
);
// prediction
row.appendChild(
row.prepend(
UncertaintyNote._generateCell(
x == 0
? "prediction"
@@ -252,12 +276,16 @@ export class UncertaintyNote extends PopupNotification {
? modifiedIndices.indexOf(y - 1) >= 0
? rowData[y - 1]
: cellData
: UncertaintyNote._r(cellData, 3),
: typeof cellData == "number"
? UncertaintyNote._r(cellData, 3)
: cellData,
x == 0,
x != 0 &&
modifiedIndices.indexOf(y - 1) >= 0 &&
rowData[y - 1] != rowData[y]
? UncertaintyNote._r(rowData[y - 1], 3)
? typeof rowData[y - 1] == "number"
? UncertaintyNote._r(rowData[y - 1], 3)
: rowData[y - 1]
: null,
x == 0 && modifiedIndices.indexOf(y - 1) >= 0
);
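The `typeof` checks added in this hunk keep non-numeric cells (for example, category labels) from being passed to the rounding helper. A minimal sketch of that guard, assuming a hypothetical `roundTo` that multiplies, rounds, and divides, as a stand-in for `UncertaintyNote._r`:

```typescript
// Hypothetical rounding helper; assumed to resemble UncertaintyNote._r.
function roundTo(value: number, places: number): number {
  const factor = Math.pow(10, places);
  return Math.round(value * factor) / factor;
}

// Round numeric cells, pass every other cell through unchanged.
function formatCell(value: number | string, places: number): number | string {
  return typeof value === "number" ? roundTo(value, places) : value;
}

const numericCell = formatCell(0.123456, 3);
const labelCell = formatCell("Female", 3);
```

Note that the guard has to test the individual cell value, not the row array that holds it, because `typeof` applied to an array is `"object"` and never `"number"`.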
@@ -291,7 +319,8 @@ export class UncertaintyNote extends PopupNotification {
${modified || overrideModifiedStyling ? "class='modified'" : ""} >
${
modified
? `<span class="new">${content}</span><span class="old">${modified}</span>`
? // ? `<span class="new">${content}</span><span class="old">${modified}</span>`
`<span class="old">${modified}</span>➜<span class="new">${content}</span>`
: content
}
@@ -303,7 +332,9 @@ export class UncertaintyNote extends PopupNotification {
model: { [key: string]: any },
columnString: string
): string {
return model.ctf_statistics[columnString].info.flat().join(", ");
return Array.isArray(model.ctf_statistics[columnString].info)
? model.ctf_statistics[columnString].info.flat().join(", ")
: model.ctf_statistics[columnString].info;
}

////////////////////////////////////////////////////////////
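The `Array.isArray` guard that this commit adds in two places handles `info` values that arrive either as a plain string or as a nested array. A sketch of the same idea as one reusable helper follows. The `Info` union type and the name `displayName` are assumptions for illustration, and the flattening is written by hand so the sketch does not depend on `Array.prototype.flat` being available in the compile target.

```typescript
// Assumed shape: a plain string, or an array whose items are strings or string arrays.
type Info = string | Array<string | string[]>;

function displayName(info: Info): string {
  // A plain string is already displayable.
  if (!Array.isArray(info)) {
    return info;
  }
  // Flatten one level, then join with ", " as the diff does.
  const flattened: string[] = [];
  for (const item of info) {
    if (Array.isArray(item)) {
      for (const inner of item) {
        flattened.push(inner);
      }
    } else {
      flattened.push(item);
    }
  }
  return flattened.join(", ");
}

const singleName = displayName("age");
const combinedName = displayName([["race", "gender"], "age"]);
```

Factoring the guard into one helper would avoid repeating the ternary in both the selector and the column-name lookup, though the commit itself keeps the two copies.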
10 changes: 5 additions & 5 deletions jupyter_note_book_plugin/prompt-ml/src/notifier.ts
@@ -204,7 +204,7 @@ export class Prompter {
console.log("eqodds length ",eqOdds.length);
eqOdds.reverse();
// preamble on MRN
note.addParagraph(`<p><b>The Model Report</b> uses the sensitivity as marked in the Protected Column notification to determine
note.addParagraph(`<p>The <span class="code-snippet-inline">Model Report</span> uses the sensitivity as marked in the <span class="code-snippet-inline">Protected Column</span> notification to determine
the columns that will be considered in this model report. It is like a report card created by Retrograde that
measures your model's performance across groups you may have excluded in your test features. Retrograde does this in part
by parsing your code and finding the original dataframe your test dataframe was derived from, as well as that
@@ -268,7 +268,7 @@ export class Prompter {
// Attaching the data to the note itself
note.addRawHtmlElement(new Model(name, model["current_df"], model["ancestor_df"], groups).export())
}
note.addParagraph(`<p>This plugin has calculated performance metrics for data subsets based on Protected Columns</p>`);
note.addParagraph(`<p>Retrograde has calculated performance metrics for data subsets based on <span class="code-snippet-inline">Protected Columns</span></p>`);
note.addParagraph(`<br /><p><b>Why it matters</b> Overall accuracy of a model may not tell the whole story.
A model may be accurate overall, but may have better or worse performance on particular data subsets.
Alternatively, errors of one type may be more frequent within one subset, and errors of another type may be more frequent in a different data subset.</p>`);
@@ -278,8 +278,8 @@ export class Prompter {
Exploring the whole space may not be feasible, so prioritizing certain performance metrics and groups, and characterizing the tradeoffs there may be most efficient.</p>`);
note.addParagraph(`<br /><p><b>How was it detected?</b> The performance metrics shown here are derived from Retrograde's best guess at the protected columns associated with the model's testing data.
Because of this they may not perfectly match a manual evaluation.
Retrograde calculates the performance with respect to protected groups identified in the Protected Column notification.
Retrograde calculates precision, recall, F1 Score, false positive rate (FPR) and false negative rate (FNR). More information about these metrics can be found <a style="color:blue" href="https://towardsdatascience.com/performance-metrics-confusion-matrix-precision-recall-and-f1-score-a8fe076a2262">here</a>.</p>`);
Retrograde calculates the performance with respect to protected groups identified in <span class="code-snippet-inline">Protected Columns</span>.
Retrograde calculates precision, recall, F1 Score, false positive rate (FPR) and false negative rate (FNR). More information about these metrics can be found <a style="color:blue" href="https://towardsdatascience.com/performance-metrics-confusion-matrix-precision-recall-and-f1-score-a8fe076a2262">here</a></p>`);
// Send to the Jupyterlab interface to render
var message = note.generateFormattedOutput();
this._appendNote(message);
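The note text above lists precision, recall, F1 score, false positive rate (FPR), and false negative rate (FNR). For reference, these follow from confusion-matrix counts as sketched below. This is not Retrograde's implementation: `ConfusionCounts`, `computeMetrics`, and `safeDivide` are hypothetical names, and returning 0 for an empty denominator is a convention chosen for this sketch.

```typescript
interface ConfusionCounts {
  tp: number; // true positives
  fp: number; // false positives
  tn: number; // true negatives
  fn: number; // false negatives
}

interface Metrics {
  precision: number;
  recall: number;
  f1: number;
  fpr: number;
  fnr: number;
}

// Convention for this sketch: an empty denominator yields 0 rather than NaN.
function safeDivide(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

function computeMetrics(c: ConfusionCounts): Metrics {
  const precision = safeDivide(c.tp, c.tp + c.fp);
  const recall = safeDivide(c.tp, c.tp + c.fn);
  return {
    precision,
    recall,
    // F1 is the harmonic mean of precision and recall.
    f1: safeDivide(2 * precision * recall, precision + recall),
    fpr: safeDivide(c.fp, c.fp + c.tn),
    fnr: safeDivide(c.fn, c.fn + c.tp),
  };
}

// One subgroup's counts; computing this per protected group allows comparison across groups.
const groupMetrics = computeMetrics({ tp: 30, fp: 10, tn: 50, fn: 10 });
```

Running `computeMetrics` once per protected group, rather than once overall, is what exposes the subset-level differences the "Why it matters" paragraph describes.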
@@ -503,7 +503,7 @@ export class Prompter {

note.addParagraph(`<br /><b>How was it detected?</b> Retrograde calculates missing data values by examining all columns with NA values.
This means that placeholder values not recognized by <code>pd.isna()</code> are not recognized.
The Missing Data notification uses the protected columns identified in the Protected Column notification and checks the most common sensitive data value when an entry is missing.
The <span class="code-snippet-inline">Missing Data</span> notification uses the protected columns identified in the <span class="code-snippet-inline">Protected Column</span> notification and checks the most common sensitive data value when an entry is missing.
It does not check combinations of columns.`);
// Create container for the small-view content
// Iterating over every dataframe
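The detection described in the note text above has two parts: a missing-value test that ignores placeholder values, and a tally of the most common sensitive value among rows where an entry is missing. The sketch below mimics both in TypeScript purely for illustration. The actual check is described in terms of `pd.isna()`; `isMissing` and `mostCommonWhenMissing` are hypothetical names, and treating only `null`, `undefined`, and `NaN` as missing is an approximation of that behavior.

```typescript
type Cell = number | string | null | undefined;
type DataRow = { [column: string]: Cell };

// Approximation: only null, undefined, and NaN count as missing.
// Placeholders such as -999 or "?" are NOT recognized, mirroring the caveat above.
function isMissing(value: Cell): boolean {
  return (
    value === null ||
    value === undefined ||
    (typeof value === "number" && isNaN(value))
  );
}

// Among rows where `column` is missing, find the most frequent value of `sensitiveColumn`.
function mostCommonWhenMissing(
  rows: DataRow[],
  column: string,
  sensitiveColumn: string
): string | null {
  const tally: { [value: string]: number } = {};
  for (const row of rows) {
    if (!isMissing(row[column])) {
      continue;
    }
    const key = String(row[sensitiveColumn]);
    tally[key] = (tally[key] || 0) + 1;
  }
  let best: string | null = null;
  let bestCount = 0;
  for (const key of Object.keys(tally)) {
    if (tally[key] > bestCount) {
      best = key;
      bestCount = tally[key];
    }
  }
  return best;
}

const sampleRows: DataRow[] = [
  { income: null, gender: "F" },
  { income: 10, gender: "M" },
  { income: NaN, gender: "F" },
  { income: null, gender: "M" },
  { income: -999, gender: "M" }, // placeholder, treated as present
];
const mostCommon = mostCommonWhenMissing(sampleRows, "income", "gender");
```

The `-999` row is counted as present, which is why a dataset that encodes missing entries with placeholders would be under-reported by this kind of check.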
15 changes: 13 additions & 2 deletions jupyter_note_book_plugin/prompt-ml/style/index.css
@@ -686,25 +686,36 @@ th {

.uncertaintyNote .old {
text-decoration: line-through;
float: right;
float: left;
font-size: 0.8em;
padding-right: 2px;
}

.uncertaintyNote .new {
font-weight: 600;
padding-right: 15px;
padding-left: 5px;
}

.uncertaintyNote table td {
max-width: 100px;
word-wrap: break-word;
white-space: normal;
padding-right: 30px;
padding-left: 8px;
}

.uncertaintyNote .modified {
background-color: #f1c56b;
width: auto;
}

.uncertaintyNote .summary h4 {
margin: 0;
}

.uncertaintyNote tr td:first-of-type,
.uncertaintyNote tr th:first-of-type {
border-right: 3px solid black;
padding-right: 30px;
background-color: #74aaf2;
}