Extend primary-metric sort to scatters and heatmap coloring by peterkirgis · Pull Request #23 · petergpt/bullshit-benchmark

peterkirgis · 2026-04-22T20:44:31Z

Follow-up to PR #22.

Propagates the Clear Pushback / Accepted Nonsense toggle through the rest of the main-view figures so the whole page stays internally consistent when the metric flips.

- Release-date scatter, release-date-by-org trend chart, reasoning (tokens/cost) scatter, and the two size scatters now plot the active metric on the Y axis. The axis is inverted for Accepted Nonsense so the strongest model stays at the top, matching the leaderboard.
- Label-picking, trend lines, tooltips, and axis labels all read from the active metric rather than hardcoding greenRate.
- Domain heatmap cells show the active metric's rate; the color ramp is driven by "performance" (higher = better), so green cells still mean "good" in either mode.
- The "Detection Rate by Domain" / "Detection Rate Over Time" section headings and every scatter subtitle swap wording based on the selected metric.
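
Routing every figure through one per-metric lookup is one way to get the behavior listed above. The sketch below is illustrative, not the repo's code: apart from `greenRate`, every name (`redRate`, `METRICS`, `yDomain`, `tooltip`, `scatterSubtitle`) is a hypothetical stand-in, and it assumes rates are stored as fractions in 0..1.

```typescript
// Sketch only: every name except greenRate is hypothetical, not taken from the repo.
type PrimaryMetric = "clearPushback" | "acceptedNonsense";

interface ModelStats {
  model: string;
  greenRate: number; // Clear Pushback rate, 0..1
  redRate: number;   // Accepted Nonsense rate, 0..1 (hypothetical field name)
}

interface MetricSpec {
  label: string;
  value: (s: ModelStats) => number;
  higherIsBetter: boolean;
}

// One lookup table replaces scattered, hardcoded references to greenRate.
const METRICS: Record<PrimaryMetric, MetricSpec> = {
  clearPushback: { label: "Clear Pushback", value: (s) => s.greenRate, higherIsBetter: true },
  acceptedNonsense: { label: "Accepted Nonsense", value: (s) => s.redRate, higherIsBetter: false },
};

// Y-axis domain listed bottom-to-top. A lower-is-better metric gets [1, 0],
// so the strongest model (lowest rate) is plotted nearest the top.
function yDomain(metric: PrimaryMetric): [number, number] {
  return METRICS[metric].higherIsBetter ? [0, 1] : [1, 0];
}

// Tooltip text reads the active metric's label and value.
function tooltip(s: ModelStats, metric: PrimaryMetric): string {
  const spec = METRICS[metric];
  return `${s.model}: ${spec.label} ${(spec.value(s) * 100).toFixed(1)}%`;
}

// Subtitle wording swaps with the selected metric.
function scatterSubtitle(metric: PrimaryMetric): string {
  const spec = METRICS[metric];
  return `${spec.label} rate (${spec.higherIsBetter ? "higher" : "lower"} is better)`;
}
```

With this shape, adding a third metric would mean one new `METRICS` entry rather than another round of edits across each chart.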

I've still left the default as Clear Pushback. I understand this is probably more intuitive for the audience, but I think it would ultimately be better for the default to be Accepted Nonsense, even though that would make some of the plots less intuitive.

Adds a "Sort by" selector to the sticky domain scope bar that lets users rank the main-view leaderboards by Clear Pushback (default) or Accepted Nonsense. The chosen metric drives the stacked bar chart, the domain heatmap row order, the leaderboard baseline and tiebreakers, and the compare-view question dropdown. The "strongest model wins" framing is preserved in both modes: green (Clear Pushback) sorts descending and red (Accepted Nonsense) sorts ascending, so the best model always lands at the top. The leaderboard column headers remain clickable for ad-hoc overrides; switching the primary metric resets any override to the new default.
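
A minimal sketch of the "best model at the top" ordering and the override reset. It assumes each row carries `greenRate` and a hypothetical `redRate` field; the tiebreaker order (other metric, then model name) and the `SortState` shape are illustrative assumptions, since the description does not spell them out.

```typescript
// Sketch only: redRate, SortState, and the tiebreaker order are hypothetical.
type PrimaryMetric = "clearPushback" | "acceptedNonsense";

interface Row {
  model: string;
  greenRate: number; // Clear Pushback rate, 0..1
  redRate: number;   // Accepted Nonsense rate, 0..1
}

// Strongest model first: Clear Pushback descending, Accepted Nonsense ascending.
// Ties fall back to the other metric (in its own "better" direction), then name.
function compareByPrimary(metric: PrimaryMetric): (a: Row, b: Row) => number {
  return (a, b) =>
    metric === "clearPushback"
      ? b.greenRate - a.greenRate || a.redRate - b.redRate || a.model.localeCompare(b.model)
      : a.redRate - b.redRate || b.greenRate - a.greenRate || a.model.localeCompare(b.model);
}

interface SortState {
  primary: PrimaryMetric;
  // Set when the user clicks a leaderboard column header.
  columnOverride: { column: keyof Row; ascending: boolean } | null;
}

// Switching the primary metric clears any ad-hoc header sort,
// so the leaderboard falls back to the new metric's default order.
function setPrimary(state: SortState, primary: PrimaryMetric): SortState {
  return { ...state, primary, columnOverride: null };
}
```

When `columnOverride` is null, the leaderboard would render `[...rows].sort(compareByPrimary(state.primary))`; copying before sorting leaves the source array untouched.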

Propagates the Clear Pushback / Accepted Nonsense toggle added in the prior PR through the rest of the main-view figures:

- Release-date scatter, release-date-by-org trend chart, reasoning (tokens/cost) scatter, and the two size scatters now plot the active metric on the Y axis. The axis is inverted for Accepted Nonsense so the strongest model stays at the top, preserving the leaderboard's "best at top" reading.
- Label-picking, trend lines, tooltips, and scatter axis labels all read from the active metric rather than hardcoding greenRate.
- Domain heatmap cells show the active metric's rate; the color ramp is driven by "performance" (higher = better), so green cells still mean "good" in either mode.
- "Detection Rate by Domain", "Detection Rate Over Time", and every scatter subtitle now swap wording based on the selected metric.

Left for a follow-up: the "Detection Rate by Technique" chart still ranks/colors by green rate only. Flipping its bars needs a matching sort change that's worth reviewing on its own.
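
One way to keep green meaning "good" in both modes is to separate the number printed in a heatmap cell from the value that drives its color. This sketch assumes performance is `1 - redRate` in Accepted Nonsense mode (the description only says it is higher-is-better) and uses a made-up red-to-green palette; `cellPerformance`, `rampColor`, and `heatmapCell` are hypothetical names.

```typescript
// Sketch only: the performance transform, palette, and names are assumptions.
type PrimaryMetric = "clearPushback" | "acceptedNonsense";

// Both rates for one (model, domain) cell, as fractions in 0..1.
interface CellRates {
  greenRate: number;
  redRate: number;
}

// "Performance" is higher-is-better in both modes.
// Assumed transform for Accepted Nonsense: performance = 1 - redRate.
function cellPerformance(rates: CellRates, metric: PrimaryMetric): number {
  return metric === "clearPushback" ? rates.greenRate : 1 - rates.redRate;
}

// Linear red -> green ramp over performance, clamped to [0, 1].
function rampColor(performance: number): string {
  const t = Math.min(1, Math.max(0, performance));
  const red = Math.round(220 * (1 - t));
  const green = Math.round(180 * t);
  return `rgb(${red}, ${green}, 60)`;
}

function heatmapCell(rates: CellRates, metric: PrimaryMetric): { text: string; fill: string } {
  const shown = metric === "clearPushback" ? rates.greenRate : rates.redRate;
  return {
    text: `${Math.round(shown * 100)}%`,             // label: the active metric's raw rate
    fill: rampColor(cellPerformance(rates, metric)), // color: performance, so green = good
  };
}
```

A cell with 0% Accepted Nonsense and a cell with 100% Clear Pushback print different numbers but get the same fill, which is the property the PR describes.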

peterkirgis added 2 commits April 22, 2026 12:17

peterkirgis wants to merge 2 commits into petergpt:main from peterkirgis:feature/primary-metric-scatters
