Extend primary-metric sort to scatters and heatmap coloring #23

Open
peterkirgis wants to merge 2 commits into petergpt:main from peterkirgis:feature/primary-metric-scatters

@peterkirgis

Follow-up to PR #22.

Propagates the Clear Pushback / Accepted Nonsense toggle through the rest of the main-view figures so the whole page stays internally consistent when the metric flips.

  • Release-date scatter, release-date-by-org trend chart, reasoning (tokens/cost) scatter, and the two size scatters now plot the active metric on the Y axis. The axis is inverted for Accepted Nonsense so the strongest model stays at the top, matching the leaderboard.
  • Label-picking, trend lines, tooltips, and axis labels all read from the active metric rather than hardcoding greenRate.
  • Domain heatmap cells show the active metric's rate; the ramp is driven by "performance" (higher = better) so green cells still mean "good" in either mode.
  • "Detection Rate by Domain" / "Detection Rate Over Time" section headings and every scatter subtitle swap wording based on the selected metric.
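
One way to picture "read from the active metric rather than hardcoding greenRate" is a small metric descriptor that every figure consults for its value, axis label, and axis direction. The following is a minimal TypeScript sketch, not the PR's code. It assumes each model row carries `greenRate` (Clear Pushback) and `redRate` (Accepted Nonsense) as fractions in 0..1. The names `MetricKey`, `METRICS`, `axisLabel`, `yDomain`, and `yPixel` are hypothetical. For Accepted Nonsense the domain runs from 1 at the bottom to 0 at the top, so the lowest (best) rate plots highest.

```typescript
// Hypothetical sketch: names and shapes are illustrative assumptions,
// not the project's actual implementation.
type MetricKey = "greenRate" | "redRate";

interface ModelRow {
  name: string;
  greenRate: number; // assumed: Clear Pushback share, 0..1
  redRate: number; // assumed: Accepted Nonsense share, 0..1
}

interface MetricSpec {
  label: string; // wording for axis labels and subtitles
  higherIsBetter: boolean;
}

const METRICS: Record<MetricKey, MetricSpec> = {
  greenRate: { label: "Clear Pushback rate", higherIsBetter: true },
  redRate: { label: "Accepted Nonsense rate", higherIsBetter: false },
};

// Every figure reads the active metric through this accessor.
function metricValue(row: ModelRow, metric: MetricKey): number {
  return row[metric];
}

function axisLabel(metric: MetricKey): string {
  return METRICS[metric].label;
}

// Y-axis domain as [bottom, top]; inverted when lower values are better.
function yDomain(metric: MetricKey): [number, number] {
  return METRICS[metric].higherIsBetter ? [0, 1] : [1, 0];
}

// Map a rate to a pixel row, where 0 is the top of the plot area.
function yPixel(value: number, metric: MetricKey, height: number): number {
  const [bottom, top] = yDomain(metric);
  const t = (value - bottom) / (top - bottom); // 0 at bottom, 1 at top
  return height * (1 - t);
}
```

With a charting library the same idea is usually expressed by swapping the order of the scale's domain endpoints rather than hand-rolling the mapping.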

I've left the default as Clear Pushback. I understand it is probably more intuitive for the audience, but I think it would ultimately be better to make Accepted Nonsense the default, even though that would make some of the plots less intuitive.

Adds a "Sort by" selector in the sticky domain scope bar letting users
rank the main-view leaderboards by Clear Pushback (default) or Accepted
Nonsense. The chosen metric drives the stacked bar chart, domain heatmap
row order, leaderboard baseline/tiebreakers, and the compare-view
question dropdown. "Strongest model wins" framing is preserved in both
modes: green sorts descending, red sorts ascending, so the best model
always lands at the top. The leaderboard column headers remain
clickable for ad-hoc overrides; switching the primary metric resets
that override to the new default.
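
The ordering rule above can be sketched as follows. Green sorts descending, red sorts ascending, and a header click acts as a temporary override that switching the primary metric clears. This is an illustrative TypeScript sketch, not the project's implementation. The `SortState` shape and the functions `compareRows`, `sortRows`, and `setPrimaryMetric` are hypothetical. The tiebreak order (the other metric, then model name) is also an assumption, because the description does not spell out the leaderboard's actual tiebreakers.

```typescript
// Hypothetical sketch of "strongest model wins" ordering; names are illustrative.
type MetricKey = "greenRate" | "redRate";

interface ModelRow {
  name: string;
  greenRate: number;
  redRate: number;
}

interface SortState {
  primary: MetricKey; // metric chosen in the "Sort by" selector
  columnOverride: MetricKey | null; // set when a leaderboard header is clicked
}

// -1 flips to descending (higher green is better); 1 keeps ascending (lower red is better).
function direction(metric: MetricKey): number {
  return metric === "greenRate" ? -1 : 1;
}

// Assumed tiebreak order: active metric, then the other metric, then name.
function compareRows(metric: MetricKey) {
  const other: MetricKey = metric === "greenRate" ? "redRate" : "greenRate";
  return (a: ModelRow, b: ModelRow): number =>
    direction(metric) * (a[metric] - b[metric]) ||
    direction(other) * (a[other] - b[other]) ||
    a.name.localeCompare(b.name);
}

// Sorts a copy so the caller's array is left untouched.
function sortRows(rows: ModelRow[], state: SortState): ModelRow[] {
  const metric = state.columnOverride ?? state.primary;
  return [...rows].sort(compareRows(metric));
}

// Switching the primary metric resets any header-click override.
function setPrimaryMetric(state: SortState, metric: MetricKey): SortState {
  return { ...state, primary: metric, columnOverride: null };
}
```

Returning a fresh state from `setPrimaryMetric` keeps the "switching resets the override" behavior in one place instead of spreading it across event handlers.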

Propagates the Clear Pushback / Accepted Nonsense toggle added in the
prior PR through the rest of the main-view figures:

- Release-date scatter, release-date-by-org trend chart, reasoning
  (tokens/cost) scatter, and the two size scatters now plot the active
  metric on the Y axis. The axis is inverted for Accepted Nonsense so
  the strongest model stays at the top, preserving the leaderboard's
  "best at top" reading.
- Label-picking, trend lines, tooltips, and scatter axis labels all
  read from the active metric rather than hardcoding greenRate.
- Domain heatmap cells show the active metric's rate; the color ramp
  is driven by "performance" (higher = better) so green cells still
  mean "good" in either mode.
- "Detection Rate by Domain", "Detection Rate Over Time", and every
  scatter subtitle now swap wording based on the selected metric.
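
The heatmap can keep "green means good" in both modes by separating the number printed in a cell from the score that picks its color. The following is a minimal TypeScript sketch, assuming rates in 0..1 and a simple linear RGB ramp. The names `performanceScore`, `rampColor`, and `heatmapCell`, and both endpoint colors, are hypothetical choices, not taken from the PR. Under Accepted Nonsense the score is `1 - redRate`, so a domain where a model accepts no nonsense gets the fully green end of the ramp.

```typescript
// Hypothetical sketch: cell text shows the active metric's raw rate, while
// the fill is driven by a "performance" score where higher is always better.
type MetricKey = "greenRate" | "redRate";

interface CellStats {
  greenRate: number;
  redRate: number;
}

function performanceScore(cell: CellStats, metric: MetricKey): number {
  return metric === "greenRate" ? cell.greenRate : 1 - cell.redRate;
}

// Assumed endpoint colors for a linear red -> green ramp.
const BAD: [number, number, number] = [220, 53, 69];
const GOOD: [number, number, number] = [40, 167, 69];

function rampColor(perf: number): string {
  const t = Math.min(1, Math.max(0, perf)); // clamp to [0, 1]
  const [r, g, b] = BAD.map((c, i) => Math.round(c + (GOOD[i] - c) * t));
  return `rgb(${r}, ${g}, ${b})`;
}

function heatmapCell(cell: CellStats, metric: MetricKey): { text: string; fill: string } {
  return {
    text: `${Math.round(cell[metric] * 100)}%`, // raw rate of the active metric
    fill: rampColor(performanceScore(cell, metric)),
  };
}
```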

Left for a follow-up: the "Detection Rate by Technique" chart still
ranks and colors by green rate only; flipping its bars needs a matching
sort change that is worth reviewing on its own.