chore(benchmark): archive snapshot + purge x-ai/grok-4.1-fast raw data#230

Merged
ttlequals0 merged 1 commit into main from chore/archive-and-purge-grok-4.1-fast
May 16, 2026
@ttlequals0
Owner

Summary

xAI deprecated grok-4.1-fast upstream (every call returns 404 with the "switch to Grok 4.3" advisory), and grok-4.3 is now the live xAI entry. Keeping 780 dead grok-4.1-fast rows in calls.jsonl was bloating the prompt/response tree by 903 files (780 responses plus 123 prompts) without adding any benchmark signal.

Before purging, this PR snapshots the report with grok-4.1-fast shown as an active model so the comparative numbers stay on disk (F1 0.642 at $0.1509/ep, rank 2 overall by F1, rank 1 by F1-per-dollar):

results/archive/2026-05-16/
  report.md
  report_assets/   (full 32-model chart set)

Then purges:

  • 780 grok-4.1-fast rows removed from calls.jsonl (20,492 -> 19,712)
  • 780 corresponding response files deleted
  • 123 prompt files (unique to grok-4.1-fast hashes) deleted
  • live results/report.md regenerated -- no grok-4.1-fast anywhere, and the Deprecated Models footnote drops out since there is no aggregate data left to render
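
The row purge listed above could be reproduced with a filter along the following lines. This is a minimal sketch, not the script used in this PR: it assumes each line of calls.jsonl is a JSON object with a `model` field holding the model ID, and `purge_model` is a hypothetical helper name. Deleting the matching response and prompt files is out of scope here.

```python
import json
from pathlib import Path


def purge_model(calls_path: Path, model_id: str) -> tuple[int, int]:
    """Drop every row whose (assumed) "model" field equals model_id.

    Rewrites the file in place and returns (rows_kept, rows_removed).
    """
    kept: list[str] = []
    removed = 0
    with calls_path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines instead of failing on them
            row = json.loads(line)
            if row.get("model") == model_id:
                removed += 1
            else:
                kept.append(line.rstrip("\n"))
    # Write one row per line, each terminated by a newline.
    calls_path.write_text("".join(f"{row}\n" for row in kept), encoding="utf-8")
    return len(kept), removed
```

Against the numbers in this PR, a call such as `purge_model(Path("benchmarks/llm/results/raw/calls.jsonl"), "x-ai/grok-4.1-fast")` would be expected to return `(19712, 780)`, provided the `model` field really stores the ID in that form.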

The benchmark.toml entry stays as deprecated = true so the runner skips it on any future sweep (no new data sneaks back in). A local backup of the pre-purge calls.jsonl sits at results/raw/calls.jsonl.bak-pre-grok-purge and is intentionally not committed.

Benchmark-tooling / data only (benchmarks/ is dockerignored). No version.py bump, no CHANGELOG entry.

Test plan

  • CI lint + tests pass
  • grep -c "grok-4.1-fast" benchmarks/llm/results/report.md returns 0
  • grep -c "grok-4.1-fast" benchmarks/llm/results/archive/2026-05-16/report.md returns 26 (historical snapshot intact)
  • wc -l benchmarks/llm/results/raw/calls.jsonl returns 19712
  • Live report.md no longer has a ### Deprecated Models section
  • Archive report.md shows the full pre-purge picture with no Deprecated Models section (grok-4.1-fast listed as an active model)
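
The file-based checks above can also be scripted. The sketch below is one possible way to do it, not part of this PR: `run_checks` and its helpers are hypothetical names, and the paths are taken from the commands in the test plan. Note that it matches the model name as a literal substring, whereas plain `grep -c` treats `.` as a regex wildcard, so counts could differ in edge cases.

```python
from pathlib import Path


def count_matching_lines(path: Path, needle: str) -> int:
    """Count lines containing needle as a literal substring (like `grep -cF`)."""
    with path.open(encoding="utf-8") as fh:
        return sum(1 for line in fh if needle in line)


def count_newlines(path: Path) -> int:
    """Count newline bytes, which is what `wc -l` reports."""
    return path.read_bytes().count(b"\n")


def run_checks(results_dir: Path, model: str = "grok-4.1-fast") -> dict[str, bool]:
    """Evaluate the test-plan checks against a results directory."""
    live = results_dir / "report.md"
    archive = results_dir / "archive" / "2026-05-16" / "report.md"
    calls = results_dir / "raw" / "calls.jsonl"
    return {
        "live report has no mentions": count_matching_lines(live, model) == 0,
        "archive has 26 matching lines": count_matching_lines(archive, model) == 26,
        "calls.jsonl has 19712 rows": count_newlines(calls) == 19712,
        "live report has no Deprecated Models heading":
            count_matching_lines(live, "### Deprecated Models") == 0,
    }
```

Running `run_checks(Path("benchmarks/llm/results"))` from the repository root returns a dict of check names to pass/fail booleans, so a failing item is easy to spot.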

…st raw data

xAI deprecated grok-4.1-fast upstream (every call returns 404 with the
"switch to Grok 4.3" advisory) and grok-4.3 is now the live entry. Keeping
780 dead grok-4.1-fast rows in calls.jsonl was bloating the prompt/response
tree by 903 files without adding any benchmark signal.

Before deleting, snapshot the report with grok-4.1-fast still shown as an
active model so we keep the comparative numbers (F1 0.642 at $0.1509/ep,
rank 2 overall, rank 1 F1/$) on disk:

  results/archive/2026-05-16/
    report.md
    report_assets/  (full 32-model chart set)

Then purge:

  - 780 grok-4.1-fast rows removed from calls.jsonl (20,492 -> 19,712)
  - 780 corresponding response files deleted
  - 123 prompt files (unique to grok-4.1-fast hashes) deleted
  - live results/report.md regenerated -- no grok-4.1-fast anywhere,
    and the Deprecated Models footnote drops out since there is no
    aggregate data left to render

The benchmark.toml entry stays as deprecated=true so the runner skips it
on any future sweep (no new data sneaks back in). A local backup of the
pre-purge calls.jsonl sits at results/raw/calls.jsonl.bak-pre-grok-purge
and is intentionally not committed.

No production runtime impact, no version bump, no CHANGELOG.
@ttlequals0 ttlequals0 merged commit 53ac483 into main May 16, 2026
9 checks passed
@ttlequals0 ttlequals0 deleted the chore/archive-and-purge-grok-4.1-fast branch May 16, 2026 00:25