Merged
1,897 changes: 1,897 additions & 0 deletions benchmarks/llm/results/archive/2026-05-16/report.md

Large diffs are not rendered by default.

2,767 changes: 2,767 additions & 0 deletions benchmarks/llm/results/archive/2026-05-16/report_assets/agreement.svg
4,663 changes: 4,663 additions & 0 deletions benchmarks/llm/results/archive/2026-05-16/report_assets/alignment.svg
3,911 changes: 3,911 additions & 0 deletions benchmarks/llm/results/archive/2026-05-16/report_assets/boundary.svg
5,401 changes: 5,401 additions & 0 deletions benchmarks/llm/results/archive/2026-05-16/report_assets/calibration.svg
3,445 changes: 3,445 additions & 0 deletions benchmarks/llm/results/archive/2026-05-16/report_assets/compliance.svg
4,578 changes: 4,578 additions & 0 deletions benchmarks/llm/results/archive/2026-05-16/report_assets/detection_by_length.svg
5,632 changes: 5,632 additions & 0 deletions benchmarks/llm/results/archive/2026-05-16/report_assets/episodes.svg
4,053 changes: 4,053 additions & 0 deletions benchmarks/llm/results/archive/2026-05-16/report_assets/latency_tail.svg
3,919 changes: 3,919 additions & 0 deletions benchmarks/llm/results/archive/2026-05-16/report_assets/pareto.svg
4,082 changes: 4,082 additions & 0 deletions benchmarks/llm/results/archive/2026-05-16/report_assets/parser_stress.svg
4,685 changes: 4,685 additions & 0 deletions benchmarks/llm/results/archive/2026-05-16/report_assets/precision_recall.svg
4,518 changes: 4,518 additions & 0 deletions benchmarks/llm/results/archive/2026-05-16/report_assets/token_efficiency.svg
3,358 changes: 3,358 additions & 0 deletions benchmarks/llm/results/archive/2026-05-16/report_assets/trial_variance.svg
780 changes: 0 additions & 780 deletions benchmarks/llm/results/raw/calls.jsonl

Large diffs are not rendered by default.

This file was deleted.
