Verify Cezar v3 (1.1893, #2) #90

Closed: willpartcl wants to merge 1 commit into main from verify-cezar-v3

@willpartcl (Contributor)

Verified avg proxy 1.1893 across all 17 IBM benchmarks, 0 overlaps, ~15.5h total runtime.

Self-reported 1.037; the verified score is ~14.7% worse than self-reported but ~2.7% better than the previous Cezar verification (1.2224). Since lower is better, this takes #2, above KLA MACH (1.2121).
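Both percentages are relative differences against a reference score (lower proxy is better, so a higher verified score is worse). A minimal Python check of the arithmetic, assuming each comparison is measured relative to the reference value (self-reported or previous verification) rather than to the new verified score:

```python
# Scores quoted in this PR; a lower avg proxy is better.
verified = 1.1893           # Cezar v3, verified
self_reported = 1.037       # Cezar v3, self-reported
previous_verified = 1.2224  # previous Cezar verification
kla_mach = 1.2121           # KLA MACH, the entry this one moves above


def relative_change(new: float, ref: float) -> float:
    """Fractional change of `new` relative to `ref` (positive means a higher, i.e. worse, score)."""
    return (new - ref) / ref


worse_vs_self = relative_change(verified, self_reported)
better_vs_prev = -relative_change(verified, previous_verified)

print(f"{worse_vs_self:.1%} worse than self-reported")          # 14.7% worse than self-reported
print(f"{better_vs_prev:.1%} better than previous verification")  # 2.7% better than previous verification
print(verified < kla_mach)                                       # True: ranks above KLA MACH
```

Measuring the second figure against the new verified score instead (0.0331 / 1.1893) would give about 2.8%, so the ~2.7% quoted above appears to use the previous verification as the reference.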

Required two infra adjustments for the verification:

  • Added `build-essential` to the eval Dockerfile so torch.compile (Inductor backend) can shell out to gcc to build cached kernels at runtime.
  • Ran with `TORCH_COMPILE_DISABLE=1` and `TORCHDYNAMO_SUPPRESS_ERRORS=1` to bypass a Triton "invalid device context" error during autotuning. (Cezar's code falls back when compilation fails at compile time, but not when it fails at runtime.)
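The compile-time vs. runtime distinction matters because `torch.compile` is lazy: wrapping the `torch.compile(...)` call in a try/except catches errors raised when the wrapper is created, but Inductor/Triton code generation and autotuning happen on the first call, which is plausibly where this error surfaced. A minimal sketch of a call-time fallback in Python, using a hypothetical `with_runtime_fallback` helper that is not taken from Cezar's code; the stand-in functions below avoid a torch dependency, and in real use `compiled` would be `torch.compile(fn)` and `eager` would be `fn`:

```python
import logging
from typing import Any, Callable


def with_runtime_fallback(
    compiled: Callable[..., Any], eager: Callable[..., Any]
) -> Callable[..., Any]:
    """Call `compiled`; if it raises at call time, log once and use `eager` from then on."""
    state = {"use_compiled": True}

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if state["use_compiled"]:
            try:
                return compiled(*args, **kwargs)
            except Exception as exc:  # e.g. a backend or autotuning failure on first call
                logging.warning("compiled path failed (%s); falling back to eager", exc)
                state["use_compiled"] = False
        # Reached after a failure, or on every later call once the fallback is active.
        return eager(*args, **kwargs)

    return wrapper


# Hypothetical stand-ins for an eager function and its failing compiled version.
def eager_step(x: int) -> int:
    return x * 2


def broken_compiled_step(x: int) -> int:
    raise RuntimeError("invalid device context")  # mimics the Triton autotuning error


step = with_runtime_fallback(broken_compiled_step, eager_step)
print(step(21))  # 42, served by the eager path after the compiled call fails
```

Unlike setting `TORCH_COMPILE_DISABLE=1`, which turns compilation off for the whole run, this keeps the compiled path wherever it works. One caveat: if the compiled call mutates its inputs before failing, re-running the eager version could apply that update twice.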

🤖 Generated with Claude Code

Verified avg proxy 1.1893 across all 17 IBM benchmarks, 0 overlaps,
~15.5h total runtime. Self-reported 1.037 — verified ~14.7% worse than
self-reported, but ~2.7% better than previous Cezar verification (1.2224).

Best ibm09=0.9041, worst ibm18=1.4379. Takes #2 above KLA MACH (1.2121).

Required two infra changes for the verification:
- Added `build-essential` to the eval Dockerfile so torch.compile (Inductor
  backend) can shell out to gcc to build cached kernels.
- Ran with `TORCH_COMPILE_DISABLE=1` and `TORCHDYNAMO_SUPPRESS_ERRORS=1`
  to bypass a Triton "invalid device context" error during autotuning.
  Cezar's code falls back when compilation fails at compile time, but not at runtime.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

@willpartcl (Contributor, Author)

Subsumed by #91, which includes the Cezar v3 verified score, 9 new entries, and 4 resubmit value updates.

willpartcl closed this on May 11, 2026.
Mummanajagadeesh pushed a commit to Mummanajagadeesh/macro-place-challenge-2026 that referenced this pull request on May 14, 2026:
Cezar v3 verification (subsumes PR #90): verified 1.1893 (self-reported
1.037). Sits at #6.

Resubmit value updates:
- Archgen: 1.3479 → 1.16511 (AutoDMP + GPU, 5/9)
- ArzunPD: 1.2478 → 1.1883 (Hyperplace, 5/8)
- Binghamton: pending → 1.7621 (feng shui placement, 5/10)
- No Man's Sky: note updated (resubmitted 5/6)

New entries:
- DREAMPlaceProMaxUltra (1.0467, NTHU-NTUST team): lands at #2
- Vibe (1.1477): #3
- Talyxion (1.2075): #7
- Adam_A (1.2655): #11
- jrslbenn / SPIRAL (1.353): #18
- Barsat Khadka / fmpa (1.38): #19
- Aegir (1.4553): #27
- ZeroLatency (1.5286): #35
- rpocevi (1.8894): #42

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>