A few fixes to the MFU/MBU code #1108
base: main
Conversation
Force-pushed from c2b547f to a68ba0d
Added back all docstrings and inline comments that were removed during the sliding window implementation. These comments explain the assumptions, calculations, and design decisions in the FLOP and memory bandwidth estimation code.

Changes:
- Restored docstrings for all ModelDims methods (attn_flops, mlp_flops, prefill_flops, decode_flops, flops, weight_memory_bytes, kv_cache_write_bytes, kv_cache_read_bytes, prefill_memory_bytes, decode_memory_bytes, memory_bytes)
- Restored inline comments explaining calculation details
- Kept all functionality changes (sliding window support, A100 bandwidth fix)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Changed attn_flops signature from using a boolean use_sliding_window flag
to accepting the sliding_window value directly as an Optional[int]. This
makes the API cleaner and more explicit.
Changes:
- attn_flops now takes sliding_window: Optional[int] = None instead of
use_sliding_window: bool = False
- Uses kv_len = min(kv_len, sliding_window or float("inf")) to handle
None case elegantly
- Updated all call sites in prefill_flops and decode_flops to pass
sliding_window=None for full attention layers and
sliding_window=self.sliding_window for sliding window layers
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
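The `or float("inf")` trick described above can be sketched as follows. Everything here is illustrative (the simplified FLOP formula, head dims folded into `hidden_size`), not the PR's exact code:

```python
from typing import Optional

def attn_flops(num_tokens: int, kv_len: int, hidden_size: int,
               sliding_window: Optional[int] = None) -> int:
    """Sketch of the new signature: pass the window size directly
    instead of a use_sliding_window boolean. The FLOP formula is a
    simplified stand-in for the real ModelDims.attn_flops."""
    # None means full attention; `or float("inf")` makes min() a no-op.
    kv_len = min(kv_len, sliding_window or float("inf"))
    # Two matmuls (QK^T and PV), 2 FLOPs per multiply-accumulate.
    return int(2 * 2 * num_tokens * kv_len * hidden_size)

# Call sites then pass sliding_window=None for full-attention layers
# and sliding_window=self.sliding_window for windowed layers.
```

One quirk of the `or` idiom: `sliding_window=0` is also treated as full attention, which is harmless since a zero-size window is meaningless.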
```python
prefill_flops = model_dims.flops([sequence_length], None)
decode_flops = total_flops - prefill_flops
decode_flops_in_gflops = decode_flops / 1e9
self.assertAlmostEqual(decode_flops_in_gflops, 27.92, delta=0.01)
```
This comes from some sanity checking that I did manually for the olmo3 paper.
```python
total_bytes *= 2
memory_in_gb = total_bytes / 1e9
self.assertAlmostEqual(memory_in_gb, 3.926, delta=0.01)
```
This comes from some sanity checking that I did manually for the olmo3 paper.
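For context on how such byte counts are typically assembled, here is a hedged sketch of per-token KV-cache byte accounting. The function name, arguments, and the bf16 dtype default are my assumptions, not the PR's code:

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes written to the KV cache per generated token: K and V
    (factor 2) for every layer, assuming fp16/bf16 (2 bytes/element)."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

# e.g. a 32-layer model with 8 KV heads of dim 128 in bf16:
per_token = kv_cache_bytes_per_token(32, 8, 128)
```

Multiplying such a per-token figure by the token count and dividing by 1e9 gives a decimal-GB number like the 3.926 asserted above.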
Previously, we would occasionally get >100%.
Fixes #1098. Also adds logging so we can reproduce calculations if there are future issues with MFU/MBU calculations.
Fixes were:
Note
Refactors and fixes MFU/MBU computations to account for sliding-window attention and multi-engine setups, updates GPU specs, integrates new APIs in training/benchmark paths, and adds targeted tests with reproduction cases.
- Adds `ModelDims.calculate_mfu`/`calculate_mbu`, `calculate_actor_utilization`, and `calculate_learner_utilization`, and uses them in `grpo_fast.py` and `benchmark_generators.py`.
- Adds `memory_bytes(prompt_lengths, num_engines, num_gpus_per_engine, ...)` to normalize MBU across parallel engines.
- Adds a `check_calculation` warning with repro JSON when MFU/MBU > 100%.
- `ModelDims.from_vllm_config` pulls heads directly from `hf_text_config` and captures sliding-window layer counts.
- Updates the A100 memory bandwidth spec to 2.0e12 B/s.
- Training/benchmark paths use the new `ModelDims` APIs and derive `num_engines`/`num_gpus_per_engine` for correct normalization.
- Adds reproduction cases in `open_instruct/test_data/mbu_reproduction_cases.json`.
- Adds tests in `test_utils.py` for FLOPs/memory, MFU/MBU (incl. multi-engine), and `from_vllm_config` parity.

Written by Cursor Bugbot for commit b839f17. This will update automatically on new commits.
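The multi-engine normalization and the >100% warning can be sketched together. This is a hypothetical illustration: the function and argument names are mine, and only the 2.0e12 B/s A100 bandwidth figure comes from the PR:

```python
import json
import logging

A100_BW_BYTES_PER_S = 2.0e12  # A100 HBM bandwidth per the PR's spec update

def calculate_mbu_sketch(bytes_moved: float, elapsed_s: float,
                         num_engines: int, num_gpus_per_engine: int) -> float:
    """Hedged sketch: normalize achieved bandwidth by the peak of ALL
    GPUs across all engines, which is what keeps MBU <= 100% in
    multi-engine setups instead of comparing against a single GPU."""
    peak = num_engines * num_gpus_per_engine * A100_BW_BYTES_PER_S
    mbu = bytes_moved / elapsed_s / peak
    if mbu > 1.0:
        # Mirrors the PR's idea of logging a repro payload on >100%
        # so the calculation can be replayed later.
        logging.warning("MBU > 100%%; repro: %s", json.dumps({
            "bytes_moved": bytes_moved,
            "elapsed_s": elapsed_s,
            "num_engines": num_engines,
            "num_gpus_per_engine": num_gpus_per_engine,
        }))
    return mbu
```

Dividing by the aggregate peak is the key fix: with the old single-GPU denominator, a 2-engine run moving 2 GPUs' worth of bytes would report roughly double the true utilization.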