Separate launch_id from run_id and consolidate experiment setup #394
danbraunai-goodfire merged 5 commits into main
…h_id from run_id
Move duplicated wandb init, pre-run info saving, and wandb finish logic from each decomposition script into a shared run_experiment() in run_spd.py. Extract parse_config, parse_sweep_params, and generate_run_id as reusable utilities.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
spd/scripts/run.py
Outdated
      )

-     snapshot_branch, commit_hash = create_git_snapshot(run_id=run_id)
+     snapshot_branch, commit_hash = create_git_snapshot(run_id=launch_id)
Should this arg be called launch_id?
Thanks. Changed the arg to snapshot_id, as it can be used with launch_id or run_id (or anything else).
spd/scripts/run.py
Outdated
    if len(training_jobs) > 1:
        _wandb_setup(
Is this us only creating a view if there's more than 1 job? If so, can we name _wandb_setup something more descriptive? _setup_wandb_views?
Changed to _create_wandb_views_and_report. I don't love it because the report is optional, but I also don't love adding "maybe" in there.
spd/scripts/run.py
Outdated

    Prefixed with 'l-' to prevent Python Fire from parsing the numeric timestamp as an int.
    """
    return f"l-{datetime.now().strftime('%Y%m%d_%H%M%S')}"
Tiny nit: "l" is for "launch" here, right? I'm about to add more id prefixes ("clt-XXXX", "molt-XXXX"), so I wonder if we should just use the full word.
Changed to just a hardcoded "launch".
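A minimal sketch of what the launch ID generator might look like after this change, assuming the timestamp format from the diff above is kept and only the prefix changes. The function name `generate_launch_id`, the `LAUNCH_PREFIX` constant, the hyphen separator, and the optional `now` parameter are assumptions for illustration; the thread does not show the final code.

```python
from __future__ import annotations

from datetime import datetime

# Assumption: the reply says the prefix is now a hardcoded "launch";
# the separator and the constant name are hypothetical.
LAUNCH_PREFIX = "launch"


def generate_launch_id(now: datetime | None = None) -> str:
    """Generate an identifier for a group of jobs launched together.

    The non-numeric prefix keeps CLI parsers such as Python Fire from
    interpreting the value as a number.
    """
    # `now` is injectable only to make the sketch testable.
    now = now or datetime.now()
    return f"{LAUNCH_PREFIX}-{now.strftime('%Y%m%d_%H%M%S')}"
```

Because the ID has second-level resolution, two launches started within the same second would collide; whether that matters depends on how launches are triggered.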
def generate_run_id(run_type: RunType) -> str:
    """Generate a unique run identifier.

    Format: `{type_abbr}-{random_hex}`
    """
    type_abbr = RUN_TYPE_ABBREVIATIONS[run_type]
    return f"{type_abbr}-{secrets.token_hex(4)}"
Oh, this works perfectly with my comment above lol, nice.
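For reference, a self-contained sketch of how the typed prefixes could fit together with `generate_run_id`. The `RunType` values and the contents of `RUN_TYPE_ABBREVIATIONS` below are hypothetical (only "clt" and "molt" are mentioned in the thread, and the "spd" entry is a guess); the function body mirrors the diff above.

```python
import secrets
from typing import Literal

# Hypothetical run types and abbreviations; the real mapping is not shown in this PR page.
RunType = Literal["spd", "clt", "molt"]
RUN_TYPE_ABBREVIATIONS: dict[str, str] = {
    "spd": "s",
    "clt": "clt",
    "molt": "molt",
}


def generate_run_id(run_type: RunType) -> str:
    """Generate a unique run identifier of the form `{type_abbr}-{random_hex}`."""
    type_abbr = RUN_TYPE_ABBREVIATIONS[run_type]
    # token_hex(4) returns 8 hexadecimal characters (4 random bytes).
    return f"{type_abbr}-{secrets.token_hex(4)}"


if __name__ == "__main__":
    print(generate_run_id("clt"))
```

With 4 random bytes there are about 4.3 billion possible suffixes per type, so accidental collisions are unlikely at typical experiment counts but not impossible.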
@claude can you review this PR? Note that your review should cover the scope of this PR only. If you spot things unrelated to this PR, feel free to bring them up and we'll consider them for new issues. I want you to think about possible ways that the implementation could be improved. I want critical comments, so feel free to give me the hard truths.
Claude finished @danbraunai-goodfire's task
PR Review: Separate launch_id from run_id and consolidate experiment setup
Overall Assessment
This is a well-executed refactoring that clearly separates concerns and eliminates code duplication. The separation of
✅ What's Done Well
1. Clear Conceptual Separation
2. Excellent Consolidation
3. Robust Implementation
🔍 Critical Issues & Improvements
1. Parameter Name Inconsistency (
Description
Two changes:
1. Separate launch_id from run_id: Previously conflated under run_id/sweep_id. Now launch_id identifies a group of SLURM jobs (used for git snapshots, WandB views), while run_id identifies an individual WandB run. Each run_id is pre-generated in run.py, making WandB run URLs predictable before jobs start.
2. Consolidate experiment setup into run_experiment(): The ~30 lines of wandb init / save_pre_run_info / wandb.finish boilerplate duplicated across all four decomposition scripts is now a single run_experiment() wrapper around optimize() in run_spd.py.
Related Issue
Relates to #363. Any PR there should be adapted to use this new interface.
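To illustrate the consolidation described in the PR description, here is a rough sketch of the wrapper pattern: one function that brackets the optimization call with tracking setup and teardown. This is not the actual run_experiment() from run_spd.py; the signature, the injected callables, and the try/finally cleanup are all assumptions made so the example runs without WandB installed.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any

Config = dict[str, Any]


def run_experiment(
    run_id: str,
    config: Config,
    optimize: Callable[[Config], Any],
    init_tracking: Callable[[str, Config], None] | None = None,
    save_pre_run_info: Callable[[str, Config], None] | None = None,
    finish_tracking: Callable[[], None] | None = None,
) -> Any:
    """Hypothetical wrapper: set up tracking, run optimize(), always clean up."""
    if init_tracking is not None:
        # In a real script this step would start the tracking run with the pre-generated run_id.
        init_tracking(run_id, config)
    try:
        if save_pre_run_info is not None:
            save_pre_run_info(run_id, config)
        return optimize(config)
    finally:
        # Design choice of this sketch: finish tracking even if optimize() raises.
        if finish_tracking is not None:
            finish_tracking()


if __name__ == "__main__":
    result = run_experiment(
        run_id="s-1a2b3c4d",  # hypothetical pre-generated ID
        config={"lr": 1e-3},
        optimize=lambda cfg: f"trained with lr={cfg['lr']}",
        init_tracking=lambda rid, cfg: print(f"init {rid}"),
        finish_tracking=lambda: print("finish"),
    )
    print(result)
```

Each decomposition script would then only need to build its config and pass its own optimize callable, rather than repeating the setup and teardown steps.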
Motivation and Context
How Has This Been Tested?
N/A
Does this PR introduce a breaking change?
No