Separate launch_id from run_id and consolidate experiment setup by danbraunai-goodfire · Pull Request #394 · goodfire-ai/spd

danbraunai-goodfire · 2026-02-17T14:00:20Z

Description

Two changes:

Separate launch_id from run_id: Previously conflated under run_id/sweep_id. Now launch_id identifies a group of SLURM jobs (used for git snapshots, WandB views), while run_id identifies an individual WandB run. Each run_id is pre-generated in run.py, making WandB run URLs predictable before jobs start.
Consolidate experiment setup into run_experiment(): The ~30 lines of wandb init / save_pre_run_info / wandb.finish boilerplate duplicated across all four decomposition scripts is now a single run_experiment() wrapper around optimize() in run_spd.py.

Related Issue

Relates to #363. Any PR there should be adapted to use this new interface.

Motivation and Context

Couldn't predict WandB URLs at launch time because the run identifier was generated inside each decomposition script
Each decomposition script (tms, resid_mlp, lm, ih) had identical setup/teardown code

How Has This Been Tested?

N/A

Does this PR introduce a breaking change?

No

…h_id from run_id Move duplicated wandb init, pre-run info saving, and wandb finish logic from each decomposition script into a shared run_experiment() in run_spd.py. Extract parse_config, parse_sweep_params, and generate_run_id as reusable utilities. Co-Authored-By: Claude Opus 4.6 <[email protected]>

ocg-goodfire

love it, clean

ocg-goodfire · 2026-02-17T22:38:48Z

spd/scripts/run.py

    )

-    snapshot_branch, commit_hash = create_git_snapshot(run_id=run_id)
+    snapshot_branch, commit_hash = create_git_snapshot(run_id=launch_id)


should this arg be called launch_id?

Thanks. Changed the arg to snapshot_id, as it can be used with launch_id or run_id (or anything else).

ocg-goodfire · 2026-02-17T22:40:06Z

spd/scripts/run.py

+    if len(training_jobs) > 1:
+        _wandb_setup(


this is us only creating a view if there's more than 1 job? if so can me name _wandb_setup something more descriptive? _setup_wandb_views?

Changed to _create_wandb_views_and_report. I don't love it because report is optional, but I also don't love adding "maybe" in there.

ocg-goodfire · 2026-02-17T22:43:37Z

spd/scripts/run.py

+
+    Prefixed with 'l-' to prevent Python Fire from parsing the numeric timestamp as an int.
+    """
+    return f"l-{datetime.now().strftime('%Y%m%d_%H%M%S')}"


Tiny nit: l is for "launch" here right? I'm about to add more id prefixes ( "clt-XXXX", "molt-XXXX") so I wonder if we should just do the full word

Changed to just a hardcoded "launch".

ocg-goodfire · 2026-02-17T22:44:47Z

spd/utils/run_utils.py

+def generate_run_id(run_type: RunType) -> str:
+    """Generate a unique run identifier.
+
+    Format: `{type_abbr}-{random_hex}`
+    """
+    type_abbr = RUN_TYPE_ABBREVIATIONS[run_type]
+    return f"{type_abbr}-{secrets.token_hex(4)}"


oh this works perfectly with my comment above lol, nice

danbraunai-goodfire · 2026-02-18T11:39:24Z

@claude can you review this PR? Note that your review should cover the scope of this PR only. If you spot things unrelated to this PR, feel free to bring them up and we'll consider them for new issues. I want you to think about possible ways that the implementation could be improved. I want critical comments, do feel comfortable to give me the hard truths.

claude · 2026-02-18T11:39:41Z

Claude finished @danbraunai-goodfire's task —— View job

PR Review: Separate launch_id from run_id and consolidate experiment setup

Review Progress:

Read repository guidelines and understand codebase structure
Analyze core changes in run.py and run_spd.py
Review experiment script consolidation
Check ID management and WandB integration
Evaluate potential issues and improvements
Provide comprehensive feedback

Overall Assessment

This is a well-executed refactoring that clearly separates concerns and eliminates code duplication. The separation of launch_id (SLURM job group identifier) from run_id (individual WandB run identifier) is architecturally sound and the consolidation of boilerplate into run_experiment() is a good DRY improvement.

✅ What's Done Well

1. Clear Conceptual Separation

launch_id for SLURM job groups vs run_id for individual WandB runs is conceptually correct
Makes WandB URLs predictable at launch time (run.py:128 - URLs are now generated upfront)
Follows the codebase's "fail fast" philosophy with proper assertions

2. Excellent Consolidation

Moving common setup/teardown into run_experiment() eliminates ~30-60 lines per experiment script
Clean interface that maintains flexibility while reducing duplication
Good separation: experiment scripts focus on data loading, run_experiment() handles orchestration

3. Robust Implementation

Proper distributed training support maintained
Git snapshot functionality works with both IDs appropriately
WandB integration handles the ID changes cleanly
Type safety preserved throughout

🔍 Critical Issues & Improvements

1. Parameter Name Inconsistency (`run.py:84`)

snapshot_branch, commit_hash = create_git_snapshot(snapshot_id=launch_id)

Issue: The parameter is called snapshot_id but you're passing launch_id. While this works, it's confusing since create_git_snapshot() can accept either launch_id OR run_id depending on context.

Fix: Rename the parameter to be more generic:

# In git_utils.py
def create_git_snapshot(snapshot_id: str) -> tuple[str, str]:
    """Create a git snapshot branch with current changes.
    
    Args:
        snapshot_id: Identifier for the snapshot (can be launch_id, run_id, etc.)
    """

2. Missing ID Validation

Issue: No validation that run_id follows the expected format when passed from external sources.

Recommendation: Add validation in run_experiment():

def run_experiment(..., run_id: str | None = None, ...):
    if run_id is not None:
        # Validate format matches what generate_run_id() produces
        if not re.match(r'^[a-z]+-[a-f0-9]{8}$', run_id):
            raise ValueError(f"Invalid run_id format: {run_id}")

3. Potential Race Condition in DDP Setup

Issue: In run_experiment(), the run_id generation happens only on the main process (run.py:410-411). If other ranks need this ID before synchronization, there could be issues.

Current code:

if is_main_process():
    run_id = run_id or generate_run_id("spd")  # Other ranks don't have this

Recommendation: Ensure all ranks have the same run_id:

if is_main_process():
    run_id = run_id or generate_run_id("spd")
else:
    assert run_id is not None, "run_id must be provided to non-main processes"

4. WandB URL Generation Timing

Issue: While you generate URLs upfront in run.py:128, there's no guarantee the WandB runs will actually be created with those exact IDs if something fails during experiment setup.

Recommendation: Consider adding a note or warning about this limitation, or defer URL generation until after successful WandB init.

5. Function Signature Complexity

run_experiment() now has 8 parameters, several optional. While not terrible, it's getting unwieldy.

Future consideration: You might want to group related parameters into a dataclass:

@dataclass
class ExperimentContext:
    run_id: str | None = None
    launch_id: str | None = None
    evals_id: str | None = None
    experiment_tag: str
    sweep_params: dict[str, Any] | None = None

6. Inconsistent Error Handling

Issue: Some functions use exceptions, others use assertions. The git snapshot function gracefully handles push failures but other parts are more brittle.

Example: run.py:76 has assert n_agents is not None but could provide a more user-friendly error message.

🔧 Minor Improvements

1. Magic Number (`git_utils.py:104`)

The --no-verify flag is used without explanation. Add a comment explaining why pre-commit hooks are skipped for snapshots.

2. Redundant Type Annotations (`compute_utils.py:31`)

run_id: str  # Pre-generated unique run identifier (e.g. "s-a1b2c3d4")

This comment is helpful and should be kept, but consider if the dataclass field could benefit from more structured validation.

3. String Interpolation Style

Mixed usage of f-strings and .format(). Pick one for consistency (f-strings preferred per modern Python).

📋 Testing Considerations

Missing tests for the ID separation logic - consider adding unit tests that verify launch_id vs run_id are used appropriately
Integration tests for the consolidated run_experiment() function would be valuable
WandB URL generation should be tested with various ID formats

🎯 Breaking Changes Assessment

You claim "No breaking changes to the CLI interface" but:

Internal function signatures DID change (sweep_id → launch_id)
If anyone was directly importing and using these internal functions, they'd break
However, since these are internal APIs and CLI remains unchanged, this is acceptable for a research codebase

🚀 Overall Verdict

This is excellent refactoring work that improves the codebase's maintainability and conceptual clarity. The ID separation solves a real problem (unpredictable WandB URLs) and the consolidation reduces future maintenance burden.

Approval recommendation: ✅ APPROVE with the minor fixes suggested above.

The identified issues are relatively minor and don't block the PR. This refactoring provides clear value and the implementation is solid. The hardest part of this type of refactoring - ensuring distributed training, WandB integration, and SLURM orchestration all still work correctly - appears to have been handled well.

danbraunai-goodfire and others added 2 commits February 17, 2026 11:45

Create run_id in run.py

81a1976

ocg-goodfire approved these changes Feb 17, 2026

View reviewed changes

Address PR comments

530eaf7

danbraunai-goodfire added 2 commits February 18, 2026 11:41

Update test

b7d26ae

Merge branch 'main' into feature/launch-run-id

6dea329

danbraunai-goodfire merged commit 39de8fc into main Feb 18, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Separate launch_id from run_id and consolidate experiment setup#394

Separate launch_id from run_id and consolidate experiment setup#394
danbraunai-goodfire merged 5 commits intomainfrom
feature/launch-run-id

danbraunai-goodfire commented Feb 17, 2026 •

edited

Loading

Uh oh!

ocg-goodfire left a comment

Uh oh!

ocg-goodfire Feb 17, 2026

Uh oh!

danbraunai-goodfire Feb 18, 2026

Uh oh!

ocg-goodfire Feb 17, 2026

Uh oh!

danbraunai-goodfire Feb 18, 2026

Uh oh!

ocg-goodfire Feb 17, 2026

Uh oh!

danbraunai-goodfire Feb 18, 2026

Uh oh!

ocg-goodfire Feb 17, 2026

Uh oh!

danbraunai-goodfire commented Feb 18, 2026

Uh oh!

claude bot commented Feb 18, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

danbraunai-goodfire commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issue

Motivation and Context

How Has This Been Tested?

Does this PR introduce a breaking change?

Uh oh!

ocg-goodfire left a comment

Choose a reason for hiding this comment

Uh oh!

ocg-goodfire Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

danbraunai-goodfire Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

ocg-goodfire Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

danbraunai-goodfire Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

ocg-goodfire Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

danbraunai-goodfire Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

ocg-goodfire Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

danbraunai-goodfire commented Feb 18, 2026

Uh oh!

claude bot commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Separate launch_id from run_id and consolidate experiment setup

Overall Assessment

✅ What's Done Well

1. Clear Conceptual Separation

2. Excellent Consolidation

3. Robust Implementation

🔍 Critical Issues & Improvements

1. Parameter Name Inconsistency (run.py:84)

2. Missing ID Validation

3. Potential Race Condition in DDP Setup

4. WandB URL Generation Timing

5. Function Signature Complexity

6. Inconsistent Error Handling

🔧 Minor Improvements

1. Magic Number (git_utils.py:104)

2. Redundant Type Annotations (compute_utils.py:31)

3. String Interpolation Style

📋 Testing Considerations

🎯 Breaking Changes Assessment

🚀 Overall Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

danbraunai-goodfire commented Feb 17, 2026 •

edited

Loading

claude bot commented Feb 18, 2026 •

edited

Loading

1. Parameter Name Inconsistency (`run.py:84`)

1. Magic Number (`git_utils.py:104`)

2. Redundant Type Annotations (`compute_utils.py:31`)