Skip to content

Separate launch_id from run_id and consolidate experiment setup#394

Merged
danbraunai-goodfire merged 5 commits intomainfrom
feature/launch-run-id
Feb 18, 2026
Merged

Separate launch_id from run_id and consolidate experiment setup#394
danbraunai-goodfire merged 5 commits intomainfrom
feature/launch-run-id

Conversation

@danbraunai-goodfire
Copy link
Collaborator

@danbraunai-goodfire danbraunai-goodfire commented Feb 17, 2026

Description

Two changes:

  1. Separate launch_id from run_id: Previously conflated under run_id/sweep_id. Now launch_id identifies a group of SLURM jobs (used for git snapshots, WandB views), while run_id identifies an individual WandB run. Each run_id is pre-generated in run.py, making WandB run URLs predictable before jobs start.

  2. Consolidate experiment setup into run_experiment(): The ~30 lines of wandb init / save_pre_run_info / wandb.finish boilerplate duplicated across all four decomposition scripts is now a single run_experiment() wrapper around optimize() in run_spd.py.

Related Issue

Relates to #363. Any PR there should be adapted to use this new interface.

Motivation and Context

  • Couldn't predict WandB URLs at launch time because the run identifier was generated inside each decomposition script
  • Each decomposition script (tms, resid_mlp, lm, ih) had identical setup/teardown code

How Has This Been Tested?

N/A

Does this PR introduce a breaking change?

No

danbraunai-goodfire and others added 2 commits February 17, 2026 11:45
…h_id from run_id

Move duplicated wandb init, pre-run info saving, and wandb finish logic from each
decomposition script into a shared run_experiment() in run_spd.py. Extract
parse_config, parse_sweep_params, and generate_run_id as reusable utilities.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copy link
Collaborator

@ocg-goodfire ocg-goodfire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love it, clean

)

snapshot_branch, commit_hash = create_git_snapshot(run_id=run_id)
snapshot_branch, commit_hash = create_git_snapshot(run_id=launch_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this arg be called launch_id?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Changed the arg to snapshot_id, as it can be used with launch_id or run_id (or anything else).

Comment on lines 87 to 88
if len(training_jobs) > 1:
_wandb_setup(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is us only creating a view if there's more than 1 job? if so can me name _wandb_setup something more descriptive? _setup_wandb_views?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to _create_wandb_views_and_report. I don't love it because report is optional, but I also don't love adding "maybe" in there.


Prefixed with 'l-' to prevent Python Fire from parsing the numeric timestamp as an int.
"""
return f"l-{datetime.now().strftime('%Y%m%d_%H%M%S')}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nit: l is for "launch" here right? I'm about to add more id prefixes ( "clt-XXXX", "molt-XXXX") so I wonder if we should just do the full word

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to just a hardcoded "launch".

Comment on lines +328 to +334
def generate_run_id(run_type: RunType) -> str:
"""Generate a unique run identifier.

Format: `{type_abbr}-{random_hex}`
"""
type_abbr = RUN_TYPE_ABBREVIATIONS[run_type]
return f"{type_abbr}-{secrets.token_hex(4)}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh this works perfectly with my comment above lol, nice

@danbraunai-goodfire
Copy link
Collaborator Author

@claude can you review this PR? Note that your review should cover the scope of this PR only. If you spot things unrelated to this PR, feel free to bring them up and we'll consider them for new issues. I want you to think about possible ways that the implementation could be improved. I want critical comments, do feel comfortable to give me the hard truths.

@claude
Copy link
Contributor

claude bot commented Feb 18, 2026

Claude finished @danbraunai-goodfire's task —— View job


PR Review: Separate launch_id from run_id and consolidate experiment setup

Review Progress:

  • Read repository guidelines and understand codebase structure
  • Analyze core changes in run.py and run_spd.py
  • Review experiment script consolidation
  • Check ID management and WandB integration
  • Evaluate potential issues and improvements
  • Provide comprehensive feedback

Overall Assessment

This is a well-executed refactoring that clearly separates concerns and eliminates code duplication. The separation of launch_id (SLURM job group identifier) from run_id (individual WandB run identifier) is architecturally sound and the consolidation of boilerplate into run_experiment() is a good DRY improvement.

What's Done Well

1. Clear Conceptual Separation

  • launch_id for SLURM job groups vs run_id for individual WandB runs is conceptually correct
  • Makes WandB URLs predictable at launch time (run.py:128 - URLs are now generated upfront)
  • Follows the codebase's "fail fast" philosophy with proper assertions

2. Excellent Consolidation

  • Moving common setup/teardown into run_experiment() eliminates ~30-60 lines per experiment script
  • Clean interface that maintains flexibility while reducing duplication
  • Good separation: experiment scripts focus on data loading, run_experiment() handles orchestration

3. Robust Implementation

  • Proper distributed training support maintained
  • Git snapshot functionality works with both IDs appropriately
  • WandB integration handles the ID changes cleanly
  • Type safety preserved throughout

🔍 Critical Issues & Improvements

1. Parameter Name Inconsistency (run.py:84)

snapshot_branch, commit_hash = create_git_snapshot(snapshot_id=launch_id)

Issue: The parameter is called snapshot_id but you're passing launch_id. While this works, it's confusing since create_git_snapshot() can accept either launch_id OR run_id depending on context.

Fix: Rename the parameter to be more generic:

# In git_utils.py
def create_git_snapshot(snapshot_id: str) -> tuple[str, str]:
    """Create a git snapshot branch with current changes.
    
    Args:
        snapshot_id: Identifier for the snapshot (can be launch_id, run_id, etc.)
    """

2. Missing ID Validation

Issue: No validation that run_id follows the expected format when passed from external sources.

Recommendation: Add validation in run_experiment():

def run_experiment(..., run_id: str | None = None, ...):
    if run_id is not None:
        # Validate format matches what generate_run_id() produces
        if not re.match(r'^[a-z]+-[a-f0-9]{8}$', run_id):
            raise ValueError(f"Invalid run_id format: {run_id}")

3. Potential Race Condition in DDP Setup

Issue: In run_experiment(), the run_id generation happens only on the main process (run.py:410-411). If other ranks need this ID before synchronization, there could be issues.

Current code:

if is_main_process():
    run_id = run_id or generate_run_id("spd")  # Other ranks don't have this

Recommendation: Ensure all ranks have the same run_id:

if is_main_process():
    run_id = run_id or generate_run_id("spd")
else:
    assert run_id is not None, "run_id must be provided to non-main processes"

4. WandB URL Generation Timing

Issue: While you generate URLs upfront in run.py:128, there's no guarantee the WandB runs will actually be created with those exact IDs if something fails during experiment setup.

Recommendation: Consider adding a note or warning about this limitation, or defer URL generation until after successful WandB init.

5. Function Signature Complexity

run_experiment() now has 8 parameters, several optional. While not terrible, it's getting unwieldy.

Future consideration: You might want to group related parameters into a dataclass:

@dataclass
class ExperimentContext:
    run_id: str | None = None
    launch_id: str | None = None
    evals_id: str | None = None
    experiment_tag: str
    sweep_params: dict[str, Any] | None = None

6. Inconsistent Error Handling

Issue: Some functions use exceptions, others use assertions. The git snapshot function gracefully handles push failures but other parts are more brittle.

Example: run.py:76 has assert n_agents is not None but could provide a more user-friendly error message.

🔧 Minor Improvements

1. Magic Number (git_utils.py:104)

The --no-verify flag is used without explanation. Add a comment explaining why pre-commit hooks are skipped for snapshots.

2. Redundant Type Annotations (compute_utils.py:31)

run_id: str  # Pre-generated unique run identifier (e.g. "s-a1b2c3d4")

This comment is helpful and should be kept, but consider if the dataclass field could benefit from more structured validation.

3. String Interpolation Style

Mixed usage of f-strings and .format(). Pick one for consistency (f-strings preferred per modern Python).

📋 Testing Considerations

  • Missing tests for the ID separation logic - consider adding unit tests that verify launch_id vs run_id are used appropriately
  • Integration tests for the consolidated run_experiment() function would be valuable
  • WandB URL generation should be tested with various ID formats

🎯 Breaking Changes Assessment

You claim "No breaking changes to the CLI interface" but:

  • Internal function signatures DID change (sweep_idlaunch_id)
  • If anyone was directly importing and using these internal functions, they'd break
  • However, since these are internal APIs and CLI remains unchanged, this is acceptable for a research codebase

🚀 Overall Verdict

This is excellent refactoring work that improves the codebase's maintainability and conceptual clarity. The ID separation solves a real problem (unpredictable WandB URLs) and the consolidation reduces future maintenance burden.

Approval recommendation: ✅ APPROVE with the minor fixes suggested above.

The identified issues are relatively minor and don't block the PR. This refactoring provides clear value and the implementation is solid. The hardest part of this type of refactoring - ensuring distributed training, WandB integration, and SLURM orchestration all still work correctly - appears to have been handled well.


@danbraunai-goodfire danbraunai-goodfire merged commit 39de8fc into main Feb 18, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants