Fix sampler_weights checkpoint loading #31

Open
Jah-yee wants to merge 1 commit into thinking-machines-lab:main from Jah-yee:fix-sampler-weights-checkpoint-loading

@Jah-yee Jah-yee commented Apr 15, 2026

Summary

Fixes Issue #25: sampler_weights not available while weights is available

Problem

Loading a sampler_weights checkpoint from a tinker path (e.g., tinker://run-id/sampler_weights/0001) fails with a "Path is invalid" error, because the API call is made with an incorrectly formatted checkpoint ID.

Root Cause

In ParsedCheckpointTinkerPath.from_tinker_path(), checkpoint_id was set to include the type prefix (e.g., sampler_weights/0001), but the IDs sent to the API were formatted inconsistently and did not reliably distinguish training checkpoints from sampler checkpoints.

Fix

  1. Added a new api_checkpoint_id property to ParsedCheckpointTinkerPath that returns:

    • For training checkpoints: the pure checkpoint ID (e.g., 0001)
    • For sampler checkpoints: the prefixed ID for API calls (e.g., sampler_weights/0001)
  2. Updated all methods that use tinker paths in rest_client.py to use api_checkpoint_id instead of checkpoint_id
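
For readers who want to see the intended behavior in isolation, here is a minimal sketch of the mapping described above. It is not the code from this PR: `ParsedPathSketch`, its fields (`run_id`, `checkpoint_type`, `bare_id`), and the assumed path layout `tinker://<run-id>/<type>/<id>` are hypothetical stand-ins, and only the name `api_checkpoint_id` and the two output formats come from the description.

```python
from dataclasses import dataclass

TINKER_SCHEME = "tinker://"
KNOWN_TYPES = ("weights", "sampler_weights")


@dataclass(frozen=True)
class ParsedPathSketch:
    """Hypothetical stand-in for ParsedCheckpointTinkerPath (fields are assumed)."""

    run_id: str
    checkpoint_type: str  # "weights" (training) or "sampler_weights"
    bare_id: str          # ID without any type prefix, e.g. "0001"

    @classmethod
    def from_tinker_path(cls, path: str) -> "ParsedPathSketch":
        # Assumed layout: tinker://<run-id>/<type>/<id>
        if not path.startswith(TINKER_SCHEME):
            raise ValueError(f"not a tinker path: {path!r}")
        parts = path[len(TINKER_SCHEME):].split("/")
        if len(parts) != 3 or not all(parts) or parts[1] not in KNOWN_TYPES:
            raise ValueError(f"unexpected tinker path layout: {path!r}")
        return cls(run_id=parts[0], checkpoint_type=parts[1], bare_id=parts[2])

    @property
    def api_checkpoint_id(self) -> str:
        # Sampler checkpoints keep the type prefix for API calls;
        # training checkpoints use the bare ID.
        if self.checkpoint_type == "sampler_weights":
            return f"sampler_weights/{self.bare_id}"
        return self.bare_id
```

Under these assumptions, `ParsedPathSketch.from_tinker_path("tinker://run-id/sampler_weights/0001").api_checkpoint_id` yields the prefixed form, while the same call on a `weights` path yields the bare ID.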

Testing

Added new test file tests/test_sampler_weights_loading.py with tests for:

  • Parsing weights checkpoint paths
  • Parsing sampler_weights checkpoint paths
  • Verifying the correct api_checkpoint_id format is returned for each type
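
The three cases listed above could be exercised roughly as follows. This is an illustrative sketch and not the contents of tests/test_sampler_weights_loading.py: the helper `api_checkpoint_id_for` is hypothetical and assumes the path layout `tinker://<run-id>/<type>/<id>`, so that the tests run without the real library.

```python
SCHEME = "tinker://"


def api_checkpoint_id_for(tinker_path: str) -> str:
    """Hypothetical helper: map a tinker path to the ID format sent to the API."""
    if not tinker_path.startswith(SCHEME):
        raise ValueError(f"not a tinker path: {tinker_path!r}")
    parts = tinker_path[len(SCHEME):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected tinker path layout: {tinker_path!r}")
    _run_id, kind, ckpt = parts
    if kind == "sampler_weights":
        return f"sampler_weights/{ckpt}"  # prefixed form for sampler checkpoints
    if kind == "weights":
        return ckpt  # bare ID for training checkpoints
    raise ValueError(f"unknown checkpoint type: {kind!r}")


def test_weights_path_uses_bare_id():
    assert api_checkpoint_id_for("tinker://run-id/weights/0001") == "0001"


def test_sampler_weights_path_keeps_prefix():
    path = "tinker://run-id/sampler_weights/0001"
    assert api_checkpoint_id_for(path) == "sampler_weights/0001"


def test_malformed_path_is_rejected():
    # Plain try/except keeps the sketch free of a pytest dependency.
    try:
        api_checkpoint_id_for("tinker://run-id/0001")
    except ValueError:
        return
    raise AssertionError("expected ValueError for a path without a type segment")
```

The functions follow the `test_*` naming convention, so pytest would collect them, but they can also be called directly.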

How to Test

  1. Save sampler weights: training_client.save_weights_for_sampler("sampler-001")
  2. Get the returned tinker path (e.g., tinker://run-id/sampler_weights/sampler-001)
  3. Try to download: rest_client.get_checkpoint_archive_url_from_tinker_path(tinker_path)
  4. Verify the download URL is correctly generated

The fix ensures both weights and sampler_weights checkpoints can be loaded without errors.

- Add api_checkpoint_id property to ParsedCheckpointTinkerPath that returns the
  correct checkpoint_id format for API calls (includes 'sampler_weights/'
  prefix for sampler checkpoints)
- Update rest_client methods to use api_checkpoint_id instead of checkpoint_id
- Add tests for sampler_weights path parsing and API checkpoint ID format

Fixes Issue thinking-machines-lab#25: sampler_weights not available while weights is available