
Transcoder harvest integration#419

Open
ocg-goodfire wants to merge 14 commits into dev from
feature/transcoder-harvest-integration

Conversation


ocg-goodfire (Collaborator) commented Mar 3, 2026

Description

Ports @bartbussmann's PR bartbussmann#7 to this repo, with a few small changes so that it integrates seamlessly with the harvest CLI.

bartbussmann and others added 14 commits March 3, 2026 10:58
Extends the generic harvest pipeline (from #398) to support transcoders
from nn_decompositions. Adds TranscoderAdapter, TranscoderHarvestFn, and
TranscoderHarvestConfig so that trained transcoders (loaded from wandb
artifacts) can be harvested for activation statistics using the same
pipeline as SPD.

Includes an example script demonstrating end-to-end harvesting of
BatchTopK k=32 transcoders across all 4 LlamaSimpleMLP layers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
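A minimal sketch of how the pieces named above could slot together. The class names (TranscoderAdapter, TranscoderHarvestConfig) and the fact that harvesting goes through the same pipeline as SPD come from the commit message; every field and function body here is an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class TranscoderHarvestConfig:
    wandb_artifact: str  # trained transcoder artifact (hypothetical field)
    layer: int           # which of the 4 LlamaSimpleMLP layers to harvest


class TranscoderAdapter:
    """Presents a loaded transcoder through the interface the generic
    harvest pipeline (from #398) already consumes for SPD."""

    def __init__(self, cfg: TranscoderHarvestConfig):
        self.cfg = cfg


def transcoder_harvest_fn(cfg: TranscoderHarvestConfig) -> TranscoderAdapter:
    # Loading the weights from the wandb artifact is elided here; the point
    # is that the pipeline only ever sees the adapter, so activation
    # statistics are collected exactly as for SPD decompositions.
    return TranscoderAdapter(cfg)
```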
The tokenizer and dataset names were incorrectly hardcoded as "gpt2" and
"danbraunai/pile-uncopyrighted-tok" in the adapter. The transcoders
are actually trained with the EleutherAI/gpt-neox-20b tokenizer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of requiring tokenizer_name and dataset_name in the harvest
config, extract them from the base model's PretrainRunInfo. The base
model's wandb run already stores the full training config including
hf_tokenizer_path and train_dataset_config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use the base model's train_dataset_config directly instead of
hardcoding dataset fields. Only override streaming=True (for harvest)
and n_ctx=block_size (strip the extra label token).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
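The two commits above boil down to one rule: reuse the base model's stored dataset config and override only two fields. A sketch under assumptions — the `streaming` and `n_ctx` field names come from the commit messages, while the `DatasetConfig` dataclass and its other fields are hypothetical stand-ins for `train_dataset_config`:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetConfig:
    name: str       # hypothetical field
    streaming: bool
    n_ctx: int


def harvest_dataset_config(base: DatasetConfig, block_size: int) -> DatasetConfig:
    # Take the base model's train_dataset_config wholesale; force streaming
    # for harvest, and trim n_ctx to block_size to strip the extra label
    # token used during training.
    return replace(base, streaming=True, n_ctx=block_size)


base = DatasetConfig(name="pile", streaming=False, n_ctx=513)
print(harvest_dataset_config(base, 512))
# → DatasetConfig(name='pile', streaming=True, n_ctx=512)
```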
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add adapter_from_config() that takes the full method_config, so
  TranscoderAdapter can be constructed in the harvest worker
- Keep adapter_from_id() for downstream consumers (autointerp, intruder)
  that only have a decomposition ID
- Replace Python example script with YAML config for spd-harvest
- Exclude transcoder files from basedpyright (optional nn_decompositions dep)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copies EncoderConfig and SharedTranscoder + subclasses (474 lines) from
bartbussmann/nn_decompositions (MIT) into spd/adapters/, eliminating the
optional dependency. Only torch + stdlib needed, both already deps.

- spd/adapters/encoder_config.py: EncoderConfig dataclass
- spd/adapters/transcoders.py: SharedTranscoder, Vanilla/TopK/BatchTopK/JumpReLU
- Remove nn_decompositions optional dep from pyproject.toml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Split encode() into encode() and encode_dense() to avoid union return type
- Add type annotations to autograd.Function forward/backward methods
- Type _build_loss_dict return as dict[str, Any]
- Assert std is not None in postprocess_output, .grad in weight norm
- Use int() for dead_features.sum() passed to min()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
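The `encode()` / `encode_dense()` split above is a standard way to avoid a union return type. A simplified sketch — the method names come from the commit message, but the Linear encoder and per-example top-k body are assumptions, not the vendored SharedTranscoder code:

```python
import torch


class TopKEncoderSketch(torch.nn.Module):
    """Minimal sketch: both encode paths return a plain Tensor, so neither
    method has a union return type for basedpyright to complain about."""

    def __init__(self, d_in: int, d_hidden: int, k: int):
        super().__init__()
        self.enc = torch.nn.Linear(d_in, d_hidden)
        self.k = k

    def encode_dense(self, x: torch.Tensor) -> torch.Tensor:
        # Dense pre-sparsification codes for every latent.
        return torch.relu(self.enc(x))

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Sparse codes: zero all but the top-k latents per example.
        dense = self.encode_dense(x)
        topk = torch.topk(dense, self.k, dim=-1)
        return torch.zeros_like(dense).scatter(-1, topk.indices, topk.values)
```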
- Use *grad_outputs signature for autograd.Function.backward
- Replace @torch.no_grad() decorator with context manager
- Credit Bart Bussmann by name in vendored file docstrings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
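The `*grad_outputs` signature mentioned above looks like this in a toy straight-through function. This is only a shape sketch of the pattern, not the vendored JumpReLU in spd/adapters/transcoders.py; the pass-through window in backward is an assumed, common STE choice:

```python
import torch


class StepSTE(torch.autograd.Function):
    """Toy straight-through step illustrating the variadic backward
    signature torch.autograd.Function expects."""

    @staticmethod
    def forward(ctx, x: torch.Tensor, threshold: float) -> torch.Tensor:
        ctx.save_for_backward(x)
        ctx.threshold = threshold
        return (x > threshold).to(x.dtype)

    @staticmethod
    def backward(ctx, *grad_outputs: torch.Tensor):
        # Variadic signature: unpack instead of naming a single grad arg.
        (grad_out,) = grad_outputs
        (x,) = ctx.saved_tensors
        # Straight-through estimator: pass the gradient through near the
        # threshold, zero elsewhere (assumed window of width 1 each side).
        mask = (x - ctx.threshold).abs() < 1.0
        # One gradient per forward input; None for the float threshold.
        return grad_out * mask.to(x.dtype), None
```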
For non-SPD decomposition IDs (e.g. tc-*), recover the full method
config from the harvest DB. This means spd-autointerp, intruder eval,
graph-interp, and label scoring all work with transcoders — no config
passing needed, just the decomposition ID.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
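The ID-based dispatch described above can be sketched as a single lookup. Only the `tc-` prefix convention and the harvest-DB recovery come from the commit message; the dict-shaped DB and its key names are assumptions:

```python
def method_config_for(decomposition_id: str, harvest_db: dict) -> dict:
    """Hypothetical dispatch: recover a method config from just an ID."""
    if decomposition_id.startswith("tc-"):
        # Non-SPD decomposition: the full method config was persisted at
        # harvest time, so autointerp, intruder eval, graph-interp, and
        # label scoring need nothing beyond the decomposition ID.
        return harvest_db[decomposition_id]["method_config"]
    # SPD decompositions keep their existing path (config from wandb).
    return {"method": "spd", "run": decomposition_id}
```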