
(dev -> main) App updates and backwards transformer ci functions #355

Draft
danbraunai-goodfire wants to merge 519 commits into main from dev

Conversation

@danbraunai-goodfire
Collaborator

Description

Related Issue

Motivation and Context

How Has This Been Tested?

Does this PR introduce a breaking change?

claude-spd1 and others added 22 commits February 2, 2026 12:43
- Rename all classes/functions from ContinuousPGD to PersistentPGD
- Rename config files from *-cpgd* to *-ppgd*
- Simplify mask shape from (batch, C) to (C,) per module
- Single mask shared across all batch elements and ranks
- Use fixed seed (PPGD_INIT_SEED=42) for consistent initialization
- Keep all_reduce on gradients so updates stay synchronized

Co-Authored-By: Claude Opus 4.5 <[email protected]>
This feature tracked which components were considered "alive" based on
their firing frequency over time, but wasn't providing value. Old configs
with n_examples_until_dead will still load (the key is now ignored).

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add a subset variant of persistent PGD loss that routes to random
subsets of modules per position, analogous to how StochasticReconSubsetLoss
relates to StochasticReconLoss.

- Add PersistentPGDReconSubsetLossConfig with routing field
- Make router a required parameter in persistent_pgd_recon_loss
- Update run_spd.py to pass appropriate router based on config type
- Update tests to pass router parameter

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…ments

- Add transition_hidden_dim config for global_reverse_residual CI function
- Move CI reading after transition for improved architecture
- Add pile_gpt2_simple-2L_global_reverse experiment
- Update ss_llama_simple_mlp-2L-wide expected runtime

Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Add PersistentPGDMaskScope type with single_mask, broadcast_across_batch,
  unique_per_batch_per_token options
- Change PersistentPGDState to take batch_dims and cfg instead of optimizer_cfg
- Add get_grads() method to PersistentPGDState for computing gradients
- Refactor compute_losses to return dict of losses instead of total
- Update persistent_pgd_recon_loss to be called from compute_losses
- Update tests to use new API (skip 2 tests pending LM-style test model)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@danbraunai-goodfire
Collaborator Author

@claude can you review this PR? Note that your review should cover the scope of this PR only; if you spot things unrelated to this PR, feel free to bring them up and we'll consider them for new issues. I want you to think about possible ways the implementation could be improved, including ways it could be rewritten completely using different abstractions. I want critical comments; feel free to give me the hard truths.

@claude
Contributor

claude bot commented Feb 4, 2026

Claude finished @danbraunai-goodfire's task


PR Review: App updates and backwards transformer ci functions

Review Scope: 288 commits, 100 files changed, focusing on app updates and transformer CI functions

Todo List

  • Activate virtual environment and setup
  • Analyze major architectural changes in the app
  • Review new abstractions and components
  • Examine backend API changes
  • Review frontend component refactoring
  • Analyze config changes across experiments
  • Provide critical feedback on implementation approach
  • Suggest alternative architectures and improvements

Critical Review & Hard Truths

🔥 Major Architectural Problems

1. Backend Compute Layer is a God Object (spd/app/backend/compute.py:1-827)

  • Problem: 827 lines doing gradient computation, edge calculation, intervention logic, AND result formatting
  • Anti-pattern: Multiple responsibilities violating SRP
  • Rewrite suggestion:
    # Split into focused modules:
    spd/app/backend/compute/
      ├── gradient_engine.py     # Core autograd operations
      ├── attribution_builder.py # Edge construction logic  
      ├── intervention_runner.py # Forward passes with masks
      └── result_formatters.py   # Output transformation

2. Database Schema Fighting the Type System (spd/app/backend/database.py)

  • Problem: StoredGraph uses discriminated unions in Python, but database still has nullable columns for all possible fields
  • Result: Invalid states are representable at DB level despite type safety in Python
  • Solution: Either use proper table inheritance OR embrace the union and validate constraints in application code, don't half-ass both approaches
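One way to make invalid states unrepresentable at the DB level is a table-level CHECK constraint tying the discriminator to its fields. A minimal sqlite3 sketch, with hypothetical table and column names (not the real StoredGraph schema):

```python
import sqlite3

# Hypothetical schema sketch: enforce the discriminated union in the DB
# itself, so a row can't mix fields from different graph kinds.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stored_graph (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL CHECK (kind IN ('display', 'optimized')),
        optim_params TEXT,  -- JSON, only meaningful for optimized graphs
        CHECK (
            (kind = 'display'   AND optim_params IS NULL) OR
            (kind = 'optimized' AND optim_params IS NOT NULL)
        )
    )
""")
conn.execute(
    "INSERT INTO stored_graph (kind, optim_params) VALUES ('optimized', '{}')"
)
try:
    # Invalid state: a display graph carrying optimization params.
    conn.execute(
        "INSERT INTO stored_graph (kind, optim_params) VALUES ('display', '{}')"
    )
    violated = False
except sqlite3.IntegrityError:
    violated = True
print(violated)  # the CHECK constraint rejects the invalid row
```

This keeps the nullable columns but makes the DB reject the combinations the Python union already forbids.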

3. Frontend State Management is Chaotic (PromptAttributionsTab.svelte:565+ lines)

  • Problem: No state machine - implicit transitions everywhere
  • Examples:
    • TabViewState + GraphComputeState + ComposerState can be in invalid combinations
    • UI can be in "loading" state while also in "draft" mode
  • Rewrite suggestion: Use XState or explicit state machine pattern. Current approach doesn't scale.

🚨 Code Quality Issues

4. Layer Aliasing Global State (spd/app/frontend/src/lib/layerAliasing.ts:57)

let cachedArchitecture: Architecture | null = null;
  • Problem: Global mutable cache that can become stale, not reset between contexts
  • Solution: Pass architecture context down explicitly or use a proper store

5. Bulk API Endpoint Design (spd/app/backend/routers/component_data.py)

  • Good: Eliminated GIL contention by batching requests
  • Bad: Still sequential execution inside the handler - missed opportunity for true parallelism
  • Alternative: Use asyncio.gather() to run the three data fetches concurrently
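A sketch of that concurrency fix, assuming three independent async fetchers (the function names here are hypothetical stand-ins for the handler's real data fetches):

```python
import asyncio

# Hypothetical fetchers standing in for the three data fetches.
async def fetch_activations(idx: int) -> str:
    await asyncio.sleep(0.01)
    return f"acts-{idx}"

async def fetch_attributions(idx: int) -> str:
    await asyncio.sleep(0.01)
    return f"attrs-{idx}"

async def fetch_metadata(idx: int) -> str:
    await asyncio.sleep(0.01)
    return f"meta-{idx}"

async def handler(idx: int) -> dict:
    # Run the three fetches concurrently instead of one after another.
    acts, attrs, meta = await asyncio.gather(
        fetch_activations(idx), fetch_attributions(idx), fetch_metadata(idx)
    )
    return {"acts": acts, "attrs": attrs, "meta": meta}

result = asyncio.run(handler(7))
print(result["acts"])  # acts-7
```

Note this only helps if the fetches are genuinely I/O-bound or release the GIL; CPU-bound work would need a thread or process pool instead.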

6. Configuration Explosion (spd/configs.py)

  • Added complex global_reverse_residual CI config with 7+ required fields and complex validation
  • Problem: Config complexity is growing faster than the value it provides
  • Question: Why not use dataclasses with defaults instead of elaborate Pydantic validation?

💡 Missed Opportunities

7. TypeScript Discriminated Unions Done Halfway

// Current: Can still represent invalid states
type OptimizeConfigDraft = {
    loss: LossConfigDraft;  // Could be invalid
    // ...
}

// Better: Make invalid states unrepresentable  
type OptimizeState = 
  | { status: "draft"; config: Partial<OptimizeConfig> }
  | { status: "valid"; config: ValidOptimizeConfig }
  | { status: "computing"; config: ValidOptimizeConfig }

8. Database Caching Strategy

  • Current: Ad-hoc SQLite caching with complex uniqueness constraints
  • Alternative: Consider Redis/in-memory cache for computed graphs + persistent storage for configuration only
  • Benefit: Simpler schema, better performance for large graphs

🎯 Specific Technical Issues

9. Performance Anti-patterns:

  • compute_prompt_attributions() does full gradient computation even for display-only graphs
  • useComponentData and useComponentDataExpectCached are nearly identical - code duplication
  • Frontend doesn't virtualize large lists (could be 1000+ components)

10. Error Handling Inconsistency:

  • Some places use assert for validation (good, fail-fast)
  • Others use try/catch with graceful degradation (bad for research code)
  • Pick one approach: Research code should fail fast, not silently degrade
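The fail-fast style being recommended can be sketched as follows; the function and checks are illustrative, not taken from the codebase:

```python
# Hypothetical example of fail-fast validation for research code:
# assert preconditions and let bad inputs crash loudly, rather than
# catching exceptions and silently degrading.
def compute_edge_weights(acts: list[float], grads: list[float]) -> list[float]:
    assert len(acts) == len(grads), f"shape mismatch: {len(acts)} vs {len(grads)}"
    assert all(g == g for g in grads), "NaN gradient"  # NaN != NaN
    return [a * g for a, g in zip(acts, grads)]

print(compute_edge_weights([1.0, 2.0], [0.5, 0.25]))  # [0.5, 0.5]
```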

🔨 Complete Rewrite Suggestions

Alternative 1: Event-Driven Architecture
Replace current request/response with events:

# Commands
@dataclass
class ComputeAttributionsCommand:
    prompt_id: int
    optimization_params: OptimParams | None

# Events  
@dataclass
class AttributionsComputed:
    prompt_id: int
    graph_data: GraphData

Alternative 2: Functional Core + Imperative Shell
Move all computation to pure functions, keep I/O at edges:

def compute_attributions(model: Model, tokens: Tensor, config: Config) -> Graph:
    # Pure function - no side effects, easy to test
    pass

class AttributionService:
    # Imperative shell - handles caching, DB, progress callbacks
    pass

🚧 Breaking Changes Needed

The current architecture is trying to be too general. Consider:

  1. Remove the three graph types - just have "computed graphs" with optional optimization metadata
  2. Eliminate the database - store graphs as JSON files, much simpler
  3. Split the mega-components - PromptAttributionsTab should be 4-5 smaller components

Bottom Line: This codebase is evolving from research prototype to production-like complexity without the discipline. Either embrace the complexity with proper abstractions, or ruthlessly simplify back to research-code simplicity.

The current middle ground is the worst of both worlds.


@danbraunai-goodfire danbraunai-goodfire changed the title App updates and backwards transformer ci functions (dev -> main) App updates and backwards transformer ci functions Feb 6, 2026
ocg-goodfire and others added 3 commits February 6, 2026 14:19
* Add batch_invariant scope to persistent PGD

Replace PersistentPGDMaskScope string literal with discriminated union
(SingleMaskScope, BroadcastAcrossBatchScope, BatchInvariantScope).
BatchInvariantScope creates (N, S, C) masks where N divides both
train and eval batch sizes, repeated along batch dim at forward time.

Removes unique_per_batch_per_token (was eval-incompatible). Includes
backwards compat for bare string scope values in configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
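The (N, S, C) repetition described above can be sketched in numpy; shapes and names here are illustrative, not the actual implementation:

```python
import numpy as np

# Sketch of BatchInvariantScope mechanics: an (N, S, C) mask is repeated
# along the batch dim at forward time, so the same mask works for any
# batch size B that N divides (train and eval may differ).
N, S, C = 2, 4, 3  # n_sources, seq_len, n_components (illustrative)
mask = np.random.rand(N, S, C)

def expand_mask(mask: np.ndarray, batch_size: int) -> np.ndarray:
    n = mask.shape[0]
    assert batch_size % n == 0, "N must divide the batch size"
    return np.tile(mask, (batch_size // n, 1, 1))

train = expand_mask(mask, 8)   # (8, 4, 3)
eval_ = expand_mask(mask, 6)   # (6, 4, 3)
print(train.shape, eval_.shape)
```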

* Remove commented-out _elemwise_max_component_mask

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* Simplify PersistentPGDState: take seq_len instead of train_batch_dims

The batch size validation already happens in the Config validator,
so the constructor only needs seq_len for mask shape.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* Remove unnecessary .contiguous() and stale comment in get_mask_infos

The in-place mask updates in step() happen after autograd.grad and
backward() have already consumed the graph, so views are fine.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

---------

Co-authored-by: Claude SPD1 <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
When None, transitions use a linear layer (d_resid -> d_resid)
instead of an MLP (d_resid -> hidden -> d_resid with GeLU).
This allows loading older configs that don't have the field.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
danbraunai-goodfire and others added 30 commits February 19, 2026 18:07
…ge (#408)

- Update 7 YAML files: bare `lr:` → `lr_schedule:` block in PGD optimizer configs
- Update 5 YAML files: deprecated scope names (`batch_invariant` → `repeat_across_batch`,
  `unique_per_batch_per_token` → `per_batch_per_position`, `n_masks` → `n_sources`)
- Remove redundant `coeff: null` from eval_metric_configs across 16 YAML files
- Fix misleading error message in persistent_pgd.py (said "use fewer ranks" but
  fewer ranks makes the problem worse)

Co-authored-by: Claude Opus 4.6 <[email protected]>
Refactor hidden activations reconstruction into a unified module with three
variants (Stochastic, CI, PGD). Add "output" cache type to ComponentModel
for capturing post-weight activations. Remove unused pre_weight_acts param
from compute_losses.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Remove PRAGMA journal_mode=WAL from all 3 DB classes (harvest, autointerp,
  topological_interp). WAL requires POSIX file locking which breaks on NFS.
- Scoring scripts (detection, fuzzing) now accept a separate writable InterpDB
  instead of writing through the readonly InterpRepo.
- Intruder eval opens harvest readonly + separate writable HarvestDB for scores.
- Fix try/except → try/finally in interpret.py for proper connection cleanup.
- Bump autointerp/eval/intruder jobs to 2 GPUs for memory headroom.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Was serializing AttributionsSlurmConfig (with n_gpus, partition, etc.)
instead of just DatasetAttributionConfig to workers.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
* Add CI and PGD variants of hidden acts recon loss

Refactor hidden activations reconstruction into a unified module with three
variants (Stochastic, CI, PGD). Add "output" cache type to ComponentModel
for capturing post-weight activations. Remove unused pre_weight_acts param
from compute_losses.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* Cleanup sum

* NOT-REVIEWED: Use existing sources for hidden act loss

* Fix typo: PGDHiddenActsReconLoss → PPGDHiddenActsReconLoss in 4L config

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* PPGDEvalLosses to encompass hidden acts and output recon

* Inline _accumulate_into_state()

* Run PPGD evals by default if exists in loss metrics

* Go back to config-based eval structure

* Use PersistentPGDReconSubsetEval and PersistentPGDReconEval

* Remove unused registry entry

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>
…ts' into dev

# Conflicts:
#	tests/metrics/test_recon_losses.py
* add option to set a separate seed for dataset preparation and for the rest of the decomposition