Merge 3/3: App overhaul + clustering + cleanup #431
Merged
ocg-goodfire merged 21 commits into dev on Mar 6, 2026
App Backend:
- AppTokenizer: server-side token display
- Refactored graph computation, absolute-target attribution edges
- SQLite prompt DB on NFS with DELETE journal + fcntl.flock locking
- New routers: graph_interp, investigations, MCP, pretrain_info, run_registry, data_sources
- Unified InterventionResult, target-sans masking, masked predictions
- Spotlight mode, configurable optimization loss (CE/KL/positional)
- Removed get_attribution_strength MCP tool (storage method was deleted)
App Frontend:
- Canvas edges, spotlight mode, 50K edge limit
- New: DataSourcesTab, InvestigationsTab, ClustersTab, ModelGraph, DatasetExplorerTab, OptimizationSettings
- Design system: CSS variables, token probability coloring
- Lazy loading, bulk endpoints, Loadable<T> pattern
Clustering:
- CUDA support, memory optimizations, Pile model configs
Cleanup:
- Remove scratch files, CLAUDE.md updates, .gitignore additions
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
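The "DELETE journal + fcntl.flock" item can be illustrated with a minimal sketch. It assumes a sidecar lock file next to the database and a helper named `locked_connection`; both are hypothetical and are not taken from the PR. SQLite's WAL mode is not supported on network filesystems, which is presumably why the rollback (DELETE) journal is used here.

```python
import fcntl
import os
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def locked_connection(db_path: str):
    """Yield a SQLite connection while holding an exclusive flock.

    Hypothetical helper: the lock lives on a sidecar file so that
    every writer serializes on the same path.
    """
    lock_path = db_path + ".lock"  # assumed naming, not from the PR
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            conn = sqlite3.connect(db_path)
            try:
                # Rollback-journal mode; no shared-memory WAL index is needed.
                conn.execute("PRAGMA journal_mode=DELETE")
                yield conn
                conn.commit()  # only reached if the body did not raise
            finally:
                conn.close()  # uncommitted changes are rolled back on close
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


# Usage on a throwaway database (table name is illustrative only).
db_path = os.path.join(tempfile.mkdtemp(), "example.db")
with locked_connection(db_path) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS prompts (id INTEGER PRIMARY KEY, text TEXT)")
    conn.execute("INSERT INTO prompts (text) VALUES (?)", ("hello",))

with locked_connection(db_path) as conn:
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    count = conn.execute("SELECT COUNT(*) FROM prompts").fetchone()[0]
```

Taking the lock before opening the connection keeps the whole open/write/commit sequence inside one critical section, at the cost of serializing all writers.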
- Delete scripts/{migrate_harvest_data,test_abs_grad_trick,parse_transformer_circuits_post}.py
- Remove spd/app/TODO.md (moved to ~/app-todo-2026-03-04.md for reference)
- Remove hardcoded partition="h200-reserved" in investigations.py
- Narrow bare except Exception to json.JSONDecodeError in investigations.py
- Add exhaustive match default in graph_interp.py (was NameError on unexpected pass_name)
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Matches the signature in #428 (investigate module). Uses DEFAULT_PARTITION_NAME instead of hardcoded string. TODO to remove when investigate module drops the required partition param. make check now passes with 0 errors.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Remove ~87 lines of commented-out _tool_get_component_attributions in mcp.py
- Remove unused DEVICE constant + get_device import + stale TODO in prompts.py
- Remove unused ActivationContextsGenerationConfig from schemas.py
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Checked SPD_OUT_DIR/app/prompt_attr.db: ci_masked_label_prob, stoch_masked_label_prob, adv_pgd_label_prob all exist. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The CREATE TABLE statement already includes edges_data_abs (and all metric columns). The real DB at SPD_OUT_DIR has all columns present. No legacy DBs without these columns exist. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The prompt DB is no longer disposable — it's shared team state on NFS. Schema changes need manual ALTER TABLE with backups. CREATE TABLE statements are the source of truth. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
MeanKL: single version used F.kl_div(reduction="batchmean") which for [1, seq, vocab] gives sum over all positions. Batched version used .sum(-1).mean(-1) giving mean over positions. These differ by a factor of seq_len. Fixed single version to match batched (mean over positions).
search_tokens: was running model forward pass without GPU lock, risking concurrent CUDA ops with graph computation. Added manager.gpu_lock().
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
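The factor-of-seq_len discrepancy can be reproduced without tensors. The sketch below is a plain-Python stand-in for the two reductions, assuming batch size 1 and made-up logits: "batchmean" sums every element and divides by the batch size only, while `.sum(-1).mean(-1)` sums over the vocabulary and then averages over positions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one position's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_at_position(target_probs, pred_probs):
    """KL(target || pred) at one position, summed over the vocabulary."""
    return sum(t * (math.log(t) - math.log(p)) for t, p in zip(target_probs, pred_probs))


# One sequence, shape [1, seq, vocab] with seq=4 and vocab=3 (illustrative values).
target_logits = [[0.2, 1.0, -0.5], [1.5, 0.0, 0.3], [-1.0, 0.4, 0.9], [0.0, 0.0, 2.0]]
pred_logits = [[0.0, 0.8, 0.1], [1.0, 0.5, 0.0], [-0.5, 0.2, 1.2], [0.3, -0.2, 1.0]]

per_position = [
    kl_at_position(softmax(t), softmax(p))
    for t, p in zip(target_logits, pred_logits)
]

batch_size = 1
seq_len = len(per_position)

# "batchmean"-style: total over all positions and vocab entries, divided by batch size.
batchmean_style = sum(per_position) / batch_size

# Batched-version style: sum over vocab, then mean over positions.
mean_over_positions = sum(per_position) / seq_len

ratio = batchmean_style / mean_over_positions
```

With a batch of one, `ratio` equals `seq_len`, so a loss weight tuned against one variant would be off by the sequence length under the other.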
Previously asserted and crashed after the graph was already saved to the DB, leaving an orphaned graph with no base intervention run. Now logs a warning and returns early.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
~112 lines of dead mock data, mock functions, and MOCK_MODE branches. Also remove the cross-router MOCK_MODE import in runs.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…S_PER_POS
database.py:
- Remove ForkedInterventionRunRecord class
- Remove forked_intervention_runs table + index from schema
- Remove fork cleanup from delete_prompt
- Remove save/get/delete_forked_intervention_run methods
- Remove unused: delete_graphs_for_prompt, delete_graphs_for_run, delete_intervention_runs_for_graph, get_intervention_run
graphs.py:
- Import MAX_OUTPUT_NODES_PER_POS from compute.py instead of redefining
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
All CLI entrypoints and pipeline functions now require explicit harvest_subrun_id. Eliminates silent fallback to open_most_recent() which could pick up stale data from a different config.
- autointerp run_interpret.py: main() and get_command() require it
- autointerp run_slurm.py: submit_autointerp() requires it
- autointerp run_slurm_cli.py: CLI requires --harvest_subrun_id
- autointerp scoring/run_label_scoring.py: main() and get_command() require it
- dataset_attributions config.py: harvest_subrun_id is required on config
- dataset_attributions harvest.py: _build_alive_masks requires it
- dataset_attributions run_slurm.py: removed harvest_subrun_id param (now in config)
- postprocess __init__.py: sets harvest_subrun_id on attr config from harvest result
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
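The idea of requiring the ID instead of falling back silently can be sketched with argparse. This is an illustration only: the repository's CLIs may use a different argument library, and `build_parser`, `missing_flag_exits`, and the example ID value are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with no default for the subrun ID, so omission is an error."""
    parser = argparse.ArgumentParser(description="Illustrative entrypoint")
    parser.add_argument(
        "--harvest_subrun_id",
        required=True,  # no "most recent" fallback: the caller must choose
        help="Explicit harvest subrun to read from",
    )
    return parser


def missing_flag_exits() -> bool:
    """Return True if parsing without the flag aborts, as intended."""
    try:
        build_parser().parse_args([])
    except SystemExit:
        return True
    return False


# Usage: the explicit form parses; the implicit form is rejected.
args = build_parser().parse_args(["--harvest_subrun_id", "h-example"])
rejected = missing_flag_exits()
```

Failing at argument parsing surfaces the mistake before any job is submitted, which is the point of dropping the open_most_recent() fallback.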
Same block-based DSL from graph_interp, canonical location for autointerp strategies to import from too.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Convert compact_skeptical, dual_view, and graph_interp prompt formatters from f-string concatenation to the Md block-based DSL. Extract shared token_pmi_pairs helper into prompt_helpers. Add labeled_list to Md for the common bold-header + bullet-items pattern. Also fix two pre-existing basedpyright warnings: DONE_MARKER import path in graph_interp/repo.py and unused param in test_storage.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
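To make the "block-based DSL" idea concrete, here is a small stand-in builder. It is not the repository's Md class: the method names `h`, `p`, and `build` are invented for this sketch, and `labeled_list` only mirrors the bold-header + bullet-items pattern described in the commit message.

```python
class Md:
    """Hypothetical stand-in for a block-based Markdown builder."""

    def __init__(self) -> None:
        self._blocks: list[str] = []

    def h(self, level: int, text: str) -> "Md":
        """Append a heading block."""
        self._blocks.append("#" * level + " " + text)
        return self

    def p(self, text: str) -> "Md":
        """Append a paragraph block."""
        self._blocks.append(text)
        return self

    def labeled_list(self, label: str, items: list[str]) -> "Md":
        """Append a bold label followed by one bullet per item."""
        lines = [f"**{label}**"] + [f"- {item}" for item in items]
        self._blocks.append("\n".join(lines))
        return self

    def build(self) -> str:
        """Join blocks with blank lines, so spacing is handled in one place."""
        return "\n\n".join(self._blocks)


# Usage: each call adds one block; no manual newline bookkeeping.
prompt = (
    Md()
    .h(2, "Component")
    .p("Describe what this component does.")
    .labeled_list("Top tokens", ["the", "of"])
    .build()
)
```

Compared with f-string concatenation, the builder centralizes separator handling, which is the kind of duplication the conversion is meant to remove.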
…, investigate
- graph_interp/db.py: Extract parameterized _save_label/_get_label/_get_all_labels from 3x3 duplicated CRUD methods
- graph_interp/interpret.py: Unify process_output_layer/process_input_layer via _make_process_layer factory
- autointerp/prompt_helpers.py: Deduplicate build_fires_on_examples/build_says_examples into _build_examples
- graph_interp/prompts.py: Simplify _format_related string building with f-string
- investigate/agent_prompt.py: Replace repetitive config blocks with data-driven loop
- investigate/scripts/run_agent.py: Remove obvious docstrings, simplify fetch_model_info
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…tale docs
Backend:
- graphs.py: Extract _build_loss_config, _build_loss_result, _maybe_pgd_config, _maybe_adv_pgd helpers
- server.py: Move deferred stdlib imports to module-level
- __init__.py: Fix __all__ ordering
- CLAUDE.md: Remove duplicate router entries
- sqlite.py: Fix stale docstring referencing old DB location
Frontend components:
- Deduplicate getTopEdgeAttributions into shared topEdgeAttributions() in promptAttributionsTypes.ts
- Extract generic parseSSEStream<T>() in graphs.ts, eliminating ~50 lines of duplicated SSE parsing
- Extract AVAILABILITY_COLUMNS in RunSelector, reducing ~60 lines of duplicated template
- Eliminate redundant computeMaxAbsComponentAct in ActivationContextsViewer + ClusterComponentCard
- Fix unreachable null check in ClusterComponentCard
- Fix mid-file import in ComponentNodeCard
- Remove dead fork handler stubs in PromptAttributionsTab
- Remove unused isRunEditable export, 5 unused CSS selectors, 12+ unnecessary comments
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…rt both-or-neither Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Summary
Depends on: PR2 modules (#427 graph_interp, #428 investigate, #429 editing, #430 postprocess) — merge those first.
Passes make check once all dependencies are merged.
App Backend
- AppTokenizer: server-side token display
- SQLite prompt DB on NFS with DELETE journal + fcntl.flock write locking
- Unified InterventionResult, target-sans masking, masked predictions (CI/stochastic/adversarial)
- Removed get_attribution_strength MCP tool (underlying get_attribution method deleted in PR1)
App Frontend
- Lazy loading, bulk endpoints, Loadable<T> pattern
Clustering
Cleanup
Review by Claude Code (Opus 4.6).