43 commits
b93b9d6
Geometric similarity comparison made consistent with other evals and …
leesharkey Sep 16, 2025
cd5fda2
Replaced mean max cosine sim with mean max ABS cosine sim
leesharkey Sep 17, 2025
61d3408
Configs for geom comparison runs
leesharkey Sep 17, 2025
63c85f0
Merge remote-tracking branch 'origin/main' into feature/geom_sim_compar
leesharkey Sep 17, 2025
770a5c5
Minor modifications to make PR-ready
leesharkey Sep 17, 2025
49ba925
Merge remote-tracking branch 'origin/main' into feature/geom_sim_compar
leesharkey Sep 17, 2025
364198e
Update seed to be consistent with other configs again
leesharkey Sep 17, 2025
57c2c76
Cleaned up some comments and other bits
leesharkey Sep 18, 2025
2e7752d
Major update of PR following review: Now implemented as script rather…
leesharkey Sep 18, 2025
4fbf807
Merge remote-tracking branch 'origin/main' into feature/geom_sim_compar
leesharkey Sep 18, 2025
98a6620
Updated registry to delete old obselete experiments
leesharkey Sep 18, 2025
bede346
Merge branch 'main' into feature/geom_sim_compar
leesharkey Sep 18, 2025
acc04f1
Merge branch 'main' into feature/geom_sim_compar
leesharkey Sep 22, 2025
62bd77e
Reorganized compare_models into subdirectory and cleaned up config code
leesharkey Sep 22, 2025
b84814a
Merging
leesharkey Sep 22, 2025
5173a6a
Updated README.md
leesharkey Sep 22, 2025
181cac8
Added some example models to the config
leesharkey Sep 22, 2025
8db7559
Getting rid of newline
leesharkey Sep 22, 2025
0d05f0a
Minor changes to make the PR mergeable
leesharkey Sep 23, 2025
8767194
Merge branch 'main' of https://github.com/goodfire-ai/spd
leesharkey Sep 23, 2025
019eb2d
Merge branch 'main' of https://github.com/goodfire-ai/spd
leesharkey Sep 24, 2025
b935b4c
Merge branch 'main' of https://github.com/goodfire-ai/spd
leesharkey Sep 29, 2025
3d1edeb
Merge branch 'main' of github.com:goodfire-ai/spd
leesharkey Sep 30, 2025
1dd738d
Merge branch 'main' of github.com:goodfire-ai/spd
leesharkey Oct 3, 2025
956f3d4
Merge branch 'main' of github.com:goodfire-ai/spd
leesharkey Oct 5, 2025
f7ad411
Merge branch 'main' of github.com:goodfire-ai/spd
leesharkey Oct 6, 2025
ade1377
Merge branch 'main' of github.com:goodfire-ai/spd
leesharkey Oct 7, 2025
08875a9
Merge branch 'main' of github.com:goodfire-ai/spd
leesharkey Oct 13, 2025
7ca7037
Merge branch 'main' of github.com:goodfire-ai/spd
leesharkey Oct 22, 2025
cbbdb61
Merge branch 'main' of github.com:goodfire-ai/spd
leesharkey Oct 22, 2025
267deb6
Merge branch 'main' of github.com:goodfire-ai/spd
leesharkey Oct 28, 2025
f49e9e0
Merge branch 'main' of github.com:goodfire-ai/spd
leesharkey Nov 5, 2025
22f7cfc
Merge branch 'main' of github.com:goodfire-ai/spd
leesharkey Nov 12, 2025
ab5346d
Merge branch 'main' of github.com:goodfire-ai/spd
leesharkey Nov 14, 2025
7cb528f
Merge branch 'main' of github.com:goodfire-ai/spd
leesharkey Nov 20, 2025
01d1b6b
Merge branch 'main' of github.com:goodfire-ai/spd
leesharkey Nov 21, 2025
a78fdc5
Merge branch 'main' of github.com:goodfire-ai/spd
leesharkey Nov 24, 2025
caaa1e0
Add comprehensive Claude Code documentation and checklist
leesharkey Nov 25, 2025
9be1829
Add checklist cues to prevent common omissions
leesharkey Nov 25, 2025
ea696ad
Add PGD CE diff metric
leesharkey Nov 25, 2025
2bd15e7
Add PGDCEDiffConfig and test config for resid_mlp1
leesharkey Nov 25, 2025
6aaf18c
Add PGDCEDiff metric initialization to eval.py
leesharkey Nov 25, 2025
30d59fc
Handle non-LM tasks gracefully in PGDCEDiff
leesharkey Nov 25, 2025
136 changes: 136 additions & 0 deletions CLAUDE_CHECKLIST.md
@@ -0,0 +1,136 @@
# CLAUDE_CHECKLIST.md - Pre-Submission Checklist

Use this checklist before submitting any code changes to ensure your contribution meets SPD repository standards.

While working through this checklist, you may notice an issue and get distracted fixing it. After any fix, you need to restart the checklist from the beginning. It can therefore help to keep a running list of changes to make, make them all, and then restart the checklist once for the whole batch.

## Code Style & Formatting

### Naming
- [ ] **Files & modules**: `snake_case.py`
- [ ] **Functions & variables**: `snake_case`
- [ ] **Classes**: `PascalCase`
- [ ] **Constants**: `UPPERCASE_WITH_UNDERSCORES`
- [ ] **Private functions**: Prefixed with `_`
- [ ] **Abbreviations**: Uppercase (e.g., `CI`, `L0`, `KL`)

### Type Annotations
- [ ] **Used jaxtyping for tensors** - `Float[Tensor, "... C d_in"]` format (runtime checking not yet enabled)
- [ ] **Used PEP 604 unions** - `str | None` NOT `Optional[str]`
- [ ] **Used lowercase generics** - `dict`, `list`, `tuple` NOT `Dict`, `List`, `Tuple`
- [ ] **Avoided redundant annotations** - Don't write `my_thing: Thing = Thing()` or `name: str = "John"`
- [ ] **Type checking passes with no errors** - Run `make type` successfully and fix all issues (uses basedpyright, NOT mypy)

### Comments & Documentation
- [ ] **No obvious comments** - If code is self-explanatory, no comment needed. (Temp comments during development are fine if cleaned up before committing)
- [ ] **Complex logic explained** - Comments focus on "why" not "what"
- [ ] **Google-style docstrings** - Used `Args:`, `Returns:`, `Raises:` sections where needed
- [ ] **Non-obvious information only** - Docstrings don't repeat what's obvious from signature
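
For illustration, a Google-style docstring that sticks to non-obvious information might look like the sketch below. The function is hypothetical and exists only to show the `Args:`, `Returns:`, and `Raises:` sections.

```python
def moving_average(values: list[float], window: int) -> list[float]:
    """Compute a trailing moving average.

    The first `window - 1` positions are dropped rather than padded, so the output is
    shorter than the input.

    Args:
        values: Samples in chronological order.
        window: Number of samples per average.

    Returns:
        One average per full window, of length `len(values) - window + 1`.

    Raises:
        ValueError: If `window` is not in `[1, len(values)]`.
    """
    if not 1 <= window <= len(values):
        raise ValueError(f"window must be in [1, {len(values)}], got {window}")
    return [sum(values[i : i + window]) / window for i in range(len(values) - window + 1)]
```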

### Formatting
- [ ] **Ruff formatting applied** - Run `make format` before committing

## Code Quality

### Error Handling (Fail Fast)
- [ ] **Liberal assertions** - Assert all assumptions about data/state
- [ ] **Clear error messages** - Assertions include descriptive messages
- [ ] **Explicit error types** - Use `ValueError`, `NotImplementedError`, `RuntimeError` appropriately
- [ ] **Fail immediately** - Code fails when wrong, doesn't recover silently
- [ ] **Use try-except only for expected errors** - Assertions for invariants/assumptions. Try-except only when errors are expected and handled (e.g., path resolution, file not found)
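
The split between assertions and try-except can be sketched as follows. Both functions are hypothetical examples rather than SPD code: the first asserts its invariants with descriptive messages, and the second uses try-except only for the one error it expects and handles.

```python
import json
from pathlib import Path


def normalize_weights(weights: list[float]) -> list[float]:
    # Invariants are asserted with descriptive messages instead of being silently repaired.
    assert len(weights) > 0, "weights must be non-empty"
    assert all(w >= 0 for w in weights), f"weights must be non-negative, got {weights}"
    total = sum(weights)
    assert total > 0, "weights must not all be zero"
    return [w / total for w in weights]


def load_optional_overrides(path: Path) -> dict[str, float]:
    # A missing overrides file is an expected, handled case, so try-except fits here.
    try:
        text = path.read_text()
    except FileNotFoundError:
        return {}
    # Malformed JSON is not expected: json.loads raises instead of recovering silently.
    return json.loads(text)
```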

### Tensor Operations
- [ ] **Used einops by default** - Preferred over raw einsum for clarity
- [ ] **Asserted shapes liberally** - Verify tensor dimensions
- [ ] **Documented complex operations** - Explain non-obvious tensor manipulations

### Design Patterns
- [ ] **Followed existing patterns** - Match architecture style of surrounding code (ABC for interfaces, Protocol for metrics, Pydantic for configs)
- [ ] **Metrics decoupled** - Each metric in its own file within `spd/metrics/` directory. Figures in `spd/figures.py`
- [ ] **Used Pydantic for configs** - Configs are frozen (`frozen=True`) and forbid extras (`extra="forbid"`)
- [ ] **Config paths handled correctly** - If handling paths in configs, support both relative paths and `wandb:` prefix format
- [ ] **New experiments registered** - If adding new experiment, added to `spd/registry.py` with proper structure
- [ ] **Experiment structure followed** - Experiments have `models.py`, `configs.py`, `{task}_decomposition.py` in flat structure

## Testing

- [ ] **Tests written** - Unit tests for new functionality. Regression tests for bug fixes.
- [ ] **Tests run successfully** - Run `make test` (or `make test-all` if relevant)
- [ ] **Test files named correctly** - `test_*.py` format
- [ ] **Test functions named correctly** - `def test_*():` with descriptive names
- [ ] **Slow tests marked** - Used `@pytest.mark.slow` for slow tests
- [ ] **Focus on unit tests** - This is research code, not production code (there is no deployment), so integration tests are often too much overhead and interactive use catches issues at low cost. Add integration tests only when testing complex interactions that can't be validated in unit tests.
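
A small sketch of the test naming conventions above, using plain assertions so that it runs without pytest installed. The `clamp` function and its tests are hypothetical; in a real `test_*.py` file, a slow test would additionally carry the `@pytest.mark.slow` decorator.

```python
def clamp(value: float, low: float, high: float) -> float:
    assert low <= high, f"low ({low}) must not exceed high ({high})"
    return max(low, min(value, high))


def test_clamp_returns_value_inside_range():
    assert clamp(0.5, 0.0, 1.0) == 0.5


def test_clamp_limits_value_outside_range():
    assert clamp(-2.0, 0.0, 1.0) == 0.0
    assert clamp(3.0, 0.0, 1.0) == 1.0


def test_clamp_handles_equal_bounds():
    # Regression-style test: pins down an edge case after a (hypothetical) bug fix.
    assert clamp(5.0, 2.0, 2.0) == 2.0
```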

## Pre-Commit Checks

- [ ] **Ran `make check`** - Full pre-commit suite passes (format + type check)
- [ ] **No type errors** - basedpyright reports no issues
- [ ] **No lint errors** - ruff reports no issues

## Git & Version Control

### Before Committing
- [ ] **Checked existing patterns** - If adding new files (docs, configs, tests, etc.), looked at similar existing files for formatting/structure conventions to follow
- [ ] **Reviewed every line of the diff** - Understand every change being committed
- [ ] **Only relevant files staged** - Don't commit unrelated changes or all files
- [ ] **No secrets committed** - No `.env`, `credentials.json`, or similar files
- [ ] **Used correct branch name** - Format: `refactor/X`, `feature/Y`, or `fix/Z`

### Commit Message
- [ ] **Explains "what" and "why"** - Not just describing the diff
- [ ] **Clear and descriptive** - Focused on relevant changes
- [ ] **Explains purpose** - Why this change is needed

### Committing
- [ ] **NOT using `--no-verify`** - Almost never appropriate. Pre-commit checks exist for a reason.
- [ ] **Pre-commit hooks run** - The hooks automatically run `make format` and `make type`
- [ ] **All hooks passed** - No failures from pre-commit checks

## Pull Request (if creating)

### PR Content
- [ ] **Analyzed all changes** - Reviewed git diff and git status before creating PR
- [ ] **Title is clear** - Concise summary of changes
- [ ] **Used PR template** - Filled out all sections in `.github/pull_request_template.md`:
  - Description - What changed
  - Related Issue - "Closes #XX" format if applicable
  - Motivation and Context - Why needed
  - Testing - How tested
  - Breaking Changes - Listed if any

### PR Quality
- [ ] **All CI checks pass** - GitHub Actions successful
- [ ] **Merged latest from main** - Branch is up to date
- [ ] **Only relevant files** - No unrelated changes included
- [ ] **Self-reviewed** - Went through diff yourself first

## Cluster Usage (if applicable)

If running experiments on the cluster:
- [ ] **NOT exceeding 8 GPUs total** - Including all sweeps/evals combined
- [ ] **Monitored jobs** - Used `squeue` to check current usage
- [ ] **Used appropriate resources** - GPU vs CPU flags set correctly

## Final Self-Review

- [ ] **Restarted checklist after any changes** - If you made ANY changes while going through this checklist, you MUST restart from the beginning. Did you restart? If not, STOP and restart now.
- [ ] **Code is simple** - Straightforward for researchers with varying experience
- [ ] **No over-engineering** - Only made changes directly requested or clearly necessary
- [ ] **No unnecessary features** - Didn't add extra functionality beyond the task
- [ ] **No premature abstraction** - Didn't create helpers/utilities for one-time operations
- [ ] **No backwards-compatibility hacks** - Removed unused code completely instead of commenting it out
- [ ] **Followed fail-fast principle** - Code fails immediately when assumptions violated
- [ ] **Type safety maintained** - All functions properly typed
- [ ] **Tests are sufficient** - Core functionality tested, not over-tested

## Common Mistakes to Avoid

- ❌ Forgetting to remove obvious comments like `# get dataloader`
- ❌ Committing without running `make check`
- ❌ Using `--no-verify` flag
- ❌ Recovering silently from errors instead of failing
- ❌ Adding type annotations to obvious assignments like `name: str = "John"`
- ❌ Committing all files instead of only relevant changes
- ❌ Using more than 8 GPUs on cluster (total across all jobs)
- ❌ Failing to consult CLAUDE_COMPREHENSIVE.md for clarification where the checklist is unclear
- ❌ Starting this checklist, noticing an issue, fixing it, and then forgetting to restart the checklist **from the start**