Open
Labels
- Priority: InU (Important & not Urgent)
- exp/investigation (A research investigation or experiment)
- good first issue (Good for newcomers)
Description
It's interesting that in our "solved" resid_mlp1 and resid_mlp2 toy models, one component has a higher ci_mean_per_component_log than the other active ones.
- resid_mlp1 solved run
- resid_mlp2 solved run. Note that there is a little noise in layers.1.mlp_in, but I think it is unrelated to this more-active component. The two stragglers are visible in the ci_mean_per_component_log plot.
This does not occur in TMS. What do these components do? Do they always form when training with various hyperparameters?
If anyone is looking for ways to contribute, this would be a nice thing to investigate. I think you can even train resid_mlp1 on a CPU fairly quickly, though I haven't tried in a while.
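As a possible starting point, here is a rough sketch of how one might flag the more-active component automatically across runs, rather than reading it off the plot. It assumes you have already exported the per-component mean CI values for a single layer as a plain array; the function name `find_stragglers`, the `active_threshold` and `ratio` parameters, and the example numbers are all hypothetical and not part of the repo.

```python
import numpy as np


def find_stragglers(ci_mean, active_threshold=0.01, ratio=3.0):
    """Return indices of active components with unusually high mean CI.

    ci_mean: 1D sequence of per-component mean CI values for one layer
             (assumed to be exported from a run; not a repo API).
    active_threshold: components at or below this value count as inactive.
    ratio: a component is flagged if its value exceeds `ratio` times the
           median over the active components.
    """
    ci_mean = np.asarray(ci_mean, dtype=float)
    # Indices of components that are active at all.
    active = np.flatnonzero(ci_mean > active_threshold)
    if active.size == 0:
        return []
    # The median is robust to the one or two outliers we are looking for.
    median_active = np.median(ci_mean[active])
    return [int(i) for i in active if ci_mean[i] > ratio * median_active]


# Made-up values: five active components, one clearly above the rest.
example = [0.0, 0.02, 0.021, 0.019, 0.2, 0.0, 0.02]
print(find_stragglers(example))
```

Running this over several seeds or hyperparameter settings and counting how often the returned list is non-empty would be one way to check whether these components always form. The threshold and ratio would likely need tuning per model.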