Description
Add the method train_probes(datasets, target_variables, collect_counterfactuals=True, probe="linear") to intervention_experiment
datasets is a dictionary whose values are counterfactual datasets
target_variables is a list of variables in the causal_model
collect_counterfactuals determines whether activations for counterfactual inputs are collected
probe determines the type of probe used
The method should harvest features (typically activations, but not necessarily) and then train a probe for each model_units entry in model_units_list. The returned results should match the format of the results returned when performing an intervention experiment, so that the heatmap printing code in other experiments can accept either probing results or intervention results.
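The control flow described above could look roughly like the sketch below. Everything in it is an assumption made for illustration: harvest_features is a hypothetical callable standing in for the real feature-collection routine, MajorityProbe is a trivial placeholder probe, and the nested results layout is invented here and is not the format used by the existing intervention results. The collect_counterfactuals flag is left out to keep the sketch short.

```python
from collections import Counter


class MajorityProbe:
    """Trivial stand-in probe: always predicts the most common training label."""

    def train(self, features, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, features):
        return [self.label for _ in features]


def train_probes_sketch(datasets, target_variables, model_units_list,
                        harvest_features, make_probe=MajorityProbe):
    """Train one probe per (dataset, model_units entry, target variable).

    harvest_features(dataset, model_units) is a hypothetical callable that
    returns (features, labels_by_variable). The result layout is also an
    assumption: results[dataset_name][i][variable] holds the training
    accuracy of the probe for the i-th entry of model_units_list.
    """
    results = {}
    for dataset_name, dataset in datasets.items():
        per_units = []
        for model_units in model_units_list:
            # Harvest features once per model_units entry, reuse for all variables.
            features, labels_by_variable = harvest_features(dataset, model_units)
            scores = {}
            for variable in target_variables:
                labels = labels_by_variable[variable]
                probe = make_probe().train(features, labels)
                preds = probe.predict(features)
                hits = sum(p == y for p, y in zip(preds, labels))
                scores[variable] = hits / len(labels)
            per_units.append(scores)
        results[dataset_name] = per_units
    return results
```

Note that this sketch scores each probe on the data it was trained on; a real implementation would presumably report accuracy on a held-out split.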
Create a file probes.py that defines a standardized, extensible framework for probing, built around methods such as .train() and .predict()
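One possible shape for probes.py is sketched below, as a starting point rather than a settled design. The names Probe, LinearProbe, PROBE_REGISTRY and make_probe are all hypothetical, and the linear probe is a plain-Python binary logistic regression used only to keep the example dependency-free; a real implementation would more likely wrap an existing library.

```python
from abc import ABC, abstractmethod
import math


class Probe(ABC):
    """Common interface that every probe type implements (hypothetical)."""

    @abstractmethod
    def train(self, features, labels):
        """Fit the probe on feature vectors and their labels; return self."""

    @abstractmethod
    def predict(self, features):
        """Return one predicted label per feature vector."""

    def accuracy(self, features, labels):
        preds = self.predict(features)
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)


class LinearProbe(Probe):
    """Binary (0/1) logistic-regression probe fit by batch gradient descent."""

    def __init__(self, lr=0.5, epochs=200):
        self.lr, self.epochs = lr, epochs
        self.weights, self.bias = [], 0.0

    def _score(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def train(self, features, labels):
        n, dim = len(features), len(features[0])
        self.weights, self.bias = [0.0] * dim, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * dim, 0.0
            for x, y in zip(features, labels):
                # Gradient of the log loss w.r.t. the score: sigmoid(score) - y.
                err = 1.0 / (1.0 + math.exp(-self._score(x))) - y
                for j in range(dim):
                    grad_w[j] += err * x[j]
                grad_b += err
            self.weights = [w - self.lr * g / n
                            for w, g in zip(self.weights, grad_w)]
            self.bias -= self.lr * grad_b / n
        return self

    def predict(self, features):
        return [1 if self._score(x) > 0 else 0 for x in features]


# New probe types register here, so the probe="linear" argument stays a string.
PROBE_REGISTRY = {"linear": LinearProbe}


def make_probe(name, **kwargs):
    if name not in PROBE_REGISTRY:
        raise ValueError(f"Unknown probe type: {name!r}")
    return PROBE_REGISTRY[name](**kwargs)
```

With this layout, train_probes could call make_probe(probe) and rely only on .train() and .predict(), so adding another probe type would mean writing one subclass and one registry entry.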
The pyvene_core.py file has a function _collect_features that can be used to harvest activations. See build_SVD_feature_intervention in intervention_experiment.py for an example of code that uses _collect_features()