Add probing #9

Add the method train_probes(datasets, target_variables, collect_counterfactuals=True, probe="linear") to intervention_experiment 

datasets is a dictionary whose values are counterfactual datasets

target_variables is a list of variables in the causal_model

collect_counterfactuals determines whether activations for counterfactual inputs are collected

probe determines the type of probe used

The method should harvest features (typically activations, but not necessarily) and then train a probe for each model_units entry in model_units_list. The returned results should match the format of the results returned when performing an intervention experiment, so that the heatmap printing code in other experiments can take either probing results or intervention results as input.
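A rough sketch of the control flow described above, not the actual implementation. Everything here is hypothetical: the name train_probes_sketch, the collect_features and get_label callables, the placeholder MajorityProbe, and especially the nested dict that is returned, since the real method would have to mirror whatever the existing intervention results look like. The collect_counterfactuals flag is left out, and accuracy is measured on the training examples only to keep the example short.

```python
from collections import Counter


class MajorityProbe:
    """Placeholder probe: always predicts the most common training label."""

    def train(self, features, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, features):
        return [self.label for _ in features]


def train_probes_sketch(datasets, target_variables, model_units_list,
                        collect_features, get_label, make_probe=MajorityProbe):
    """Train one probe per (dataset, model_units, variable) combination.

    collect_features(dataset, model_units) stands in for the real feature
    harvesting step; get_label(example, variable) stands in for reading a
    variable's value from the causal model. Both are assumptions.
    """
    results = {}
    for name, dataset in datasets.items():
        results[name] = {}
        for model_units in model_units_list:
            features = collect_features(dataset, model_units)
            scores = {}
            for variable in target_variables:
                labels = [get_label(example, variable) for example in dataset]
                probe = make_probe()
                probe.train(features, labels)
                predictions = probe.predict(features)
                correct = sum(p == y for p, y in zip(predictions, labels))
                # Training accuracy only; a real version would hold out data.
                scores[variable] = correct / len(labels)
            results[name][model_units] = scores
    return results


# Toy usage with made-up data and made-up model unit names.
toy_datasets = {
    "toy": [
        {"x": 0.0, "parity": "even"},
        {"x": 1.0, "parity": "odd"},
        {"x": 2.0, "parity": "even"},
    ]
}
demo_results = train_probes_sketch(
    toy_datasets,
    target_variables=["parity"],
    model_units_list=["layer0", "layer1"],
    collect_features=lambda dataset, units: [[ex["x"]] for ex in dataset],
    get_label=lambda example, variable: example[variable],
)
```

Passing the probe constructor in as make_probe keeps the loop independent of any particular probe type, which is one way the probe argument could be honored.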

Create a file probes.py that defines a standardized, extensible framework for probing, built around methods like .train() and .predict()
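One possible shape for probes.py, offered as a sketch under assumptions rather than a proposed final design. The names Probe, LinearProbe, PROBE_TYPES and make_probe are all hypothetical. To keep the example dependency-free, the linear probe is a plain-Python multiclass perceptron; the project would more likely wrap a torch or scikit-learn linear model behind the same two methods.

```python
from abc import ABC, abstractmethod


class Probe(ABC):
    """Minimal interface every probe type would implement."""

    @abstractmethod
    def train(self, features, labels):
        """Fit the probe on rows of features and their labels."""

    @abstractmethod
    def predict(self, features):
        """Return one predicted label per feature row."""


class LinearProbe(Probe):
    """Linear classifier trained with multiclass perceptron updates."""

    def __init__(self, epochs=50):
        self.epochs = epochs
        self.classes = []
        self.weights = {}

    def _score(self, label, row):
        w = self.weights[label]
        # The last weight is the bias term.
        return sum(wi * xi for wi, xi in zip(w, row)) + w[-1]

    def train(self, features, labels):
        self.classes = sorted(set(labels))
        dim = len(features[0])
        self.weights = {c: [0.0] * (dim + 1) for c in self.classes}
        for _ in range(self.epochs):
            mistakes = 0
            for row, label in zip(features, labels):
                guess = self.predict([row])[0]
                if guess != label:
                    mistakes += 1
                    # Move the true class toward the row, the guess away.
                    for i, xi in enumerate(list(row) + [1.0]):
                        self.weights[label][i] += xi
                        self.weights[guess][i] -= xi
            if mistakes == 0:
                break  # every training row is classified correctly
        return self

    def predict(self, features):
        return [max(self.classes, key=lambda c: self._score(c, row))
                for row in features]


# Registry so a string such as probe="linear" can select a probe class.
PROBE_TYPES = {"linear": LinearProbe}


def make_probe(probe="linear", **kwargs):
    if probe not in PROBE_TYPES:
        raise ValueError(f"unknown probe type: {probe!r}")
    return PROBE_TYPES[probe](**kwargs)


# Toy usage on made-up, linearly separable features.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = ["a", "a", "b", "b"]
toy_probe = make_probe("linear").train(toy_features, toy_labels)
```

New probe types would subclass Probe and register themselves in PROBE_TYPES, so callers only ever depend on .train() and .predict().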

The pyvene_core.py file has a function _collect_features that can be used to harvest activations. See build_SVD_feature_intervention in intervention_experiment.py for an example of code that uses _collect_features()


