
Add probing #9

@atticusg

Description


Add the method train_probes(datasets, target_variables, collect_counterfactuals=True, probe="linear") to intervention_experiment.

datasets is a dictionary whose values are counterfactual datasets.

target_variables is a list of variables in the causal_model.

collect_counterfactuals determines whether activations are also collected for the counterfactual inputs.

probe determines the type of probe used (the default is "linear").
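To make the collect_counterfactuals flag concrete, here is a minimal sketch of the per-example input selection it controls. The dataset format is an assumption: each example is taken to store its base input under "input" and its counterfactual inputs under "counterfactual_inputs", and inputs_to_harvest is a hypothetical helper name, not an existing function in the codebase.

```python
from typing import Any, Dict, List


def inputs_to_harvest(example: Dict[str, Any], collect_counterfactuals: bool = True) -> List[Any]:
    """Return the inputs whose features should be collected for one example.

    Hypothetical dataset format: the base input is stored under "input" and
    the counterfactual inputs under "counterfactual_inputs".
    """
    inputs = [example["input"]]
    if collect_counterfactuals:
        # Counterfactual inputs become extra probing data points.
        inputs.extend(example.get("counterfactual_inputs", []))
    return inputs


example = {"input": "2+3", "counterfactual_inputs": ["4+3", "2+1"]}
print(inputs_to_harvest(example))         # ['2+3', '4+3', '2+1']
print(inputs_to_harvest(example, False))  # ['2+3']
```

With the flag on, every counterfactual input contributes an extra training point for the probe; with it off, only the base inputs are probed.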

The method should harvest features (typically activations, but not necessarily) and then train a probe for each model_units entry in model_units_list. The returned results should match the format of the results returned by a perform-intervention experiment, so that the heatmap printing code in other experiments can take either probing results or intervention results as input.
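The control flow described above can be sketched as follows. This is not the intended implementation: the real results format of the intervention experiments is not spelled out in this issue, so the nested dict results[dataset_name][model_units][target_variable] = accuracy is a hypothetical stand-in. The harvest callable stands in for feature collection (for example, a wrapper around _collect_features, whose signature is not assumed here), and make_probe is any zero-argument factory returning an object with train() and predict().

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical types: features are lists of floats; labels map each
# target variable to one label per feature row.
Features = List[List[float]]
Labels = Dict[str, List[Any]]


def train_probes_sketch(
    datasets: Dict[str, Any],
    target_variables: List[str],
    model_units_list: Sequence[Any],
    harvest: Callable[[Any, Any], Tuple[Features, Labels]],
    make_probe: Callable[[], Any],
    train_fraction: float = 0.8,
) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Train one probe per (dataset, model_units, target variable) and
    return held-out accuracies as results[dataset][model_units][variable]."""
    results: Dict[str, Dict[str, Dict[str, float]]] = {}
    for dataset_name, dataset in datasets.items():
        results[dataset_name] = {}
        for model_units in model_units_list:
            # Harvest features once per model_units and reuse them for every variable.
            features, labels = harvest(dataset, model_units)
            split = int(len(features) * train_fraction)
            unit_key = str(model_units)
            results[dataset_name][unit_key] = {}
            for variable in target_variables:
                y = labels[variable]
                probe = make_probe()
                probe.train(features[:split], y[:split])
                preds = probe.predict(features[split:])
                # Accuracy on the held-out tail of the harvested data.
                correct = sum(p == t for p, t in zip(preds, y[split:]))
                results[dataset_name][unit_key][variable] = correct / max(len(preds), 1)
    return results
```

Keeping harvesting and probe construction as injected callables means the loop does not care whether the features are raw activations or something else, which matches the "typically activations, but not necessarily" requirement. The simple head/tail split is only for illustration; a real version would likely use the datasets' own train/test splits.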

Create a file probes.py that defines a standardized, extensible probing framework built around methods such as .train() and .predict().
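A minimal sketch of what probes.py might contain, assuming a plain-Python setting with no ML library: an abstract Probe base class exposing train() and predict(), one linear probe, and a small registry so the probe="linear" argument can be resolved by name. The linear probe is trained with the multiclass perceptron rule purely to keep the sketch dependency-free; that is a swapped-in choice, and logistic regression would be a more typical linear probe. PerceptronProbe, PROBES, and make_probe are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence


class Probe(ABC):
    """Hypothetical base class: every probe exposes train() and predict()."""

    @abstractmethod
    def train(self, features: Sequence[Sequence[float]], labels: Sequence[Any]) -> "Probe":
        ...

    @abstractmethod
    def predict(self, features: Sequence[Sequence[float]]) -> List[Any]:
        ...

    def score(self, features: Sequence[Sequence[float]], labels: Sequence[Any]) -> float:
        """Accuracy of predict() against the given labels."""
        preds = self.predict(features)
        return sum(p == y for p, y in zip(preds, labels)) / max(len(labels), 1)


class PerceptronProbe(Probe):
    """Linear probe trained with the multiclass perceptron rule."""

    def __init__(self, epochs: int = 20):
        self.epochs = epochs
        self.weights: Dict[Any, List[float]] = {}

    def _scores(self, x: Sequence[float]) -> Dict[Any, float]:
        # Append a constant 1.0 so each class weight vector includes a bias term.
        xb = list(x) + [1.0]
        return {c: sum(w * v for w, v in zip(ws, xb)) for c, ws in self.weights.items()}

    def train(self, features, labels):
        dim = len(features[0]) + 1
        self.weights = {c: [0.0] * dim for c in sorted(set(labels), key=repr)}
        for _ in range(self.epochs):
            for x, y in zip(features, labels):
                scores = self._scores(x)
                pred = max(scores, key=scores.get)
                if pred != y:
                    # Move the true class toward x and the wrong class away from it.
                    xb = list(x) + [1.0]
                    self.weights[y] = [w + v for w, v in zip(self.weights[y], xb)]
                    self.weights[pred] = [w - v for w, v in zip(self.weights[pred], xb)]
        return self

    def predict(self, features):
        preds = []
        for x in features:
            scores = self._scores(x)
            preds.append(max(scores, key=scores.get))
        return preds


# Hypothetical registry keyed by the `probe` argument of train_probes.
PROBES = {"linear": PerceptronProbe}


def make_probe(name: str, **kwargs: Any) -> Probe:
    if name not in PROBES:
        raise ValueError(f"unknown probe type: {name!r}")
    return PROBES[name](**kwargs)
```

New probe types would plug in by subclassing Probe and adding an entry to PROBES; the code that runs the experiments only needs to call train() and predict().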

The pyvene_core.py file has a function _collect_features that can be used to harvest activations. See build_SVD_feature_intervention in intervention_experiment.py for an example of code that uses _collect_features().

Metadata


Assignees: No one assigned

Labels: enhancement (New feature or request)

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
