Understanding the internal processes of deep learning (DL) models is crucial for the safety of AI systems. However, the dense representations of modern DL models are hard to interpret. In particular, a single neuron generally does not correspond to a single meaning, pattern, or “feature” in the latent representation. The “superposition hypothesis” offers an explanation of why such a one-to-one correspondence may not be achievable.
Oja's Rule, Sanger's Rule, and Independent Component Analysis (ICA) represent successive stages in data analysis aimed at dimensionality reduction, decorrelation, and the extraction of independent components. All three methods are connected to Principal Component Analysis (PCA), but each has its own features and applications.
We apply a modern variation of Hebbian learning, SoftHebb (Moraitis et al. 2022), to train a 1-layer NN whose output neurons are specific to input patterns. This approach is tightly connected to Dictionary Learning. Additionally, we formulate a new weight update in which the top-1 neuron radially spreads its positive WTA update within some radius on a 2D neuron grid (given some period hyperparameter). The proposed weight update gives the neurons an interpretable location in the output vector, because the learnt patterns/"features" are local to their regions of the grid. We conjecture that inducing such local structure in the weight matrix of an NN can be useful for future research.
Oja's Rule extends the classical Hebbian learning rule by introducing weight normalization to converge towards the first eigenvector of the data covariance matrix. This allows the neuron to approximate the computation of the first principal component (PCA).
The formula for weight updates in Oja's Rule is:

$$\Delta \mathbf{w} = \eta \, y \left( \mathbf{x} - y \, \mathbf{w} \right)$$

where:
- $\eta$ is the learning rate,
- $\mathbf{x} = [x_1, x_2, \dots, x_n]$ is the input vector,
- $\mathbf{w} = [w_1, w_2, \dots, w_n]$ are the weights of the neuron,
- $y = \mathbf{w}^T \mathbf{x}$ is the neuron output.
This method efficiently finds the first eigenvector but cannot extract additional components. This limitation is addressed in Sanger's Rule.
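As an illustration, here is a minimal NumPy sketch of Oja's Rule on synthetic data. The function name `oja_fit`, the learning rate, the number of epochs, and the toy dataset are assumptions made for this example; they are not taken from the notebooks in this repository.

```python
import numpy as np

def oja_fit(X, eta=0.001, epochs=10, seed=0):
    """Estimate the first principal direction of zero-mean data X (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)                 # start from a random unit vector
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            y = w @ x                      # neuron output y = w^T x
            w += eta * y * (x - y * w)     # Hebbian term y*x minus the normalizing decay y^2*w
    return w

# Hypothetical toy data: zero-mean Gaussian with one dominant direction of variance.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3)) * np.array([3.0, 1.0, 0.5])
X -= X.mean(axis=0)

w = oja_fit(X)

# Reference solution: leading eigenvector of the sample covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
v1 = eigvecs[:, -1]                        # eigh returns eigenvalues in ascending order
alignment = abs(w @ v1)                    # near 1 when w matches v1 up to sign
print(f"|cos(w, v1)| = {alignment:.4f}, ||w|| = {np.linalg.norm(w):.4f}")
```

The learning rate is kept small on purpose: with a large `eta` the decay term `y**2 * w` can overshoot and the iteration may become unstable.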
Sanger's Rule is a modification of Oja's Rule that enables the computation of multiple principal components. It introduces a correction term to remove the influence of previously computed components, preserving orthogonality among them.
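The formula for Sanger's Rule is:

$$\Delta w_{ij} = \eta \, y_i \left( x_j - \sum_{k=1}^{i} y_k w_{kj} \right)$$

where:
- $\mathbf{y} = [y_1, y_2, \dots, y_m]$ are the neuron outputs (projections onto the principal components), with $y_i = \sum_j w_{ij} x_j$,
- $j$ is the input neuron index,
- $i$ is the output neuron index,
- $\sum_{k=1}^{i} y_k w_{kj}$ is the correction term that eliminates the influence of the preceding components (the $k = i$ term is the Oja normalization of neuron $i$ itself).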
Unlike traditional PCA, which computes the components through eigendecomposition or SVD, Sanger's Rule enables an online, iterative approach to computing multiple principal components. This is particularly useful when dealing with streaming data or datasets too large to fit into memory.
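The online character of Sanger's Rule can be sketched in a few lines of NumPy. The sketch below stores the weights as a matrix `W` with one row per output neuron and uses the matrix form of the update, $\Delta W = \eta \left( \mathbf{y}\mathbf{x}^T - \mathrm{LT}(\mathbf{y}\mathbf{y}^T) W \right)$, where $\mathrm{LT}$ keeps the lower triangle including the diagonal. The function name `sanger_fit`, the hyperparameters, and the toy data are assumptions for illustration only.

```python
import numpy as np

def sanger_fit(X, n_components, eta=0.001, epochs=10, seed=0):
    """Estimate the leading principal directions of zero-mean X, one sample at a time."""
    rng = np.random.default_rng(seed)
    # Row i of W holds the weights of output neuron i.
    W = rng.normal(scale=0.1, size=(n_components, X.shape[1]))
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            y = W @ x                                  # outputs y_i = sum_j W[i, j] * x[j]
            # tril keeps the terms y_i * y_k with k <= i, so neuron i is only
            # corrected by itself and by the neurons that precede it.
            W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

# Hypothetical toy data with three distinct variances along the coordinate axes.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3)) * np.array([3.0, 1.5, 0.5])
X -= X.mean(axis=0)

W = sanger_fit(X, n_components=2)

# Reference: the two leading eigenvectors of the sample covariance, largest first.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
top = eigvecs[:, ::-1][:, :2].T
cosines = np.abs(np.sum(W * top, axis=1)) / np.linalg.norm(W, axis=1)
print("|cos| between learnt rows and eigenvectors:", cosines)
```

Because each sample is used once per step and then discarded, the same loop could in principle consume a data stream instead of an in-memory array.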
PCA, implemented through Oja's and Sanger's Rules, effectively reduces data dimensionality while preserving the primary variance structure. However, it does not account for non-linear dependencies among variables. ICA, on the other hand, is aimed at uncovering hidden independent sources, making it indispensable for tasks such as signal processing (e.g., separating sound or image sources).
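The weight update formula for SoftHebb, in the notation of Moraitis et al. (2022), is given by:

$$\Delta w_{ik} = \eta \, y_k \left( x_i - u_k w_{ik} \right)$$

where:
- $w_{ik}$ is the weight from input $i$ to output neuron $k$,
- $x_i$ is the $i$-th component of the input,
- $u_k = \sum_i w_{ik} x_i$ is the total weighted input of neuron $k$,
- $y_k$ is the output of neuron $k$ after the soft winner-take-all (softmax) nonlinearity.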
Our additional weight update term applies the top-1 neuron's weight update radially to the surrounding neurons on the 2D grid.
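To make the two updates more concrete, here is a rough NumPy sketch. The SoftHebb step assumes the rule in the form $\Delta \mathbf{w}_k = \eta \, y_k (\mathbf{x} - u_k \mathbf{w}_k)$ with a softmax over the pre-activations $u_k$. The radial term is only a guess at the general idea: the Gaussian falloff, the hard cutoff at `radius`, the exclusion of the winner itself, and the names `softhebb_step` and `radial_step` are all assumptions of this example, and the period hyperparameter mentioned above is not modelled. The notebooks contain the actual implementation.

```python
import numpy as np

def softmax(u, temperature=1.0):
    z = (u - u.max()) / temperature        # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def softhebb_step(W, x, eta=0.01, temperature=1.0):
    """One SoftHebb update. W has shape (n_neurons, n_inputs); row k belongs to neuron k."""
    u = W @ x                              # pre-activations u_k
    y = softmax(u, temperature)            # soft winner-take-all outputs y_k
    return W + eta * y[:, None] * (x[None, :] - u[:, None] * W)

def radial_step(W, x, grid_shape, eta=0.01, radius=2.0, sigma=1.0):
    """ASSUMED neighborhood term: spread a positive update around the top-1 neuron."""
    u = W @ x
    winner = np.argmax(u)                  # top-1 neuron
    # Place neuron k at (row, col) on the 2D grid in row-major order.
    rows, cols = np.unravel_index(np.arange(W.shape[0]), grid_shape)
    dist = np.hypot(rows - rows[winner], cols - cols[winner])
    # Assumed kernel: Gaussian falloff inside the radius, zero outside.
    coef = np.where(dist <= radius, np.exp(-dist**2 / (2 * sigma**2)), 0.0)
    coef[winner] = 0.0                     # the winner is already handled by SoftHebb
    return W + eta * coef[:, None] * (x[None, :] - u[:, None] * W)

# Hypothetical usage: 16 output neurons on a 4x4 grid, 8-dimensional inputs.
rng = np.random.default_rng(0)
grid_shape = (4, 4)
W = rng.normal(scale=0.1, size=(16, 8))
x = rng.normal(size=8)

W_soft = softhebb_step(W, x)
W_new = radial_step(W_soft, x, grid_shape)
changed = np.any(W_new != W_soft, axis=1).reshape(grid_shape)
print(changed.astype(int))                 # 1 marks neurons touched by the radial term
```

Under these assumptions, only grid neighbours of the winner move towards the current input, which is what would make nearby neurons end up specific to similar patterns.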
Figure: 2D grid of the tokens that the output neurons' weights are specific to (TinyStories token embeddings on the TinyStories dataset).
We used a dataset of token embeddings extracted from the TinyStories-1M model (embedding dimension 64) on the TinyStories dataset.
To obtain our results:
- Run `get_acts_tokens.ipynb` to generate the dataset of word embeddings for the TinyStories model.
- Run `NLA-LinearHebb-addSup.ipynb` for the CIFAR-10 results.
- Run `NLA-LinearHebb-addSup-nlp.ipynb` for the LM token embedding results.