
Oja's rule variants for component analysis of data. Final project for course "Numerical Linear Algebra" 2024.


Hebbian learning for component analysis

Understanding the internal processes of Deep Learning models is crucial for the safety of AI systems. However, the dense representations of modern DL models are hardly interpretable: in particular, a single neuron does not correspond to a single meaning/pattern/“feature” in the latent representation. Moreover, the “superposition hypothesis” explains why such a one-to-one correspondence may be impossible.

The methods Oja's Rule, Sanger's Rule, and Independent Component Analysis (ICA) represent sequential stages in data analysis aimed at dimensionality reduction, decorrelation, and the extraction of independent components. All these methods are connected to Principal Component Analysis (PCA), but each has its unique features and applications.

We apply a modern variation of Hebbian learning, SoftHebb (Moraitis et al. 2022), to train a 1-layer NN whose output neurons become specific to input patterns. This approach is tightly connected to Dictionary Learning. Additionally, we formulate a new weight update in which the top-1 neuron radially spreads its positive WTA update to neurons within some radius on a 2D neuron grid (given a period hyperparameter). The proposed weight update makes the locations of neurons in the output vector interpretable, because the patterns/“features” each neuron learns are local to its region of the grid. We conjecture that inducing such local structure into the weight matrix of an NN can be useful for future research.

Oja's Rule: Basics and Formulas

Oja's Rule extends the classical Hebbian learning rule by introducing weight normalization, so that the weight vector converges to the first eigenvector of the data covariance matrix. This allows a single neuron to approximate the first principal component, as in PCA.

The formula for weight updates in Oja's Rule is:

$$ \Delta \mathbf{w} = \eta y (\mathbf{x} - y \mathbf{w}) $$

where:

  • $ \eta $ is the learning rate,
  • $\mathbf{x} = [x_1, x_2, \dots, x_n]$ is the input vector,
  • $\mathbf{w} = [w_1, w_2, \dots, w_n]$ are the weights of the neuron,
  • $y = \mathbf{w}^T \mathbf{x}$ is the neuron output.

This method efficiently finds the first eigenvector but cannot extract additional components. This limitation is addressed in Sanger's Rule.
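
For concreteness, here is a minimal NumPy sketch of Oja's rule on synthetic 2D data. It is an illustration only, not the project's notebook code; the data, learning rate, and the comparison against the leading covariance eigenvector are choices made for this example.

```python
import numpy as np

# Minimal sketch of Oja's rule: a single linear neuron whose weight vector
# converges to the first principal direction of zero-mean data.
rng = np.random.default_rng(0)

# Synthetic zero-mean data with one dominant direction (std 3 vs. std 0.5).
X = rng.normal(size=(10_000, 2)) @ np.diag([3.0, 0.5])

w = rng.normal(size=2)
eta = 1e-3
for x in X:
    y = w @ x                      # neuron output y = w^T x
    w += eta * y * (x - y * w)     # Oja's update: Δw = η y (x - y w)

# Compare with the leading eigenvector of the sample covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(X.T))
v1 = eigvecs[:, np.argmax(eigvals)]
print(abs(w @ v1) / np.linalg.norm(w))   # ≈ 1 (up to sign)
```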

Generalized Hebbian Learning or Sanger's Rule: Extending Oja's Rule for PCA

Sanger's Rule is a modification of Oja's Rule that enables the computation of multiple principal components. It introduces a correction term to remove the influence of previously computed components, preserving orthogonality among them.

The formula for Sanger's Rule is:

$$ \Delta w_{ij} = \eta y_i \left( x_j - \sum_{k=1}^i w_{kj} y_k \right), $$

where:

  • $\mathbf{y} = [y_1, y_2, \dots, y_m]$ are the neuron outputs (projections onto the principal components),
  • $j$ is the input neuron index,
  • $i$ is the output neuron index,
  • $\sum_{k=1}^{i} w_{kj} y_k$ is the correction term that removes the influence of previously computed components (the $k=i$ term reproduces Oja's normalization).

Unlike traditional PCA, which computes the components through eigendecomposition or via SVD, Sanger's Rule enables an online, iterative approach to computing multiple principal components. This is particularly useful when dealing with streaming data or datasets too large to fit into memory.
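
Below is a hedged NumPy sketch of Sanger's rule in this online form (again an illustration, not the notebook code; the synthetic data, the number of components `m`, the learning rate, and the number of passes are assumptions for the example).

```python
import numpy as np

# Minimal sketch of Sanger's rule: m output neurons whose weight rows
# converge to the top-m principal directions of zero-mean data.
rng = np.random.default_rng(0)

# Synthetic zero-mean data with axis-aligned variances 16, 4, 1, 0.25, 0.0625,
# so the principal directions are (close to) the coordinate axes.
X = rng.normal(size=(20_000, 5)) * np.array([4.0, 2.0, 1.0, 0.5, 0.25])

m = 3                                # number of components to extract
W = 0.1 * rng.normal(size=(m, 5))    # W[i] is the weight vector of output neuron i
eta = 5e-4
for _ in range(3):                   # a few passes over the data
    for x in X:
        y = W @ x                    # outputs y_i = w_i^T x
        dW = np.zeros_like(W)
        for i in range(m):
            # Δw_ij = η y_i (x_j − Σ_{k≤i} w_kj y_k): subtract the reconstruction
            # from the current and all previously extracted components.
            dW[i] = eta * y[i] * (x - y[: i + 1] @ W[: i + 1])
        W += dW

# Rows of W should align (up to sign) with the top-m covariance eigenvectors.
eigvals, eigvecs = np.linalg.eigh(np.cov(X.T))
top = eigvecs[:, np.argsort(eigvals)[::-1][:m]].T
W_unit = W / np.linalg.norm(W, axis=1, keepdims=True)
print(np.abs(np.sum(W_unit * top, axis=1)))   # each entry ≈ 1
```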

Connection to PCA and Advantages

PCA, implemented through Oja's and Sanger's Rules, effectively reduces data dimensionality while preserving the primary variance structure. However, it does not account for non-linear dependencies among variables. ICA, on the other hand, aims at uncovering hidden independent sources, making it indispensable for tasks such as signal processing (e.g., separating sound or image sources).

SoftHebb from Moraitis et al. 2022

The weight update formula for SoftHebb is given by:

$$ \Delta w_{ij} = \eta y_i \left(x_j - u_i w_{ij} \right), $$

where $u_i=\mathbf{w}_{i,\cdot}\,\mathbf{x}$ is the pre-activation, $y_i=f(\mathbf{w}_{i,\cdot}\,\mathbf{x})$, and $f$ is the softmax activation function. The SoftHebb update is not sequential over output neurons, i.e. it does not orthogonalize $\mathbf{w}_i$ against $\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_{i-1}$. Instead, the specificity of output neurons to input patterns is induced by anti-Hebbian updates: every neuron that is not the top-1 by activation gets $y_i := -y_i$.
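
As we read the formula above, a minimal sketch of this update could look as follows (illustration only, not the authors' notebook code; the random input data, layer sizes, and learning rate are placeholders).

```python
import numpy as np

# Sketch of the SoftHebb update: softmax outputs, a Hebbian update for the top-1
# neuron and anti-Hebbian (sign-flipped) updates for all other neurons.
rng = np.random.default_rng(0)

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

n_in, n_out = 32, 16
W = 0.1 * rng.normal(size=(n_out, n_in))
eta = 1e-3

X = rng.normal(size=(5_000, n_in))        # stand-in for real data (e.g. image patches)
for x in X:
    u = W @ x                             # pre-activations u_i = w_i · x
    y = softmax(u)                        # y_i = f(u)_i
    winner = int(np.argmax(u))
    y_signed = -y                         # anti-Hebbian for non-top-1 neurons
    y_signed[winner] = y[winner]          # Hebbian for the top-1 neuron
    # Δw_ij = η y_i (x_j − u_i w_ij)
    W += eta * y_signed[:, None] * (x[None, :] - u[:, None] * W)
```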

Our additional weight update term applies the top-1 neuron's weight update to the surrounding neurons radially on the 2D grid.
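
The exact form of this term lives in the notebooks; the sketch below is only our hedged reading of the idea, with an assumed square grid layout, an assumed Euclidean radius, and the winner's update simply reused for its grid neighbours.

```python
import numpy as np

# Hedged sketch of the radial-spread term: output neurons sit on a 2D grid, and
# the top-1 neuron's positive update is also applied to neurons whose grid
# distance to the winner is at most `radius`. Grid side and radius are assumed
# hyperparameters here.
rng = np.random.default_rng(0)
grid_side, n_in = 8, 32                   # n_out = grid_side**2 neurons on the grid
radius, eta = 1.5, 1e-3
coords = np.array([(i, j) for i in range(grid_side) for j in range(grid_side)], float)
W = 0.1 * rng.normal(size=(grid_side * grid_side, n_in))

def radial_update(W, x, u, y, winner):
    """Apply the winner's Hebbian update to all neurons within `radius` of it."""
    mask = np.linalg.norm(coords - coords[winner], axis=1) <= radius
    W[mask] += eta * y[winner] * (x - u[winner] * W[mask])
    return W

# Example call with a random input; in the notebooks this would happen inside the
# SoftHebb training loop, after computing u, y and the winner.
x = rng.normal(size=n_in)
u = W @ x
y = np.exp(u - u.max()); y /= y.sum()
W = radial_update(W, x, u, y, int(np.argmax(u)))
```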

2D grid of output neurons' weights, plotted as images (CIFAR-10)

2D grid of the tokens that the output neurons' weights are specific to (TinyStories token embeddings)

We used a dataset of embeddings (dim=64) from the TinyStories-1M model, computed on the TinyStories dataset.

Reproducibility (Colab friendly)

To obtain our results:

  1. Run get_acts_tokens.ipynb to generate the dataset of word embeddings from the TinyStories model.
  2. Run NLA-LinearHebb-addSup.ipynb for the CIFAR-10 results.
  3. Run NLA-LinearHebb-addSup-nlp.ipynb for the LM's token-embedding results.
