Local Linear Recovery Guarantee – Implementation

PyTorch implementation of "Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization" (arXiv:2406.18035).

Overview

This project investigates whether deep neural networks (DNNs) can reliably recover target functions at overparameterization. The paper introduces Local Linear Recovery (LLR), a theoretical framework showing that functions expressible by narrower DNNs are recoverable from fewer samples than model parameters.

The implementation reproduces key experiments showing sample efficiency comparisons across:

Fully-connected (FC) models
Convolutional (CNN) models with weight sharing (WS)
Convolutional (CNN) models without weight sharing (NoWS)

Project Structure

HMTK/
├── src/
│   ├── models.py              # Model architectures (FC, CNN-WS, CNN-NoWS)
│   ├── data_gen.py            # Data generation and teacher network
│   ├── train.py               # Training loop and experiment orchestration
│   ├── utils.py               # Utilities (training, evaluation, plotting)
│   ├── run_experiments.py     # Full experiment runner with aggregation
│   └── config.yaml            # Hyperparameter configuration
├── tests/
│   ├── test_run.py
│   ├── test_quick_experiment.py
│   ├── test_config.py
│   ├── test_all_nodes.py
│   └── final_test.py
├── requirements.txt           # Python dependencies
├── .gitignore
├── FIXES_APPLIED.md
└── README.md                  # This file

Setup

Prerequisites

Python 3.8+
GPU recommended (but CPU works for small experiments)

Installation

Clone or download this repository
Install dependencies:

pip install -r requirements.txt

Quick Start

Run a Single Experiment

python src/train.py --config src/config.yaml

This trains a single model configuration as specified in config.yaml.

Run Full Sweep

To run experiments across multiple sample sizes and architectures, set `sweep.enabled: true` in config.yaml and run:

python src/run_experiments.py

This will:

Train all three model types (MLP, CNN-WS, CNN-NoWS)
Test across multiple sample sizes
Aggregate results with error bars
Save results to outputs/results.json
Display comparison plots

Verify Installation

python tests/test_run.py

This runs quick sanity checks on models, training, and sweep functionality.

Configuration

Edit config.yaml to customize experiments:

experiment:
  name: "experiment_name"
  seed: 0
  device: "cpu"  # or "cuda"

data:
  input_dim: 5          # Input dimension
  train_size: 20        # Training set size
  test_size: 1000       # Test set size

model:
  type: "mlp"           # Options: mlp | cnn_shared | cnn_noshare

training:
  lr: 0.1               # Learning rate
  max_steps: 20000      # Maximum training steps
  tol: 1.0e-10          # Convergence tolerance
  optimizer: "sgd"      # Options: sgd | adam
  weight_decay: 0.0

init:
  std: 1.0e-10          # Tiny initialization std

sweep:
  enabled: true
  sample_sizes: [5, 10, 20, 50, 100, 200]  # Sweep over these sizes
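One pitfall when loading this file: PyYAML implements YAML 1.1, whose float pattern requires a decimal point, so an unquoted value such as `1e-10` is loaded as the string `"1e-10"` rather than a float. A minimal sketch of a post-load fix, assuming the config is read into a plain dict; the helper name `coerce_scientific` is hypothetical and not part of the repository:

```python
import re
from typing import Any

# Matches plain scientific notation such as "1e-10" or "3E5",
# which PyYAML (YAML 1.1 rules) leaves as a str.
_SCI_RE = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)[eE][+-]?\d+$")


def coerce_scientific(node: Any) -> Any:
    """Hypothetical helper: recursively convert scientific-notation strings to floats."""
    if isinstance(node, dict):
        return {key: coerce_scientific(value) for key, value in node.items()}
    if isinstance(node, list):
        return [coerce_scientific(value) for value in node]
    if isinstance(node, str) and _SCI_RE.match(node):
        return float(node)
    # Ints, floats, bools and ordinary strings are returned unchanged.
    return node
```

Usage would look like `cfg = coerce_scientific(yaml.safe_load(open("src/config.yaml")))`. Writing such values with a decimal point (for example `1.0e-10`) avoids the issue at the source.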

Key Components

Models (`models.py`)

PaperTeacherMLP: Fixed synthetic teacher network
- 2-layer tanh network: $f^*(x) = W_2 \tanh(W_1 x + b_1) + b_2$
- Used to generate ground-truth labels
TwoLayerTanhMLP: Student MLP (fully-connected)
- Architecture: Linear → tanh → Linear
- For FC experiments
SharedTanhConv1D: 1D CNN with weight sharing
- Conv1d → tanh → flatten → Linear
- For CNN experiments with weight sharing
NoShareTanhConv1D: 1D CNN without weight sharing
- LocallyConnected1D → tanh → flatten → Linear
- For CNN experiments without weight sharing
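The three student architectures above can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's models.py: the constructor arguments (`width`, `channels`, `kernel_size`), the treatment of the input as a single-channel 1D signal, the scalar output, and the helper `init_small_` (mirroring `init.std`) are all assumptions. `LocallyConnected1D` is written here as a convolution whose filter is allowed to differ at every output position.

```python
import torch
import torch.nn as nn


def init_small_(module: nn.Module, std: float = 1e-10) -> None:
    """Hypothetical helper: redraw every parameter from N(0, std^2)."""
    for p in module.parameters():
        nn.init.normal_(p, mean=0.0, std=std)


class TwoLayerTanhMLP(nn.Module):
    """Linear -> tanh -> Linear, scalar output."""

    def __init__(self, input_dim: int, width: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(input_dim, width), nn.Tanh(), nn.Linear(width, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, input_dim)
        return self.net(x).squeeze(-1)


class SharedTanhConv1D(nn.Module):
    """Conv1d (one filter bank shared across positions) -> tanh -> flatten -> Linear."""

    def __init__(self, input_dim: int, channels: int, kernel_size: int):
        super().__init__()
        self.conv = nn.Conv1d(1, channels, kernel_size)
        self.head = nn.Linear(channels * (input_dim - kernel_size + 1), 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.tanh(self.conv(x.unsqueeze(1)))  # (batch, channels, out_len)
        return self.head(h.flatten(1)).squeeze(-1)


class LocallyConnected1D(nn.Module):
    """Like Conv1d (stride 1, no padding) but with separate weights at each output position."""

    def __init__(self, in_channels: int, out_channels: int, input_len: int, kernel_size: int):
        super().__init__()
        self.kernel_size = kernel_size
        out_len = input_len - kernel_size + 1
        fan_in = in_channels * kernel_size
        self.weight = nn.Parameter(torch.randn(out_len, out_channels, fan_in) / fan_in ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out_len, out_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, C_in, L)
        patches = x.unfold(2, self.kernel_size, 1)  # (batch, C_in, out_len, k)
        b, c, out_len, k = patches.shape
        patches = patches.permute(0, 2, 1, 3).reshape(b, out_len, c * k)
        # Position-specific weights: contract each patch with its own filter bank.
        out = torch.einsum("blk,lok->blo", patches, self.weight) + self.bias
        return out.permute(0, 2, 1)  # (batch, C_out, out_len)


class NoShareTanhConv1D(nn.Module):
    """LocallyConnected1D -> tanh -> flatten -> Linear."""

    def __init__(self, input_dim: int, channels: int, kernel_size: int):
        super().__init__()
        self.lc = LocallyConnected1D(1, channels, input_dim, kernel_size)
        self.head = nn.Linear(channels * (input_dim - kernel_size + 1), 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.tanh(self.lc(x.unsqueeze(1)))
        return self.head(h.flatten(1)).squeeze(-1)
```

With `input_dim=5`, `channels=3`, `kernel_size=2`, the shared CNN has 22 parameters and the unshared one has 49, which is the gap in constraint that the weight-sharing comparison is about. If every position-specific filter in `LocallyConnected1D` is set to the same values, `NoShareTanhConv1D` computes exactly the same function as `SharedTanhConv1D`.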

Data Generation (`data_gen.py`)

Generates synthetic regression data: $x \sim \mathcal{N}(0, I)$, $y = f^*(x)$
Fixed teacher ensures reproducibility
Train/test splits with separate seeds
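A minimal sketch of this data pipeline, assuming the teacher's weights are drawn once from a fixed seed and that train and test inputs come from separate `torch.Generator` seeds. The function names, the teacher width, and the seed values are illustrative assumptions, not the repository's data_gen.py:

```python
import torch


def make_teacher(input_dim: int, width: int, seed: int = 0):
    """Hypothetical fixed teacher f*(x) = W2 tanh(W1 x + b1) + b2, drawn once from `seed`."""
    g = torch.Generator().manual_seed(seed)
    W1 = torch.randn(width, input_dim, generator=g)
    b1 = torch.randn(width, generator=g)
    W2 = torch.randn(1, width, generator=g)
    b2 = torch.randn(1, generator=g)

    def f_star(x: torch.Tensor) -> torch.Tensor:
        # x: (n, input_dim) -> y: (n,)
        return (torch.tanh(x @ W1.T + b1) @ W2.T + b2).squeeze(-1)

    return f_star


def sample_split(f_star, n: int, input_dim: int, seed: int):
    """Draw x ~ N(0, I) from its own generator and label it with the teacher."""
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(n, input_dim, generator=g)
    with torch.no_grad():
        y = f_star(x)
    return x, y
```

For example, `teacher = make_teacher(5, 2)` followed by `sample_split(teacher, 20, 5, seed=1)` for training and `sample_split(teacher, 1000, 5, seed=2)` for testing matches the `input_dim`, `train_size`, and `test_size` values in the example config. Because each split owns its generator, changing `train_size` does not alter the test set.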

Training (`train.py`, `utils.py`)

Full-batch gradient descent (standard in optimization theory)
Configurable learning rate and convergence criteria
Tracks both training and test MSE
Supports the `sgd` and `adam` optimizers (with full-batch gradients and no momentum, `sgd` reduces to plain gradient descent)
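The training loop described above can be sketched as follows, assuming the stopping rule is "training MSE below `tol`" and that both MSEs are logged every step. The function name and signature are hypothetical; the keyword arguments simply mirror the `training` keys in config.yaml.

```python
import torch
import torch.nn as nn


def train_full_batch(model, x_train, y_train, x_test, y_test,
                     lr=0.1, max_steps=20000, tol=1e-10,
                     optimizer="sgd", weight_decay=0.0):
    """Hypothetical full-batch trainer; returns per-step train/test MSE."""
    if optimizer == "sgd":
        # No momentum: with the full batch this is plain gradient descent.
        opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    elif optimizer == "adam":
        opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    else:
        raise ValueError(f"unknown optimizer: {optimizer!r}")

    loss_fn = nn.MSELoss()
    history = {"train_mse": [], "test_mse": []}
    for _ in range(max_steps):
        opt.zero_grad()
        train_loss = loss_fn(model(x_train), y_train)  # whole training set, one batch
        train_loss.backward()
        opt.step()
        with torch.no_grad():
            test_loss = loss_fn(model(x_test), y_test)
        # Note: train_mse is measured just before this step's update.
        history["train_mse"].append(train_loss.item())
        history["test_mse"].append(test_loss.item())
        if train_loss.item() < tol:
            break
    return history
```

Stopping on a training-loss tolerance rather than a fixed epoch count fits the interpolation setting studied here: with $n$ below the parameter count the student can usually drive training error to near zero, and the quantity of interest is the test MSE at that point.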

Experiments (`run_experiments.py`)

Sweeps across sample sizes: $n \in \{5, 10, 20, 50, 100, 200\}$
Tests multiple seeds for variance estimation
Aggregates results with mean ± std error
Generates comparison plots
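A minimal sketch of the sweep-and-aggregate step, assuming a hypothetical callable `run_one(model_type, n, seed)` that trains one model and returns its final test MSE. Since "std error" above could mean either the standard deviation or the standard error of the mean, the sketch records both. The output path follows the `outputs/results.json` location mentioned in Quick Start; the JSON layout is an assumption.

```python
import json
import math
import statistics
from pathlib import Path
from typing import Callable, Dict, List, Sequence


def sweep(run_one: Callable[[str, int, int], float],
          model_types: Sequence[str],
          sample_sizes: Sequence[int],
          seeds: Sequence[int]) -> Dict[str, List[dict]]:
    """Run every (model, n, seed) combination and aggregate test MSE over seeds."""
    results: Dict[str, List[dict]] = {}
    for model_type in model_types:
        rows = []
        for n in sample_sizes:
            errs = [run_one(model_type, n, seed) for seed in seeds]
            std = statistics.stdev(errs) if len(errs) > 1 else 0.0
            rows.append({
                "n": n,
                "mean_test_mse": statistics.fmean(errs),
                "std": std,                          # sample standard deviation
                "sem": std / math.sqrt(len(errs)),   # standard error of the mean
            })
        results[model_type] = rows
    return results


def save_results(results: Dict[str, List[dict]], path: str = "outputs/results.json") -> None:
    """Write aggregated results as JSON, creating the parent directory if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(results, indent=2))
```

A call such as `sweep(run_one, ["mlp", "cnn_shared", "cnn_noshare"], [5, 10, 20, 50, 100, 200], seeds=range(5))` would cover the model types and sample sizes listed in config.yaml. The number of seeds here is an arbitrary choice.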

Expected Results

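The theory predicts that LLR is achievable once the number of training samples $n$ reaches the optimistic sample size of the target function (the smallest sample size that guarantees LLR), which for functions expressible by narrower networks is below the number of model parameters. For 2-layer tanh networks:

Small $n$: Higher test error (the student can fit the training set without recovering the target)
Large $n$: Low test error (successful recovery)
CNN-WS vs CNN-NoWS: Weight sharing should require fewer samples due to its stronger inductive bias

The expected plot shows test MSE decreasing as $n$ increases, with faster decay for the more constrained model (CNN-WS).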
