PyTorch implementation of "Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization" (arXiv:2406.18035).
This project investigates whether deep neural networks (DNNs) can reliably recover target functions at overparameterization. We introduce Local Linear Recovery (LLR) — a theoretical framework demonstrating that functions expressible by narrower DNNs are recoverable from fewer samples than model parameters.
The implementation reproduces key experiments showing sample efficiency comparisons across:
- Fully-connected (FC) models
- Convolutional (CNN) models with weight sharing (WS)
- Convolutional (CNN) models without weight sharing (NoWS)
HMTK/
├── src
│   ├── models.py            # Model architectures (FC, CNN-WS, CNN-NoWS)
│   ├── data_gen.py          # Data generation and teacher network
│   ├── train.py             # Training loop and experiment orchestration
│   ├── utils.py             # Utilities (training, evaluation, plotting)
│   ├── run_experiments.py   # Full experiment runner with aggregation
│   └── config.yaml          # Hyperparameter configuration
├── test
│   ├── test_run.py
│   ├── test_quick_experiment.py
│   ├── test_config.py
│   ├── test_all_nodes.py
│   └── final_test.py
├── requirements.txt         # Python dependencies
├── ... (.gitignore, FIXES_APPLIED.md)
└── README.md                # This file
- Python 3.8+
- GPU recommended (but CPU works for small experiments)
- Clone or download this repository
- Install dependencies:
pip install -r requirements.txt

python train.py --config config.yaml

This trains a single model configuration as specified in config.yaml.
To run experiments across multiple sample sizes and architectures, set sweep.enabled: true in config.yaml and run:
python run_experiments.py

This will:
- Train all three model types (MLP, CNN-WS, CNN-NoWS)
- Test across multiple sample sizes
- Aggregate results with error bars
- Save results to outputs/results.json
- Display comparison plots
python test_run.py

This runs quick sanity checks on models, training, and sweep functionality.
Edit config.yaml to customize experiments:
experiment:
  name: "experiment_name"
  seed: 0
  device: "cpu"          # or "cuda"
data:
  input_dim: 5           # Input dimension
  train_size: 20         # Training set size
  test_size: 1000        # Test set size
model:
  type: "mlp"            # Options: mlp | cnn_shared | cnn_noshare
training:
  lr: 0.1                # Learning rate
  max_steps: 20000       # Maximum training steps
  tol: 1e-10             # Convergence tolerance
  optimizer: "sgd"       # Options: sgd | adam
  weight_decay: 0.0
init:
  std: 1e-10             # Tiny initialization std
sweep:
  enabled: true
  sample_sizes: [5, 10, 20, 50, 100, 200]  # Sweep over these sizes
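To illustrate how the sweep settings might translate into individual training runs, the sketch below enumerates every (model type, sample size, seed) combination. The `expand_sweep` helper and the seed list are hypothetical and not taken from the repository. Only the model type names and sample sizes come from the configuration.

```python
from itertools import product

# Values copied from the configuration options listed in config.yaml.
MODEL_TYPES = ["mlp", "cnn_shared", "cnn_noshare"]
SAMPLE_SIZES = [5, 10, 20, 50, 100, 200]


def expand_sweep(model_types, sample_sizes, seeds):
    """Return one run description per (model type, sample size, seed)."""
    return [
        {"model_type": m, "train_size": n, "seed": s}
        for m, n, s in product(model_types, sample_sizes, seeds)
    ]


# Hypothetical seed list: the number of seeds the repository uses is not stated.
runs = expand_sweep(MODEL_TYPES, SAMPLE_SIZES, seeds=[0, 1, 2])
print(len(runs))  # 3 model types x 6 sample sizes x 3 seeds = 54
print(runs[0])
```

Keeping each run as a plain dictionary makes it straightforward to log the exact settings next to the resulting test MSE.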
- PaperTeacherMLP: Fixed synthetic teacher network
  - 2-layer tanh network: $f^*(x) = W_2 \tanh(W_1 x + b_1) + b_2$
  - Used to generate ground-truth labels
- TwoLayerTanhMLP: Student MLP (fully-connected)
  - Architecture: Linear → tanh → Linear
  - For FC experiments
- SharedTanhConv1D: 1D CNN with weight sharing
  - Conv1d → tanh → flatten → Linear
  - For CNN experiments with weight sharing
- NoShareTanhConv1D: 1D CNN without weight sharing
  - LocallyConnected1D → tanh → flatten → Linear
  - For CNN experiments without weight sharing
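Because the comparison is between sample counts and parameter counts, it can help to see how the three student architectures differ in size. The sketch below counts parameters under stated assumptions that are not confirmed by the repository: a scalar output, biases on every layer, a single input channel, stride 1, and no padding. The hidden width `m`, channel count `c`, and kernel size `k` are hypothetical values chosen for illustration.

```python
def mlp_params(d, m):
    """Linear(d, m) -> tanh -> Linear(m, 1), biases included."""
    return (d * m + m) + (m + 1)


def conv_output_length(d, k):
    """Output length of a 1D convolution with stride 1 and no padding."""
    return d - k + 1


def cnn_shared_params(d, c, k):
    """Conv1d(1, c, k) -> tanh -> flatten -> Linear(c * L, 1)."""
    L = conv_output_length(d, k)
    return (c * k + c) + (c * L + 1)


def cnn_noshare_params(d, c, k):
    """Locally connected layer: a separate kernel and bias at each output position."""
    L = conv_output_length(d, k)
    return (c * k * L + c * L) + (c * L + 1)


d = 5  # input_dim from config.yaml
print(mlp_params(d, m=10))              # 71
print(cnn_shared_params(d, c=4, k=3))   # 29
print(cnn_noshare_params(d, c=4, k=3))  # 61
```

Under these assumptions the locally connected model has roughly twice as many parameters as the weight-shared model for the same channel count and kernel size, because each of the `L` output positions carries its own kernel.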
- Generates synthetic regression data: $x \sim \mathcal{N}(0, I)$, $y = f^*(x)$
- Fixed teacher ensures reproducibility
- Train/test splits with separate seeds
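The data-generation steps listed here can be sketched as follows. This is a dependency-free illustration using only the standard library, whereas the repository itself uses PyTorch. The teacher's hidden width, the standard-normal draw for its parameters, the scalar output, and the specific seed values are all assumptions, and the function names are hypothetical.

```python
import math
import random


def make_teacher(input_dim, hidden, seed):
    """Draw fixed teacher parameters (W1, b1, W2, b2) from a seeded generator."""
    rng = random.Random(seed)
    W1 = [[rng.gauss(0, 1) for _ in range(input_dim)] for _ in range(hidden)]
    b1 = [rng.gauss(0, 1) for _ in range(hidden)]
    W2 = [rng.gauss(0, 1) for _ in range(hidden)]
    b2 = rng.gauss(0, 1)
    return W1, b1, W2, b2


def teacher_forward(params, x):
    """Evaluate f*(x) = W2 tanh(W1 x + b1) + b2 for one input vector x."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return sum(w * hi for w, hi in zip(W2, h)) + b2


def sample_dataset(params, n, input_dim, seed):
    """Sample x ~ N(0, I) with its own seed and label it with the teacher."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0, 1) for _ in range(input_dim)] for _ in range(n)]
    ys = [teacher_forward(params, x) for x in xs]
    return xs, ys


# One fixed teacher, then train and test sets drawn with different seeds.
teacher = make_teacher(input_dim=5, hidden=3, seed=0)
train_x, train_y = sample_dataset(teacher, n=20, input_dim=5, seed=1)
test_x, test_y = sample_dataset(teacher, n=1000, input_dim=5, seed=2)
```

Seeding the teacher separately from the inputs means the target function stays identical while the training set size is varied.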
- Full-batch gradient descent (standard in optimization theory)
- Configurable learning rate and convergence criteria
- Tracks both training and test MSE
- Supports SGD and Adam optimizers
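A minimal sketch of full-batch gradient descent with a convergence tolerance is shown below. To stay self-contained it fits a one-dimensional linear model instead of the student networks, and it assumes that training stops once the training MSE falls below `tol`. The repository's actual stopping rule is not stated here. The defaults mirror the `lr`, `max_steps`, and `tol` values in config.yaml, and the function names are hypothetical.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_full_batch(xs, ys, lr=0.1, max_steps=20000, tol=1e-10):
    """Full-batch gradient descent; returns (w, b, number of steps taken)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for step in range(max_steps):
        if mse(w, b, xs, ys) < tol:  # assumed stopping rule
            return w, b, step
        # Gradients of the MSE computed over the entire training set.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, max_steps


# Toy data generated from y = 2x + 1, purely for illustration.
train_x = [-1.0, 0.0, 1.0, 2.0]
train_y = [2 * x + 1 for x in train_x]
test_x = [-2.0, 3.0]
test_y = [2 * x + 1 for x in test_x]

w, b, steps = train_full_batch(train_x, train_y)
print(steps, mse(w, b, train_x, train_y), mse(w, b, test_x, test_y))
```

Every step uses all training points, so the trajectory is deterministic given the initialization. Both the training and test MSE are evaluated at the end, matching the two quantities tracked during training.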
- Sweeps across sample sizes: $n \in \{5, 10, 20, 50, 100, 200\}$
- Tests multiple seeds for variance estimation
- Aggregates results with mean ± std error
- Generates comparison plots
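The aggregation step can be sketched as below, assuming that "std error" means the standard error of the mean across seeds (sample standard deviation divided by the square root of the number of seeds). The `aggregate` helper and the input layout are hypothetical, and the numbers are made up solely to exercise the function. They are not experimental results.

```python
import math
import statistics


def aggregate(test_mse_by_n):
    """Map each sample size to (mean, standard error of the mean) over seeds."""
    summary = {}
    for n, values in sorted(test_mse_by_n.items()):
        mean = statistics.mean(values)
        if len(values) > 1:
            # Sample standard deviation divided by sqrt(number of seeds).
            sem = statistics.stdev(values) / math.sqrt(len(values))
        else:
            sem = 0.0  # a single seed gives no spread estimate
        summary[n] = (mean, sem)
    return summary


# Made-up test MSE values per sample size, one entry per seed.
example = {5: [0.9, 1.1, 1.0], 50: [0.02, 0.04, 0.03]}
summary = aggregate(example)
for n, (mean, sem) in summary.items():
    print(f"n={n}: {mean:.3f} +/- {sem:.3f}")
```

The resulting `(mean, sem)` pairs are the kind of values that would typically be drawn as points with error bars in a comparison plot.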
The theory predicts LLR is achievable when the number of training samples
- Small $n$: Higher test error (underfitting)
- Large $n$: Low test error (successful recovery)
- CNN-WS vs CNN-NoWS: Weight sharing should require fewer samples due to inductive bias
Typical plot shows test MSE decreasing as