trungtv1207/HMKT


Local Linear Recovery Guarantee – Implementation

PyTorch implementation of "Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization" (arXiv:2406.18035).

Overview

This project investigates whether deep neural networks (DNNs) can reliably recover target functions at overparameterization. The paper introduces Local Linear Recovery (LLR), a theoretical framework for this question, and proves that functions expressible by narrower DNNs are guaranteed to be recoverable from fewer samples than the model has parameters.

The implementation reproduces the key experiments, comparing sample efficiency across:

  • Fully-connected (FC) models
  • Convolutional (CNN) models with weight sharing (WS)
  • Convolutional (CNN) models without weight sharing (NoWS)

Project Structure

HMKT/
├── src/
│   ├── models.py              # Model architectures (FC, CNN-WS, CNN-NoWS)
│   ├── data_gen.py            # Data generation and teacher network
│   ├── train.py               # Training loop and experiment orchestration
│   ├── utils.py               # Utilities (training, evaluation, plotting)
│   ├── run_experiments.py     # Full experiment runner with aggregation
│   └── config.yaml            # Hyperparameter configuration
├── test/
│   ├── test_run.py
│   ├── test_quick_experiment.py
│   ├── test_config.py
│   ├── test_all_nodes.py
│   └── final_test.py
├── requirements.txt           # Python dependencies
├── ... (.gitignore, FIXES_APPLIED.md)
└── README.md                  # This file

Setup

Prerequisites

  • Python 3.8+
  • GPU recommended (but CPU works for small experiments)

Installation

  1. Clone or download this repository
  2. Install dependencies:
pip install -r requirements.txt

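Quick Start

The training scripts are in src/ and the test scripts in test/ (see Project Structure); the commands below use bare file names, so run each from the directory that contains it or prefix its path.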

Run a Single Experiment

python train.py --config config.yaml

This trains a single model configuration as specified in config.yaml.

Run Full Sweep

To run experiments across multiple sample sizes and architectures, set sweep.enabled: true in config.yaml and run:

python run_experiments.py

This will:

  • Train all three model types (MLP, CNN-WS, CNN-NoWS)
  • Test across multiple sample sizes
  • Aggregate results with error bars
  • Save results to outputs/results.json
  • Display comparison plots
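The sweep described above amounts to a grid over model type, sample size, and seed. A minimal sketch of that expansion follows, assuming the sweep overrides data.train_size, model.type, and experiment.seed in a fresh copy of the base config. The names expand_sweep and MODEL_TYPES and the three-seed list are hypothetical and may not match what run_experiments.py actually does.

```python
import copy
import itertools

# Hypothetical base config mirroring the keys of config.yaml shown in this README.
BASE_CONFIG = {
    "experiment": {"name": "experiment_name", "seed": 0, "device": "cpu"},
    "data": {"input_dim": 5, "train_size": 20, "test_size": 1000},
    "model": {"type": "mlp"},
    "sweep": {"enabled": True, "sample_sizes": [5, 10, 20, 50, 100, 200]},
}

# The three model types accepted by model.type.
MODEL_TYPES = ["mlp", "cnn_shared", "cnn_noshare"]


def expand_sweep(base, seeds):
    """Yield one independent config per (model type, sample size, seed) triple."""
    for model_type, n, seed in itertools.product(
        MODEL_TYPES, base["sweep"]["sample_sizes"], seeds
    ):
        cfg = copy.deepcopy(base)  # never mutate the shared base config
        cfg["model"]["type"] = model_type
        cfg["data"]["train_size"] = n
        cfg["experiment"]["seed"] = seed
        yield cfg


runs = list(expand_sweep(BASE_CONFIG, seeds=[0, 1, 2]))
print(len(runs))  # 3 model types x 6 sample sizes x 3 seeds = 54
```

Deep-copying per run keeps each configuration self-contained, so a run can be logged or re-executed on its own.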

Verify Installation

python test_run.py

This runs quick sanity checks on models, training, and sweep functionality.

Configuration

Edit config.yaml to customize experiments:

experiment:
  name: "experiment_name"
  seed: 0
  device: "cpu"  # or "cuda"

data:
  input_dim: 5          # Input dimension
  train_size: 20        # Training set size
  test_size: 1000       # Test set size

model:
  type: "mlp"           # Options: mlp | cnn_shared | cnn_noshare

training:
  lr: 0.1               # Learning rate
  max_steps: 20000      # Maximum training steps
  tol: 1.0e-10          # Convergence tolerance (".0" so PyYAML parses a float)
  optimizer: "sgd"      # Options: sgd | adam
  weight_decay: 0.0

init:
  std: 1.0e-10          # Tiny initialization std

sweep:
  enabled: true
  sample_sizes: [5, 10, 20, 50, 100, 200]  # Sweep over these sizes
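If config.yaml is read with PyYAML (an assumption; the README does not say which loader the code uses), a float written without a decimal point, such as 1e-10, comes back as a string, because PyYAML follows the YAML 1.1 resolution rules. The sketch below shows this behavior and one defensive option: a hypothetical load_config helper that casts selected fields with float().

```python
import yaml  # PyYAML

SNIPPET = """
training:
  lr: 0.1
  tol: 1e-10      # no decimal point: PyYAML keeps this as the string "1e-10"
init:
  std: 1.0e-10    # decimal point present: parsed as a float
"""


def load_config(text):
    """Parse YAML and coerce the numeric fields this sketch cares about to float."""
    cfg = yaml.safe_load(text)
    for section, key in [("training", "lr"), ("training", "tol"), ("init", "std")]:
        cfg[section][key] = float(cfg[section][key])
    return cfg


raw = yaml.safe_load(SNIPPET)
print(repr(raw["training"]["tol"]))  # '1e-10'
print(repr(raw["init"]["std"]))      # 1e-10
print(load_config(SNIPPET)["training"]["tol"] < 1e-9)  # True
```

Writing 1.0e-10 in the YAML file avoids the issue at the source; the cast is a second line of defense.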

Key Components

Models (models.py)

  1. PaperTeacherMLP: Fixed synthetic teacher network

    • 2-layer tanh network: $f^*(x) = W_2 \tanh(W_1 x + b_1) + b_2$
    • Used to generate ground-truth labels
  2. TwoLayerTanhMLP: Student MLP (fully-connected)

    • Architecture: Linear → tanh → Linear
    • For FC experiments
  3. SharedTanhConv1D: 1D CNN with weight sharing

    • Conv1d → tanh → flatten → Linear
    • For CNN experiments with weight sharing
  4. NoShareTanhConv1D: 1D CNN without weight sharing

    • LocallyConnected1D → tanh → flatten → Linear
    • For CNN experiments without weight sharing
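The three student architectures can be sketched in a few lines of PyTorch. This is an illustrative reimplementation, not the code in models.py: the class names TanhMLP, SharedConv, and LocallyConnectedConv and their constructor arguments (width, channels, kernel_size) are hypothetical, and the sketch assumes one input channel, stride 1, no padding, and a scalar output. The locally connected layer is built here from Tensor.unfold plus a separate filter bank per position, which is one way to realize what the README calls LocallyConnected1D.

```python
import torch
import torch.nn as nn


class TanhMLP(nn.Module):
    """Linear -> tanh -> Linear (FC student; same form as the teacher f*)."""

    def __init__(self, input_dim, width):
        super().__init__()
        self.hidden = nn.Linear(input_dim, width)
        self.out = nn.Linear(width, 1)

    def forward(self, x):  # x: (batch, input_dim)
        return self.out(torch.tanh(self.hidden(x)))


class SharedConv(nn.Module):
    """Conv1d -> tanh -> flatten -> Linear: one filter bank reused at every position."""

    def __init__(self, input_dim, channels, kernel_size):
        super().__init__()
        self.conv = nn.Conv1d(1, channels, kernel_size)
        self.out = nn.Linear(channels * (input_dim - kernel_size + 1), 1)

    def forward(self, x):
        h = torch.tanh(self.conv(x.unsqueeze(1)))  # (batch, channels, positions)
        return self.out(h.flatten(1))


class LocallyConnectedConv(nn.Module):
    """Same receptive fields as SharedConv, but an independent filter bank per position."""

    def __init__(self, input_dim, channels, kernel_size):
        super().__init__()
        self.k = kernel_size
        positions = input_dim - kernel_size + 1
        self.weight = nn.Parameter(torch.randn(positions, channels, kernel_size) / kernel_size ** 0.5)
        self.bias = nn.Parameter(torch.zeros(positions, channels))
        self.out = nn.Linear(channels * positions, 1)

    def forward(self, x):
        patches = x.unfold(1, self.k, 1)  # (batch, positions, kernel_size) sliding windows
        h = torch.einsum("bpk,pck->bpc", patches, self.weight) + self.bias
        return self.out(torch.tanh(h).flatten(1))


def num_params(model):
    return sum(p.numel() for p in model.parameters())


torch.manual_seed(0)
x = torch.randn(8, 5)
for m in (TanhMLP(5, 4), SharedConv(5, 2, 3), LocallyConnectedConv(5, 2, 3)):
    print(type(m).__name__, tuple(m(x).shape), num_params(m))
# TanhMLP (8, 1) 29
# SharedConv (8, 1) 15
# LocallyConnectedConv (8, 1) 31
```

If every position's filters in LocallyConnectedConv are tied to the same values, its hidden layer reproduces SharedConv's, which makes the only difference between CNN-WS and CNN-NoWS the weight-sharing constraint.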

Data Generation (data_gen.py)

  • Generates synthetic regression data: $x \sim \mathcal{N}(0, I)$, $y = f^*(x)$
  • Fixed teacher ensures reproducibility
  • Train/test splits with separate seeds
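A minimal sketch of this data pipeline follows. It assumes a teacher of hidden width 2 with standard-normal weights and uses seeds 0, 1, and 2 for the teacher, training set, and test set. The functions make_teacher and sample_dataset are hypothetical and are not taken from data_gen.py. Each call creates its own torch.Generator, so the draws are reproducible and independent of the global RNG state.

```python
import torch


def make_teacher(input_dim, width, seed):
    """Fixed random two-layer tanh teacher f*(x) = W2 tanh(W1 x + b1) + b2."""
    g = torch.Generator().manual_seed(seed)
    W1 = torch.randn(width, input_dim, generator=g)
    b1 = torch.randn(width, generator=g)
    W2 = torch.randn(1, width, generator=g)
    b2 = torch.randn(1, generator=g)

    def f_star(x):  # x: (n, input_dim) -> (n, 1)
        return torch.tanh(x @ W1.T + b1) @ W2.T + b2

    return f_star


def sample_dataset(f_star, n, input_dim, seed):
    """Draw x ~ N(0, I) with its own generator and label it with the teacher."""
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(n, input_dim, generator=g)
    return x, f_star(x)


f_star = make_teacher(input_dim=5, width=2, seed=0)
x_train, y_train = sample_dataset(f_star, n=20, input_dim=5, seed=1)
x_test, y_test = sample_dataset(f_star, n=1000, input_dim=5, seed=2)
print(x_train.shape, y_train.shape, x_test.shape)
# torch.Size([20, 5]) torch.Size([20, 1]) torch.Size([1000, 5])
```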

Training (train.py, utils.py)

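  • Full-batch gradient descent, the standard setting in optimization theory: every step uses the whole training set, so the "sgd" optimizer performs plain (non-stochastic) gradient descent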
  • Configurable learning rate and convergence criteria
  • Tracks both training and test MSE
  • Supports SGD and Adam optimizers
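The training procedure can be sketched as a full-batch loop that stops once the training MSE drops below tol, using parameter names that mirror the training section of config.yaml. The helpers train_full_batch and init_small are hypothetical, and whether train.py applies the tolerance to the training loss exactly this way is an assumption. The usage lines use a toy target and PyTorch's default initialization, not the tiny init.std, so that they finish quickly.

```python
import torch
import torch.nn as nn


def init_small(model, std):
    """Re-draw every parameter from N(0, std^2), e.g. std=1e-10 as in init.std."""
    with torch.no_grad():
        for p in model.parameters():
            p.normal_(0.0, std)


def train_full_batch(model, x_tr, y_tr, x_te, y_te, lr=0.1, max_steps=20000,
                     tol=1e-10, optimizer="sgd", weight_decay=0.0):
    """Full-batch training: every step uses the entire training set."""
    opt_cls = {"sgd": torch.optim.SGD, "adam": torch.optim.Adam}[optimizer]
    opt = opt_cls(model.parameters(), lr=lr, weight_decay=weight_decay)
    mse = nn.MSELoss()
    steps = 0
    while steps < max_steps:
        opt.zero_grad()
        loss = mse(model(x_tr), y_tr)
        if loss.item() < tol:  # stop once the training MSE is below the tolerance
            break
        loss.backward()
        opt.step()
        steps += 1
    with torch.no_grad():  # track both training and test MSE at the end
        return {"steps": steps,
                "train_mse": mse(model(x_tr), y_tr).item(),
                "test_mse": mse(model(x_te), y_te).item()}


torch.manual_seed(0)
x_tr, x_te = torch.randn(20, 5), torch.randn(200, 5)
target = lambda x: torch.tanh(x[:, :1] - x[:, 1:2])  # toy stand-in for f*
model = nn.Sequential(nn.Linear(5, 16), nn.Tanh(), nn.Linear(16, 1))
print(train_full_batch(model, x_tr, target(x_tr), x_te, target(x_te), max_steps=2000))
```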

Experiments (run_experiments.py)

  • Sweeps across sample sizes: $n \in \{5, 10, 20, 50, 100, 200\}$
  • Tests multiple seeds for variance estimation
  • Aggregates results with mean ± std error
  • Generates comparison plots
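The README does not say whether "std error" means the standard deviation across seeds or the standard error of the mean, so the hypothetical aggregate helper below reports both. It groups per-seed test MSEs by (model, n) and returns a JSON-serializable list in the spirit of outputs/results.json. The actual schema of that file is not documented here.

```python
import json
import math
import statistics


def aggregate(records):
    """Group per-seed runs by (model, n) and summarize test MSE across seeds."""
    groups = {}
    for r in records:
        groups.setdefault((r["model"], r["n"]), []).append(r["test_mse"])
    summary = []
    for (model, n), vals in sorted(groups.items()):
        std = statistics.stdev(vals) if len(vals) > 1 else 0.0  # sample std (n - 1)
        summary.append({
            "model": model, "n": n, "num_seeds": len(vals),
            "mean": statistics.fmean(vals),
            "std": std,                         # spread across seeds
            "sem": std / math.sqrt(len(vals)),  # standard error of the mean
        })
    return summary


# Placeholder numbers purely to exercise the function; they are not results.
records = [
    {"model": "mlp", "n": 5, "seed": 0, "test_mse": 1.0},
    {"model": "mlp", "n": 5, "seed": 1, "test_mse": 3.0},
]
print(json.dumps(aggregate(records), indent=2))
```

For error bars on a mean curve, the standard error shrinks as more seeds are added, while the standard deviation describes seed-to-seed variability; plots should state which one they show.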

Expected Results

The theory predicts LLR is achievable when the number of training samples $n$ exceeds the effective dimension of the target function. For 2-layer tanh networks:

  • Small $n$: high test error; the overparameterized student can fit the training data without recovering the target
  • Large $n$: low test error (successful recovery)
  • CNN-WS vs. CNN-NoWS: weight sharing is expected to require fewer samples, because sharing one filter across positions is a stronger inductive bias

A typical plot shows test MSE decreasing as $n$ increases, with faster decay for the more constrained model (CNN-WS).
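To make "fewer samples than parameters" and the weight-sharing comparison concrete, the students' parameter counts can be written out. The symbols are assumptions for illustration only: input dimension $d$, hidden width $m$ for the FC model, $C$ channels with kernel size $k$ for both CNNs (one input channel, stride 1, no padding, so $L = d - k + 1$ positions), and a scalar output. The actual widths used in models.py are not given in this README.

```latex
\begin{aligned}
P_{\text{FC}}   &= m(d+1) + (m+1) = m(d+2) + 1,\\
P_{\text{WS}}   &= C(k+1) + CL + 1,\\
P_{\text{NoWS}} &= LC(k+1) + CL + 1, \qquad L = d - k + 1,\\
P_{\text{NoWS}} - P_{\text{WS}} &= (L-1)\,C(k+1).
\end{aligned}
```

For example, with $d = 5$, $m = 4$, $C = 2$, and $k = 3$ (so $L = 3$), these give $P_{\text{FC}} = 29$, $P_{\text{WS}} = 15$, and $P_{\text{NoWS}} = 31$: weight sharing removes $16$ parameters while keeping the same receptive fields.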
