Implement correct overfitting analysis with exponential DGP and no intercept by Copilot · Pull Request #10 · gsaco/High_Dimensional_Linear_Models

Copilot · 2025-09-05T15:26:39Z

This PR implements the overfitting analysis as specified in the problem statement, replacing previous implementations with the correct data generating process and model specifications.

Key Changes

Data Generating Process: Implemented the specified DGP y = np.exp(4 * W) + e without intercept, where:

W ~ Uniform(0,1), sorted, n=1000 observations
e ~ Normal(0,1)
Uses seed=42 for reproducibility
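A minimal sketch of the data generating process described above, using NumPy only. The function name `simulate_data` and the use of `np.random.default_rng` are assumptions for illustration; the notebook may seed NumPy differently, so the exact draws need not match.

```python
import numpy as np

def simulate_data(n=1000, seed=42):
    """Draw (W, y) from y = exp(4 * W) + e with W sorted in ascending order."""
    # Assumption: a Generator-based RNG; the notebook may use np.random.seed instead.
    rng = np.random.default_rng(seed)
    W = np.sort(rng.uniform(0.0, 1.0, size=n))  # W ~ Uniform(0, 1), sorted
    e = rng.normal(0.0, 1.0, size=n)            # e ~ Normal(0, 1)
    y = np.exp(4.0 * W) + e                     # no separate intercept term
    return W, y

W, y = simulate_data()
```

Calling `simulate_data()` twice with the same seed returns identical arrays, which is what makes the reported metrics reproducible.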

Model Estimation: Uses LinearRegression(fit_intercept=False) to exclude intercept from polynomial regression models as required.
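To illustrate what excluding the intercept means, here is a small sketch that uses `np.linalg.lstsq` as a stand-in for `LinearRegression(fit_intercept=False)`: no constant column is added to the design matrix, so the fitted curve is forced through the origin. The helper name `fit_no_intercept` and the toy data are hypothetical.

```python
import numpy as np

def fit_no_intercept(X, y):
    """Least-squares coefficients for y ~ X @ beta, with no constant column added."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Toy data lying exactly on a line through the origin: y = 3 * x
x = np.array([1.0, 2.0, 3.0])
X = x.reshape(-1, 1)
beta = fit_no_intercept(X, 3.0 * x)

# Without an intercept, the prediction at x = 0 is always exactly 0.
pred_at_zero = np.zeros((1, 1)) @ beta
```

For the DGP in this PR, exp(4 * 0) = 1, so a no-intercept polynomial cannot match the true mean at W = 0 exactly; higher-order terms can only approximate it for W near 0.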

Feature Engineering: Creates polynomial features W¹, W², W³, ..., Wᵏ for k ∈ {1, 2, 5, 10, 20, 50, 100, 200, 500, 1000}.
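The feature construction can be sketched with NumPy broadcasting. The helper name `polynomial_features` is an assumption; the notebook may build the columns differently (for example with scikit-learn's `PolynomialFeatures`).

```python
import numpy as np

def polynomial_features(W, k):
    """Return an (n, k) matrix with columns W**1, ..., W**k (no constant column)."""
    W = np.asarray(W, dtype=float).reshape(-1, 1)  # shape (n, 1)
    powers = np.arange(1, k + 1)                   # shape (k,)
    return W ** powers                             # broadcasts to (n, k)

X = polynomial_features([0.5, 1.0], 3)
```

Because W lies in (0, 1), high powers shrink toward zero and neighboring columns become nearly collinear, so the design matrix is poorly conditioned for large k.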

Metrics Calculation: For each model complexity:

R² on full sample (in-sample fit quality)
Adjusted R² on full sample (complexity-penalized fit)
Out-of-sample R² using 75%/25% train/test split (generalization performance)
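The three metrics can be sketched as follows. Several details are assumptions, since the PR description does not state them: the adjusted R² uses the common textbook form 1 - (1 - R²)(n - 1)/(n - k - 1), the 75%/25% split is a random shuffle, and `np.linalg.lstsq` stands in for `LinearRegression(fit_intercept=False)`. All function names are hypothetical.

```python
import numpy as np

def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; can be negative on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(r2_value, n, k):
    """Textbook adjusted R^2 (assumed form); NaN when n - k - 1 <= 0."""
    dof = n - k - 1
    if dof <= 0:
        return float("nan")
    return 1.0 - (1.0 - r2_value) * (n - 1) / dof

def evaluate(W, y, k, test_frac=0.25, seed=42):
    """Return (full-sample R^2, adjusted R^2, out-of-sample R^2) for k features."""
    X = W[:, None] ** np.arange(1, k + 1)           # columns W^1..W^k, no constant
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # no-intercept least squares
    r2_full = r2(y, X @ beta)
    adj = adjusted_r2(r2_full, len(y), k)

    # Assumption: random 75/25 split; the notebook may split differently.
    idx = np.random.default_rng(seed).permutation(len(y))
    n_test = int(round(test_frac * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    beta_tr, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    r2_oos = r2(y[test], X[test] @ beta_tr)
    return r2_full, adj, r2_oos

# Simulated data following the DGP described in this PR (draws are illustrative).
rng = np.random.default_rng(42)
W = np.sort(rng.uniform(0.0, 1.0, size=1000))
y = np.exp(4.0 * W) + rng.normal(0.0, 1.0, size=1000)
results = {k: evaluate(W, y, k) for k in (1, 2, 5)}
```

Under the assumed formula, the adjusted R² is undefined once k reaches n - 1, which matters for the k = 1000 case with n = 1000; the sketch returns NaN there rather than a misleading number.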

Visualization: Three separate plots showing each R² metric versus number of features, demonstrating:

R² (full sample): Monotonic increase with model complexity
Adjusted R²: Peaks at optimal complexity (~20 features) then declines
Out-of-sample R²: Classic inverted U-shape showing overfitting at high complexity

Results

The analysis successfully demonstrates the bias-variance tradeoff:

Simple models (1-2 features) underfit the exponential relationship
Moderate complexity (5-50 features) captures the pattern well
High complexity (500+ features) severely overfits, with out-of-sample R² becoming negative

Optimal model complexity: 20 features, which maximizes both adjusted R² (0.9949) and out-of-sample R² (0.9959) among the tested values of k.

File Changes

Replaced Python/scripts/part2_overfitting.ipynb with correct implementation
Removed duplicate/incorrect notebooks: part2_overfitting.py, part2_overfitting_corrected.ipynb, part2_overfitting_corrected_new.ipynb
Cleaned repository of unnecessary files as requested

The notebook runs end to end, has been tested, and reproduces the expected overfitting pattern: in-sample fit keeps improving as model complexity grows, while generalization performance eventually deteriorates.

Initial plan

ed05c17

Copilot AI assigned Copilot and gsaco Sep 5, 2025

Copilot started work on behalf of gsaco September 5, 2025 15:26

Implement complete overfitting analysis with y = exp(4*W) + e (no int…

d7ea065

…ercept) Co-authored-by: gsaco <[email protected]>

Copilot AI requested a review from gsaco September 5, 2025 15:34

Copilot finished work on behalf of gsaco September 5, 2025 15:34

Copilot wants to merge 2 commits into gabriel-saco from copilot/fix-33ef632e-7111-4dba-9cb6-bbb3ca63d134
