Implement correct overfitting analysis with exponential DGP and no intercept#10
Draft
Copilot wants to merge 2 commits intogabriel-sacofrom
Draft
Implement correct overfitting analysis with exponential DGP and no intercept#10Copilot wants to merge 2 commits intogabriel-sacofrom
Copilot wants to merge 2 commits intogabriel-sacofrom
Conversation
…ercept) Co-authored-by: gsaco <[email protected]>
Copilot
AI
changed the title
[WIP] Usa estas librerias: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_scor...
Implement correct overfitting analysis with exponential DGP and no intercept
Sep 5, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR implements the overfitting analysis as specified in the problem statement, replacing previous implementations with the correct data generating process and model specifications.
Key Changes
Data Generating Process: Implemented the specified DGP
y = np.exp(4 * W) + ewithout intercept, where:Model Estimation: Uses
LinearRegression(fit_intercept=False)to exclude intercept from polynomial regression models as required.Feature Engineering: Creates polynomial features W¹, W², W³, ..., Wᵏ for k ∈ {1, 2, 5, 10, 20, 50, 100, 200, 500, 1000}.
Metrics Calculation: For each model complexity:
Visualization: Three separate plots showing each R² metric versus number of features, demonstrating:
Results
The analysis successfully demonstrates the bias-variance tradeoff:
Optimal model complexity: 20 features by both Adjusted R² (0.9949) and out-of-sample R² (0.9959) criteria.
File Changes
Python/scripts/part2_overfitting.ipynbwith correct implementationpart2_overfitting.py,part2_overfitting_corrected.ipynb,part2_overfitting_corrected_new.ipynbThe notebook is fully functional, tested, and produces the expected overfitting patterns demonstrating the fundamental machine learning concept of model complexity versus generalization performance.
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.