Linear Regression is a fundamental machine learning algorithm used to predict continuous values.
It models the relationship between a target variable and one or more features by fitting a straight line (or hyperplane):
ŷ = w₁ x₁ + w₂ x₂ + … + wₙ xₙ
where:
- ŷ = predicted output
- xᵢ = input features
- wᵢ = learned weights
This simple version does not include a bias/intercept to focus on the core gradient descent logic.
We train the model by minimizing Mean Squared Error (MSE):
MSE = (1 / n) Σᵢ₌₁ⁿ (yᵢ - ŷᵢ)²

∂MSE / ∂w = (2 / n) · Xᵀ · (X w - y)
This gradient tells us how to change the weights w to reduce the error. Each step applies the update:

w = w - η · gradient

where η is the learning rate.
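A single update step can be traced numerically. This is a minimal sketch with made-up toy values (X, y, and η below are illustrative only, not from the actual dataset):

```python
import numpy as np

# Toy data: 3 samples, 2 features (values chosen for illustration only)
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [0.0, 1.0]])
y = np.array([5.0, 2.0, 2.0])

w = np.zeros(2)   # initial weights
eta = 0.1         # learning rate η

# One gradient descent step: w ← w - η · (2/n) · Xᵀ(Xw - y)
errors = X @ w - y                      # residuals: [-5, -2, -2]
gradient = 2 * X.T @ errors / len(y)    # [-6, -8]
w = w - eta * gradient                  # [0.6, 0.8]
```

Starting from w = 0, the weights move in the direction that shrinks the squared error; repeating this step is all that "training" means here.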
Data Flow of the Program:

house_price.csv
↓
Load & Clean
(drop non-numeric, handle NaN, convert dtypes)
↓
Train-Test Split
(train_test_split_data)
↓
Normalize Features
(normalize_features)
↓
Gradient Descent Training
(linear_regression_train → gradient_descent)
↓
Predict on Test Data
(linear_regression_predict)
↓
Evaluate with MSE
(mean_squared_error)
Data loading & cleaning:
- Load the CSV using pd.read_csv()
- Drop non-numeric columns and the Id column
- Drop rows with missing values
- Convert specific columns to integer type
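The cleaning steps above can be sketched with pandas. An inline DataFrame stands in for house_price.csv here (the column names other than Id are invented for the demo; the real script would start from pd.read_csv):

```python
import numpy as np
import pandas as pd

# Inline stand-in for pd.read_csv("house_price.csv")
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Area": [1200.0, np.nan, 900.0],
    "Price": [250000, 180000, 150000],
    "Street": ["Pave", "Pave", "Grvl"],   # non-numeric, will be dropped
})

df = df.drop(columns=["Id"])             # Id carries no predictive signal
df = df.select_dtypes(include="number")  # drop non-numeric columns
df = df.dropna()                         # drop rows with missing values
df["Price"] = df["Price"].astype(int)    # convert specific columns to int
```

After these steps only the numeric, fully populated rows remain, ready for splitting and normalization.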
train_test_split_data(df, test_size, random_seed)
- Splits data into train (80%) and test (20%) sets
- Returns: X_train, X_test, y_train, y_test
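A minimal sketch of what such a split function could look like. The internals below are assumptions (the source only specifies the signature and return values); in particular, treating the last column as the target is an assumption of this sketch:

```python
import numpy as np
import pandas as pd

def train_test_split_data(df, test_size, random_seed):
    """Shuffle row indices, then split. The last column is assumed
    to be the target (an assumption made for this sketch)."""
    rng = np.random.default_rng(random_seed)
    indices = rng.permutation(len(df))
    n_test = int(len(df) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X = df.iloc[:, :-1].to_numpy()
    y = df.iloc[:, -1].to_numpy()
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Tiny demo frame (not the real house_price.csv data)
df = pd.DataFrame({"area": range(10), "price": range(10)})
X_train, X_test, y_train, y_test = train_test_split_data(df, 0.2, 0)
```

Seeding the generator with random_seed makes the shuffle, and therefore the split, reproducible across runs.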
normalize_features(features)
- Applies Z-score normalization:
x' = (x - μ) / σ
where μ is the mean and σ is the standard deviation.
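Z-score normalization is a one-liner with NumPy; this sketch (the function body is an assumption consistent with the formula above) normalizes each feature column independently:

```python
import numpy as np

def normalize_features(features):
    """Column-wise Z-score: x' = (x - μ) / σ."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / sigma

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
X_norm = normalize_features(X)  # each column now has mean 0, std 1
```

Normalization puts features on a common scale, which keeps gradient descent from being dominated by the feature with the largest raw magnitude.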
linear_regression_train(X_train, y_train, learning_rate, epochs)
- Normalizes the training data
- Calls gradient_descent(X, y, learning_rate, epochs)
Core gradient descent update (pseudo-code):

```python
weights = np.zeros(num_features)
for epoch in range(epochs):
    predictions = np.dot(X, weights)             # ŷ = X · w
    errors = predictions - y                     # residuals
    gradient = 2 * np.dot(X.T, errors) / len(y)  # ∂MSE/∂w
    weights -= learning_rate * gradient          # update step
```
linear_regression_predict(weights, X_test)
- Normalizes X_test
- Predicts using ŷ = X · w
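Prediction itself reduces to a matrix-vector product. A minimal sketch (the weights and inputs below are made up, and X_test is assumed to be already normalized, whereas the full script normalizes it inside the function):

```python
import numpy as np

def linear_regression_predict(weights, X_test):
    # ŷ = X · w  (X_test assumed already normalized in this sketch)
    return X_test @ weights

weights = np.array([2.0, -1.0])      # made-up trained weights
X_test = np.array([[1.0, 1.0],
                   [3.0, 0.5]])
preds = linear_regression_predict(weights, X_test)
```

Because there is no bias term, a sample with all-zero (i.e. perfectly average, after normalization) features always predicts 0.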
mean_squared_error(predictions, targets)
- Calculates the average squared difference between predictions and actual targets
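The evaluation metric is the same MSE used as the training loss; a sketch with made-up numbers (the function body is an assumption matching the MSE formula above):

```python
import numpy as np

def mean_squared_error(predictions, targets):
    # (1/n) Σᵢ (yᵢ - ŷᵢ)²
    return np.mean((targets - predictions) ** 2)

# Errors of -1 and +2 → squared errors 1 and 4 → mean 2.5
mse = mean_squared_error(np.array([2.0, 4.0]), np.array([1.0, 6.0]))
```

Note that MSE is reported in squared target units, so if the targets are normalized prices the raw number is not directly interpretable as a price error.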
Run the script with:

python linear_regression.py

It will:
- Load and clean dataset
- Train a linear regression model
- Predict on test data
- Print final Mean Squared Error (MSE)
- No bias term included
- Features and targets are normalized separately
- For learning/demo only — not production
- Learning rate and epochs may need tuning
- Data Cleaning & Preprocessing
- Train-Test Splitting
- Z-score Normalization (x' = (x - μ) / σ)
- Gradient Descent Optimization (∂MSE/∂w = (2/n) Xᵀ (Xw - y))
- Linear Regression
- Model Evaluation (MSE)