
Linear Regression from Scratch (Python + NumPy + Pandas)

Introduction

Linear Regression is a fundamental machine learning algorithm used to predict continuous values.

It models the relationship between a target variable and one or more features by fitting a straight line (or hyperplane):

ŷ = w₁ x₁ + w₂ x₂ + … + wₙ xₙ

where:

  • ŷ = predicted output
  • xᵢ = input features
  • wᵢ = learned weights

This implementation omits the bias/intercept term to keep the focus on the core gradient descent logic.
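As a quick illustration, the prediction for a batch of samples is a single matrix-vector product. A minimal sketch with hypothetical values:

```python
import numpy as np

# Hypothetical example: 3 samples, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -1.0])  # learned weights w1, w2

y_hat = X @ w  # ŷᵢ = w1·xᵢ1 + w2·xᵢ2
print(y_hat)   # [-1.5 -2.5 -3.5]
```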


Gradient Descent (Training Logic)

We train the model by minimizing Mean Squared Error (MSE):

MSE = (1 / n) Σ₍ᵢ₌₁₎ⁿ (yᵢ - ŷᵢ)²

Step 1 — Find the Gradient

∂MSE / ∂w = (2 / n) · Xᵀ · (X w - y)

The gradient tells us in which direction to change the weights w to reduce the error.

Step 2 — Update the Weights

w = w - η * gradient

where η is the learning rate.
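A single update step can be sketched directly from the two formulas above (the data values here are hypothetical):

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([1.0, 2.0])
w = np.zeros(2)
eta = 0.01  # learning rate η

# Step 1: gradient = (2/n)·Xᵀ·(Xw − y)
gradient = 2 * X.T @ (X @ w - y) / len(y)

# Step 2: w ← w − η·gradient
w = w - eta * gradient
print(w)  # [0.07 0.1 ]
```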


Project Architecture (Pipeline)

Data Flow of the Program: house_price.csv

    ↓

Load & Clean

(drop non-numeric, handle NaN, convert dtypes)

    ↓

Train-Test Split

(train_test_split_data)

    ↓

Normalize Features

(normalize_features)

    ↓

Gradient Descent Training

(linear_regression_train → gradient_descent)

    ↓

Predict on Test Data

(linear_regression_predict)

    ↓

Evaluate with MSE

(mean_squared_error)


How the Program Works

1. Data Preparation

  • Load CSV using pd.read_csv()
  • Drop non-numeric columns and the Id column
  • Drop missing values
  • Convert specific columns to integer type
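The cleaning steps above can be sketched with pandas; the column names here ("Id", "Street", "GarageArea", "SalePrice") are hypothetical stand-ins for the real dataset's columns:

```python
import pandas as pd

df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Street": ["Pave", "Pave", None],    # non-numeric -> dropped
    "GarageArea": [548.0, None, 608.0],  # row with NaN -> dropped
    "SalePrice": [208500, 181500, 223500],
})

df = df.select_dtypes(include="number")  # drop non-numeric columns
df = df.drop(columns=["Id"])             # drop the Id column
df = df.dropna()                         # drop rows with missing values
df["GarageArea"] = df["GarageArea"].astype(int)  # convert dtype
```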

2. Splitting the Dataset

  • train_test_split_data(df, test_size, random_seed)
    • Splits data into train (80%) and test (20%)
    • Returns: X_train, X_test, y_train, y_test
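One plausible implementation of `train_test_split_data` is sketched below; the signature follows the README, but the internals (and the assumed target column "SalePrice") are assumptions:

```python
import numpy as np
import pandas as pd

def train_test_split_data(df, test_size=0.2, random_seed=42):
    """Shuffle rows, then split into train/test feature and target arrays."""
    rng = np.random.default_rng(random_seed)
    indices = rng.permutation(len(df))
    n_test = int(len(df) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X = df.drop(columns=["SalePrice"]).to_numpy()  # assumed target column
    y = df["SalePrice"].to_numpy()
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Usage on a toy frame: 10 rows -> 8 train / 2 test
df = pd.DataFrame({"Rooms": range(10), "SalePrice": range(10)})
X_train, X_test, y_train, y_test = train_test_split_data(df, test_size=0.2)
```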

3. Normalization

  • normalize_features(features)
    • Applies Z-score normalization:

x' = (x - μ) / σ

where μ is the mean and σ is the standard deviation.
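A minimal sketch of `normalize_features`, applying the z-score formula column-wise:

```python
import numpy as np

def normalize_features(features):
    """Z-score each column: x' = (x - μ) / σ."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / sigma

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
X_norm = normalize_features(X)  # each column now has mean 0, std 1
```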


4. Model Training

  • linear_regression_train(X_train, y_train, learning_rate, epochs)
    • Normalizes data
    • Calls gradient_descent(X, y, learning_rate, epochs)

Core gradient descent update (the heart of gradient_descent):

import numpy as np

weights = np.zeros(X.shape[1])  # one weight per feature

for epoch in range(epochs):
    predictions = np.dot(X, weights)             # ŷ = X·w
    errors = predictions - y
    gradient = 2 * np.dot(X.T, errors) / len(y)  # (2/n)·Xᵀ·(Xw − y)
    weights -= learning_rate * gradient          # w ← w − η·gradient

5. Prediction

  • linear_regression_predict(weights, X_test)
    • Normalizes X_test
    • Predicts using ŷ = X · w
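A plausible sketch of `linear_regression_predict`; normalizing the test set with its own statistics mirrors the note below that features are normalized separately, though that detail is an assumption about the implementation:

```python
import numpy as np

def linear_regression_predict(weights, X_test):
    """Z-score X_test with its own statistics, then predict ŷ = X·w."""
    X_norm = (X_test - X_test.mean(axis=0)) / X_test.std(axis=0)
    return X_norm @ weights

weights = np.array([1.0])
X_test = np.array([[1.0], [2.0], [3.0]])
preds = linear_regression_predict(weights, X_test)
```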

6. Evaluation

  • mean_squared_error(predictions, targets)

    • Calculates average squared difference between predictions and actual targets
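A minimal `mean_squared_error` sketch matching the MSE formula above:

```python
import numpy as np

def mean_squared_error(predictions, targets):
    """Average squared difference between predictions and targets."""
    return np.mean((predictions - targets) ** 2)

mse = mean_squared_error(np.array([2.0, 4.0]), np.array([1.0, 2.0]))
print(mse)  # (1² + 2²) / 2 = 2.5
```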

How to Run

python linear_regression.py

It will:

  • Load and clean dataset
  • Train a linear regression model
  • Predict on test data
  • Print final Mean Squared Error (MSE)

Notes

  • No bias term included
  • Features and targets are normalized separately
  • For learning/demo only — not production
  • Learning rate and epochs may need tuning

Concepts Covered

  • Data Cleaning & Preprocessing
  • Train-Test Splitting
  • Z-score Normalization (x' = (x - μ) / σ)
  • Gradient Descent Optimization (∂MSE/∂w = (2/n) Xᵀ (Xw - y))
  • Linear Regression
  • Model Evaluation (MSE)

About

Implementation of linear regression with MSE loss, from scratch, using only basic Python libraries (NumPy and Pandas).
