adamchok/HDB-Price-Forecasting

HDB Resale Price Prediction (ML + Deep Learning)

End-to-end machine learning and deep learning pipeline for forecasting Singapore HDB resale prices using structured tabular and time-aware features.

This project is designed to showcase:

  • data acquisition and preprocessing at scale,
  • exploratory data analysis and feature engineering,
  • sequence modeling with LSTM and GRU,
  • hyperparameter tuning and artifact management,
  • practical model comparison for real-world price prediction.

Project Goals

  • Build a robust dataset from multiple public data sources related to HDB resale transactions and contextual features.
  • Compare deep learning architectures (LSTM vs GRU) for predictive performance.
  • Track preprocessing artifacts (scalers, vocabulary mappings, tuned model files) for reproducibility.
  • Present a clear, portfolio-ready workflow from raw data to trained model outputs.

Repository Structure

.
├─ data/
│  ├─ raw/                         # Original source datasets
│  └─ processed/                   # Engineered/cleaned datasets used for modeling
├─ models/
│  ├─ lstm/                        # Saved LSTM model + preprocessing artifacts
│  └─ gru/                         # Saved GRU model + preprocessing artifacts + tuner logs
├─ notebooks/
│  ├─ 01_data_preprocessing_and_eda.ipynb
│  ├─ 02_lstm_baseline.ipynb
│  ├─ 03_lstm_hyperparameter_tuning.ipynb
│  ├─ 04_gru_baseline.ipynb
│  └─ 05_gru_hyperparameter_tuning.ipynb
└─ README.md

Data Assets

Raw Data (data/raw)

  • HDB resale transaction datasets (multiple time ranges)
  • HDB property information
  • School information
  • Train station reference data

Processed Data (data/processed)

  • df_final.csv: merged feature table used in downstream workflows
  • df_final_cleaned.csv: cleaned/transformed version for training

Modeling Workflow

1) Data Pre-analysis, Preprocessing, and EDA

Notebook: notebooks/01_data_preprocessing_and_eda.ipynb

Typical tasks covered:

  • schema checks and missing-value profiling,
  • dataset joining and consistency verification,
  • exploratory visual analysis of price behavior,
  • preprocessing decisions for model-ready inputs.
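The first two tasks in this list can be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`block`, `town`, `resale_price`, `year_completed`) and the tiny sample frames are invented for the example, and the real schema may differ.

```python
import pandas as pd


def profile_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column dtype, missing count, and missing share."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "pct_missing": df.isna().mean().round(3),
        }
    ).sort_values("n_missing", ascending=False)


def checked_left_join(left: pd.DataFrame, right: pd.DataFrame, on: str):
    """Left-join and count rows of `left` that found no match in `right`.

    validate="many_to_one" makes pandas raise if `right` has duplicate keys,
    which would otherwise silently duplicate transaction rows.
    """
    merged = left.merge(
        right, on=on, how="left", validate="many_to_one", indicator=True
    )
    unmatched = int((merged["_merge"] == "left_only").sum())
    return merged.drop(columns="_merge"), unmatched


# Invented sample data (hypothetical columns, not taken from the repository).
transactions = pd.DataFrame(
    {
        "block": ["101", "101", "202", "303"],
        "town": ["ANG MO KIO", "ANG MO KIO", "BEDOK", None],
        "resale_price": [450000.0, None, 520000.0, 610000.0],
    }
)
properties = pd.DataFrame({"block": ["101", "202"], "year_completed": [1980, 1995]})

profile = profile_missing(transactions)
merged, unmatched = checked_left_join(transactions, properties, on="block")
print(profile)
print(f"rows without property info: {unmatched}")
```

Counting unmatched rows after each join is one way to make the "consistency verification" step explicit before any rows are dropped or imputed.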

2) Baseline Modeling

Notebooks:

  • notebooks/02_lstm_baseline.ipynb
  • notebooks/04_gru_baseline.ipynb

Baseline model notebooks establish reference performance before tuning and provide an initial architecture comparison.
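An architecture comparison of this kind usually comes down to scoring each model's predictions on the same held-out set. The sketch below assumes RMSE and MAE as the metrics, which are common for price regression but are not confirmed by this README. The model names and all numbers are placeholders, not results from the notebooks.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target (here: dollars)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def compare_models(y_true, predictions):
    """Score each model's predictions and return rows sorted by RMSE (best first).

    `predictions` maps a model name to its list of predicted values.
    """
    rows = [
        {"model": name, "rmse": rmse(y_true, y_pred), "mae": mae(y_true, y_pred)}
        for name, y_pred in predictions.items()
    ]
    return sorted(rows, key=lambda row: row["rmse"])


# Placeholder values for illustration only.
y_true = [400000, 500000, 600000]
predictions = {
    "model_a": [410000, 490000, 620000],
    "model_b": [405000, 495000, 610000],
}

ranking = compare_models(y_true, predictions)
for row in ranking:
    print(f"{row['model']}: RMSE={row['rmse']:.0f}  MAE={row['mae']:.0f}")
```

If the models are trained on a scaled target, predictions should be inverse-transformed before scoring so that the errors are reported in price units.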

3) Hyperparameter Optimization

Notebooks:

  • notebooks/03_lstm_hyperparameter_tuning.ipynb
  • notebooks/05_gru_hyperparameter_tuning.ipynb

The tuning notebooks search over architecture and training hyperparameters and save the best-performing model together with its configuration.
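To illustrate the general idea, here is a minimal random-search sketch in plain Python. It is not the tuner used in the notebooks, which this README does not name. The search space, the `toy_objective` function, and all parameter names are hypothetical; in practice the objective would train a model and return its validation loss.

```python
import random

# Hypothetical search space; the notebooks may tune different parameters.
SEARCH_SPACE = {
    "units": [32, 64, 128],
    "dropout": [0.0, 0.2, 0.4],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}


def sample_config(rng):
    """Draw one value per hyperparameter, uniformly at random."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}


def random_search(objective, n_trials, seed=0):
    """Evaluate `n_trials` random configs; return the best one and the full history."""
    rng = random.Random(seed)
    history = []
    for trial in range(n_trials):
        config = sample_config(rng)
        history.append(
            {"trial": trial, "config": config, "val_loss": objective(config)}
        )
    best = min(history, key=lambda entry: entry["val_loss"])
    return best, history


def toy_objective(config):
    """Stand-in for 'train a model with this config and return validation loss'."""
    return (
        abs(config["units"] - 64) / 64
        + config["dropout"]
        + abs(config["learning_rate"] - 1e-3)
    )


best, history = random_search(toy_objective, n_trials=10)
print("best config:", best["config"])
```

Keeping the full `history` alongside the best configuration records which alternatives were tried, and serves the same purpose as a saved tuner search history.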

Saved Artifacts

The models/ directory stores reusable assets for inference and reproducibility:

  • trained model files (.keras),
  • feature and target scalers (.pkl),
  • categorical vocabularies (.json),
  • best hyperparameter configurations (.json),
  • tuner search history (for GRU experiments).
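A round trip for the non-model artifacts might look like the sketch below. The file names (`target_scaler.pkl`, `town_vocab.json`, `best_params.json`) are hypothetical, and a plain dict stands in for the scaler; the real `.pkl` files presumably hold fitted scaler objects. Saving the `.keras` model itself is omitted here.

```python
import json
import pickle
import tempfile
from pathlib import Path


def build_vocab(values):
    """Map each distinct category to an integer id, leaving 0 free for unseen values."""
    return {value: idx for idx, value in enumerate(sorted(set(values)), start=1)}


def save_artifacts(out_dir, scaler_params, vocab, best_params):
    """Write scaler (.pkl), vocabulary (.json), and best hyperparameters (.json)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "target_scaler.pkl", "wb") as f:
        pickle.dump(scaler_params, f)
    (out_dir / "town_vocab.json").write_text(json.dumps(vocab, indent=2))
    (out_dir / "best_params.json").write_text(json.dumps(best_params, indent=2))


def load_artifacts(out_dir):
    """Read back the three artifacts written by save_artifacts."""
    out_dir = Path(out_dir)
    with open(out_dir / "target_scaler.pkl", "rb") as f:
        scaler_params = pickle.load(f)
    vocab = json.loads((out_dir / "town_vocab.json").read_text())
    best_params = json.loads((out_dir / "best_params.json").read_text())
    return scaler_params, vocab, best_params


# Illustrative values only; a dict stands in for a fitted scaler object.
scaler_params = {"min": 150000.0, "max": 1200000.0}
vocab = build_vocab(["BEDOK", "ANG MO KIO", "BEDOK"])
best_params = {"units": 64, "dropout": 0.2}

with tempfile.TemporaryDirectory() as tmp:
    artifact_dir = Path(tmp) / "models" / "gru"
    save_artifacts(artifact_dir, scaler_params, vocab, best_params)
    restored_scaler, restored_vocab, restored_params = load_artifacts(artifact_dir)
```

Loading the same vocabulary and scaler at inference time keeps category ids and value ranges identical to those seen during training.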

Tech Stack

  • Python
  • Pandas, NumPy
  • Scikit-learn
  • TensorFlow / Keras
  • Matplotlib, Seaborn

How To Run

Recommended: run the notebooks with the project root as the working directory so that relative paths (data/, models/) resolve the same way in every notebook.

  1. Create and activate a virtual environment.
  2. Install dependencies:
    pip install pandas numpy scikit-learn tensorflow matplotlib seaborn jupyter
  3. Launch Jupyter:
    jupyter notebook
  4. Run notebooks in this order:
    • 01_data_preprocessing_and_eda.ipynb
    • 02_lstm_baseline.ipynb
    • 03_lstm_hyperparameter_tuning.ipynb
    • 04_gru_baseline.ipynb
    • 05_gru_hyperparameter_tuning.ipynb
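Jupyter often starts a kernel in the notebook's own directory rather than the project root, so a small helper that locates the root can make paths robust. The sketch below is an assumption, not part of the repository: `find_project_root` is a hypothetical helper that walks upward until it finds the `README.md`, `data/`, and `notebooks/` entries shown in the repository structure. The demo builds a throwaway copy of that layout in a temporary directory.

```python
import tempfile
from pathlib import Path


def find_project_root(start, markers=("README.md", "data", "notebooks")):
    """Walk up from `start` until a directory contains every marker entry."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if all((candidate / marker).exists() for marker in markers):
            return candidate
    raise FileNotFoundError(f"no project root found above {start}")


# Demo against a temporary directory that mimics the repository layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "HDB-Price-Forecasting"
    (root / "data" / "raw").mkdir(parents=True)
    (root / "notebooks").mkdir()
    (root / "README.md").write_text("placeholder")

    # Pretend the notebook is running from inside notebooks/.
    found = find_project_root(root / "notebooks")
    same_root = found == root.resolve()
    raw_dir_exists = (found / "data" / "raw").is_dir()
```

Inside a notebook, something like `ROOT = find_project_root(Path.cwd())` followed by `ROOT / "data" / "processed" / "df_final_cleaned.csv"` would then work regardless of where Jupyter was launched.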

Author

This project is intended as a practical demonstration of applied machine learning and deep learning capability in a real estate pricing context.

About

Deep learning pipeline for Singapore HDB resale price forecasting using GRU/LSTM, geospatial feature engineering, and hyperparameter tuning.
