End-to-end machine learning and deep learning pipeline for forecasting Singapore HDB resale prices using structured tabular and time-aware features.
This project is designed to showcase:
- data acquisition and preprocessing at scale,
- exploratory data analysis and feature engineering,
- sequence modeling with LSTM and GRU,
- hyperparameter tuning and artifact management,
- practical model comparison for real-world price prediction.
Objectives:
- Build a robust dataset from multiple public data sources related to HDB resale transactions and contextual features.
- Compare deep learning architectures (LSTM vs. GRU) for predictive performance.
- Track preprocessing artifacts (scalers, vocabulary mappings, tuned model files) for reproducibility.
- Present a clear, portfolio-ready workflow from raw data to trained model outputs.
.
├─ data/
│ ├─ raw/ # Original source datasets
│ └─ processed/ # Engineered/cleaned datasets used for modeling
├─ models/
│ ├─ lstm/ # Saved LSTM model + preprocessing artifacts
│ └─ gru/ # Saved GRU model + preprocessing artifacts + tuner logs
├─ notebooks/
│ ├─ 01_data_preprocessing_and_eda.ipynb
│ ├─ 02_lstm_baseline.ipynb
│ ├─ 03_lstm_hyperparameter_tuning.ipynb
│ ├─ 04_gru_baseline.ipynb
│ └─ 05_gru_hyperparameter_tuning.ipynb
└─ README.md
Data sources:
- HDB resale transaction datasets (multiple time ranges)
- HDB property information
- School information
- Train station reference data
Processed datasets (in data/processed/):
- df_final.csv: merged feature table used in downstream workflows
- df_final_cleaned.csv: cleaned/transformed version used for training
Notebook: notebooks/01_data_preprocessing_and_eda.ipynb
Typical tasks covered:
- schema checks and missing-value profiling,
- dataset joining and consistency verification,
- exploratory visual analysis of price behavior,
- preprocessing decisions for model-ready inputs.
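The joining and missing-value profiling steps above can be sketched with pandas. The tables and column names below (`block`, `town`, `resale_price`, `max_floor_lvl`) are illustrative assumptions, not the project's actual schema:

```python
import pandas as pd

# Illustrative transaction and property tables (column names are assumptions).
transactions = pd.DataFrame({
    "block": ["101A", "102B", "103C"],
    "town": ["BEDOK", "BEDOK", "YISHUN"],
    "resale_price": [450000, None, 520000],
})
properties = pd.DataFrame({
    "block": ["101A", "102B"],
    "max_floor_lvl": [12, 16],
})

# Left-join contextual features onto transactions; `indicator` flags rows
# with no matching property record, a quick consistency check after merging.
merged = transactions.merge(properties, on="block", how="left", indicator=True)
unmatched = (merged["_merge"] == "left_only").sum()

# Missing-value profile: fraction of NaNs per column.
missing_profile = merged.drop(columns="_merge").isna().mean()
print(f"unmatched rows: {unmatched}")
print(missing_profile)
```

The same pattern extends to the school and train-station reference tables: each join is followed by a match-rate check before the result is written to data/processed/.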
Notebooks:
- notebooks/02_lstm_baseline.ipynb
- notebooks/04_gru_baseline.ipynb
Baseline model notebooks establish reference performance before tuning and provide an initial architecture comparison.
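Both baselines consume fixed-length sequences rather than flat rows. A minimal sketch of the windowing step that prepares RNN inputs, assuming a univariate monthly price series and a window length of 3 (both illustrative choices, not the notebooks' actual settings):

```python
import numpy as np

def make_windows(series: np.ndarray, window: int):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    X = np.asarray(X, dtype=np.float32)[..., None]  # add a feature axis for RNN layers
    return X, np.asarray(y, dtype=np.float32)

# Toy monthly average price series in SGD thousands (values are illustrative).
prices = np.array([430.0, 435.0, 441.0, 446.0, 452.0, 458.0])
X, y = make_windows(prices, window=3)
print(X.shape, y.shape)  # (3, 3, 1) (3,)
```

The resulting `(samples, timesteps, features)` tensor is the shape both `LSTM` and `GRU` layers in Keras expect.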
Notebooks:
- notebooks/03_lstm_hyperparameter_tuning.ipynb
- notebooks/05_gru_hyperparameter_tuning.ipynb
Tuning notebooks refine architecture/training parameters and save best-performing artifacts.
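The tuning loop can be illustrated with a stdlib-only grid search; the hyperparameter names mirror common RNN knobs, and the stand-in objective below replaces an actual train-and-validate run, so everything here is an assumption rather than the notebooks' real search:

```python
import itertools
import json
import random

# Illustrative search space (names and values are assumptions).
space = {
    "units": [32, 64, 128],
    "dropout": [0.0, 0.2],
    "learning_rate": [1e-3, 1e-4],
}

def evaluate(config):
    """Stand-in for model training + validation; returns a fake validation loss."""
    random.seed(str(sorted(config.items())))  # deterministic per configuration
    return random.random()

best_config, best_loss = None, float("inf")
for values in itertools.product(*space.values()):
    config = dict(zip(space.keys(), values))
    loss = evaluate(config)
    if loss < best_loss:
        best_config, best_loss = config, loss

# Persist the winning configuration, as the notebooks do for tuned settings.
best_json = json.dumps(best_config)
print(best_config, best_loss)
```

In the actual notebooks the search and its history are managed by the tuner, whose logs are kept under models/gru/.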
The models/ directory stores reusable assets for inference and reproducibility:
- trained model files (.keras)
- feature and target scalers (.pkl)
- categorical vocabularies (.json)
- best hyperparameter configurations (.json)
- tuner search history (for GRU experiments)
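A round-trip sketch of the non-model artifacts, using stdlib `pickle`/`json` in a temporary directory. The file names and contents are illustrative, and a plain dict stands in for a fitted scikit-learn scaler:

```python
import json
import pickle
import tempfile
from pathlib import Path

artifacts = Path(tempfile.mkdtemp())

# Target scaler parameters (a dict stands in for a fitted sklearn scaler).
scaler = {"mean": 410000.0, "std": 85000.0}
(artifacts / "target_scaler.pkl").write_bytes(pickle.dumps(scaler))

# Categorical vocabulary, e.g. town name -> integer id (illustrative values).
vocab = {"BEDOK": 0, "YISHUN": 1, "TAMPINES": 2}
(artifacts / "town_vocab.json").write_text(json.dumps(vocab))

# Reload at inference time and apply the same transformations as in training.
scaler_loaded = pickle.loads((artifacts / "target_scaler.pkl").read_bytes())
vocab_loaded = json.loads((artifacts / "town_vocab.json").read_text())
scaled = (495000.0 - scaler_loaded["mean"]) / scaler_loaded["std"]
print(vocab_loaded["YISHUN"], round(scaled, 3))
```

Persisting scalers and vocabularies alongside the .keras model file is what makes inference reproducible: new inputs are encoded and scaled exactly as the training data was.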
Tech stack:
- Python
- Pandas, NumPy
- Scikit-learn
- TensorFlow / Keras
- Matplotlib, Seaborn
Recommended: run the notebooks from the project root so that relative paths resolve correctly.
- Create and activate a virtual environment.
- Install dependencies:
pip install pandas numpy scikit-learn tensorflow matplotlib seaborn jupyter
- Launch Jupyter:
jupyter notebook
- Run notebooks in this order:
  - 01_data_preprocessing_and_eda.ipynb
  - 02_lstm_baseline.ipynb
  - 03_lstm_hyperparameter_tuning.ipynb
  - 04_gru_baseline.ipynb
  - 05_gru_hyperparameter_tuning.ipynb
This project is intended as a practical demonstration of applied machine learning and deep learning capability in a real estate pricing context.