End-to-end machine learning and deep learning pipeline for forecasting Singapore HDB resale prices using structured tabular and time-aware features.
This project is designed to showcase:
- data acquisition and preprocessing at scale,
- exploratory data analysis and feature engineering,
- sequence modeling with LSTM and GRU,
- hyperparameter tuning and artifact management,
- practical model comparison for real-world price prediction.
Objectives:
- Build a robust dataset from multiple public data sources related to HDB resale transactions and contextual features.
- Compare deep learning architectures (LSTM vs. GRU) for predictive performance.
- Track preprocessing artifacts (scalers, vocabulary mappings, tuned model files) for reproducibility.
- Present a clear, portfolio-ready workflow from raw data to trained model outputs.
.
├─ data/
│ ├─ raw/ # Original source datasets
│ └─ processed/ # Engineered/cleaned datasets used for modeling
├─ models/
│ ├─ lstm/ # Saved LSTM model + preprocessing artifacts
│ └─ gru/ # Saved GRU model + preprocessing artifacts + tuner logs
├─ notebooks/
│ ├─ 01_data_preprocessing_and_eda.ipynb
│ ├─ 02_lstm_baseline.ipynb
│ ├─ 03_lstm_hyperparameter_tuning.ipynb
│ ├─ 04_gru_baseline.ipynb
│ └─ 05_gru_hyperparameter_tuning.ipynb
└─ README.md
Data sources:
- HDB resale transaction datasets (multiple time ranges)
- HDB property information
- School information
- Train station reference data
Processed datasets (in data/processed/):
- df_final.csv: merged feature table used in downstream workflows
- df_final_cleaned.csv: cleaned/transformed version used for training
Notebook: notebooks/01_data_preprocessing_and_eda.ipynb
Typical tasks covered:
- schema checks and missing-value profiling,
- dataset joining and consistency verification,
- exploratory visual analysis of price behavior,
- preprocessing decisions for model-ready inputs.
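The joining and missing-value profiling steps above can be sketched with pandas. The tables and column names below (`block`, `town`, `resale_price`, `max_floor_lvl`) are illustrative assumptions, not the project's actual schema:

```python
import pandas as pd

# Illustrative transaction and property tables (column names are assumptions).
transactions = pd.DataFrame({
    "block": ["101A", "102B", "103C"],
    "town": ["BEDOK", "BEDOK", "YISHUN"],
    "resale_price": [450000, None, 520000],
})
properties = pd.DataFrame({
    "block": ["101A", "102B"],
    "max_floor_lvl": [12, 16],
})

# Left-join contextual features onto transactions; `indicator` flags rows
# with no matching property record, a quick consistency check after merging.
merged = transactions.merge(properties, on="block", how="left", indicator=True)
unmatched = (merged["_merge"] == "left_only").sum()

# Missing-value profile: fraction of NaNs per column.
missing_profile = merged.drop(columns="_merge").isna().mean()
print(f"unmatched rows: {unmatched}")
print(missing_profile)
```

The same pattern extends to the school and train-station reference tables: each join is followed by a match-rate check before the result is written to data/processed/.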
Notebooks:
- notebooks/02_lstm_baseline.ipynb
- notebooks/04_gru_baseline.ipynb
Baseline model notebooks establish reference performance before tuning and provide an initial architecture comparison.
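Both baselines consume fixed-length sequences rather than flat rows. A minimal sketch of the windowing step that prepares RNN inputs, assuming a univariate monthly price series and a window length of 3 (both illustrative choices, not the notebooks' actual settings):

```python
import numpy as np

def make_windows(series: np.ndarray, window: int):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    X = np.asarray(X, dtype=np.float32)[..., None]  # add a feature axis for RNN layers
    return X, np.asarray(y, dtype=np.float32)

# Toy monthly average price series in SGD thousands (values are illustrative).
prices = np.array([430.0, 435.0, 441.0, 446.0, 452.0, 458.0])
X, y = make_windows(prices, window=3)
print(X.shape, y.shape)  # (3, 3, 1) (3,)
```

The resulting `(samples, timesteps, features)` tensor is the shape both `LSTM` and `GRU` layers in Keras expect.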
Notebooks:
- notebooks/03_lstm_hyperparameter_tuning.ipynb
- notebooks/05_gru_hyperparameter_tuning.ipynb
Tuning notebooks refine architecture/training parameters and save best-performing artifacts.
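The tuning loop can be illustrated with a stdlib-only grid search; the hyperparameter names mirror common RNN knobs, and the stand-in objective below replaces an actual train-and-validate run, so everything here is an assumption rather than the notebooks' real search:

```python
import itertools
import json
import random

# Illustrative search space (names and values are assumptions).
space = {
    "units": [32, 64, 128],
    "dropout": [0.0, 0.2],
    "learning_rate": [1e-3, 1e-4],
}

def evaluate(config):
    """Stand-in for model training + validation; returns a fake validation loss."""
    random.seed(str(sorted(config.items())))  # deterministic per configuration
    return random.random()

best_config, best_loss = None, float("inf")
for values in itertools.product(*space.values()):
    config = dict(zip(space.keys(), values))
    loss = evaluate(config)
    if loss < best_loss:
        best_config, best_loss = config, loss

# Persist the winning configuration, as the notebooks do for tuned settings.
best_json = json.dumps(best_config)
print(best_config, best_loss)
```

In the actual notebooks the search and its history are managed by the tuner, whose logs are kept under models/gru/.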
The models/ directory stores reusable assets for inference and reproducibility:
- trained model files (.keras)
- feature and target scalers (.pkl)
- categorical vocabularies (.json)
- best hyperparameter configurations (.json)
- tuner search history (for GRU experiments)
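A round-trip sketch of the non-model artifacts, using stdlib `pickle`/`json` in a temporary directory. The file names and contents are illustrative, and a plain dict stands in for a fitted scikit-learn scaler:

```python
import json
import pickle
import tempfile
from pathlib import Path

artifacts = Path(tempfile.mkdtemp())

# Target scaler parameters (a dict stands in for a fitted sklearn scaler).
scaler = {"mean": 410000.0, "std": 85000.0}
(artifacts / "target_scaler.pkl").write_bytes(pickle.dumps(scaler))

# Categorical vocabulary, e.g. town name -> integer id (illustrative values).
vocab = {"BEDOK": 0, "YISHUN": 1, "TAMPINES": 2}
(artifacts / "town_vocab.json").write_text(json.dumps(vocab))

# Reload at inference time and apply the same transformations as in training.
scaler_loaded = pickle.loads((artifacts / "target_scaler.pkl").read_bytes())
vocab_loaded = json.loads((artifacts / "town_vocab.json").read_text())
scaled = (495000.0 - scaler_loaded["mean"]) / scaler_loaded["std"]
print(vocab_loaded["YISHUN"], round(scaled, 3))
```

Persisting scalers and vocabularies alongside the .keras model file is what makes inference reproducible: new inputs are encoded and scaled exactly as the training data was.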
Tech stack:
- Python
- Pandas, NumPy
- Scikit-learn
- TensorFlow / Keras
- Matplotlib, Seaborn
Recommended: run the notebooks from the project root so that relative paths resolve correctly.
- Create and activate a virtual environment.
- Install dependencies:
pip install pandas numpy scikit-learn tensorflow matplotlib seaborn jupyter
- Launch Jupyter:
jupyter notebook
- Run notebooks in this order:
  - 01_data_preprocessing_and_eda.ipynb
  - 02_lstm_baseline.ipynb
  - 03_lstm_hyperparameter_tuning.ipynb
  - 04_gru_baseline.ipynb
  - 05_gru_hyperparameter_tuning.ipynb
This project is intended as a practical demonstration of applied machine learning and deep learning capability in a real estate pricing context.