Final Project ENEL645

Abstract

This study evaluated two approaches to privacy-preserving record linkage (PPRL) of homelessness data: threshold-based classification and XGBoost. The threshold method, which compares Bloom-filter encodings using the Dice coefficient, achieved precision of up to 85% and accuracy of 82.6%, but its computational cost made full-scale implementation challenging. XGBoost, combined with feature engineering and ADASYN oversampling for class balancing, achieved precision and recall above 80% with 72.8% overall accuracy, and was more efficient on larger datasets. Threshold methods are suitable for resource-limited settings, while XGBoost provides robust performance where computational capacity allows. Together, these approaches demonstrate the potential for unifying fragmented homelessness data, improving policy-making and resource allocation while maintaining privacy.
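To make the Bloom filter and Dice coefficient comparison concrete, here is a minimal sketch in plain Python. It is not the code from this repository: the bigram tokenisation, the filter size, the number of hash functions, and the use of unkeyed SHA-256 are all illustrative assumptions (a real PPRL deployment would typically use keyed hashes with a shared secret).

```python
import hashlib


def bigrams(value: str) -> set[str]:
    """Split a padded, lower-cased string into character bigrams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, size: int = 1024, num_hashes: int = 4) -> set[int]:
    """Return the bit positions set to 1 when the bigrams are hashed.

    size and num_hashes are assumed values chosen for illustration only.
    """
    positions = set()
    for gram in bigrams(value):
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            positions.add(int.from_bytes(digest[:8], "big") % size)
    return positions


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) over the set bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Identical names (after normalisation), a near match, and an unrelated name.
same = dice(bloom_encode("Jonathan"), bloom_encode("jonathan "))
close = dice(bloom_encode("Jonathan"), bloom_encode("Jonathon"))
far = dice(bloom_encode("Jonathan"), bloom_encode("Maria"))
print(f"same={same:.2f} close={close:.2f} far={far:.2f}")
```

Because similar strings share most of their bigrams, their encodings share most of their set bits, so the Dice score stays high for near matches without either party seeing the other's plaintext values.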

Repository Structure

Data

Contains synthetic datasets used in this study for PPRL experiments.

Codes

Contains the code used in this study:

PrivacyPreserving_and_Threshold.ipynb: Preprocesses the data, applies Bloom filters, and implements the threshold-based method for 500 subjects.
3000threshold.py: Same pipeline as the notebook above, run on 3,000 subjects (the maximum allowed by the computational resources available on TALC).
3000threshold.slurm: Bash script needed to run 3000threshold.py on TALC.
3000threshold_31254.out: Output file generated by 3000threshold.py; the complete results are in output_updated_metrics_3000threshold.csv.
Final_ML_XGBoost.ipynb: Preprocesses the data, then trains and evaluates an XGBoost machine learning model.
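The jump from 500 to 3,000 subjects is costly because pairwise comparison grows quadratically. The sketch below illustrates that growth and a basic threshold classifier with its evaluation metrics. It assumes every record is compared with every other record in a single dataset (linking two separate files of n records each would instead give n² pairs); the scores, labels, and the 0.8 threshold are made-up toy values, not results from this project.

```python
def pair_count(n: int) -> int:
    """Number of unordered record pairs among n records."""
    return n * (n - 1) // 2


def classify(scores: list[float], threshold: float) -> list[bool]:
    """Label a pair as a match when its similarity reaches the threshold."""
    return [score >= threshold for score in scores]


def evaluate(predicted: list[bool], actual: list[bool]) -> dict[str, float]:
    """Compute precision, recall, and accuracy from predicted and true labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    tn = len(actual) - tp - fp - fn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(actual),
    }


# Comparisons needed for the two subject counts mentioned above.
print(pair_count(500))   # 124750
print(pair_count(3000))  # 4498500

# Toy similarity scores with hypothetical ground-truth match labels.
scores = [0.95, 0.88, 0.81, 0.60, 0.42, 0.30]
labels = [True, True, False, True, False, False]
metrics = evaluate(classify(scores, threshold=0.8), labels)
print(metrics)
```

Under this assumption, going from 500 to 3,000 subjects multiplies the number of comparisons by roughly 36, which is consistent with the threshold pipeline hitting resource limits on larger inputs.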

PDFs

Contains the final project report (FelipeCastano_FinalReport_645.pdf) and presentation (FelipeCastano_FinalPresentation_645.pdf).

Key Contributions

Threshold-based PPRL: Demonstrates the effectiveness of Bloom filters combined with the Dice coefficient for resource-constrained environments.
XGBoost for PPRL: Highlights the potential of machine learning in achieving efficient and accurate record linkage in privacy-sensitive contexts.
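One way to picture the difference between the two contributions: the threshold method reduces each record pair to a single similarity score, whereas a learned classifier can take one similarity per field and weigh them itself. The sketch below builds such a per-field feature vector. The field names, the records, and the choice of per-field Dice similarity are hypothetical and are not taken from the notebooks; for brevity it compares plaintext bigrams, where a PPRL setting would compare Bloom-encoded fields instead.

```python
def bigrams(value: str) -> set[str]:
    """Split a padded, lower-cased string into character bigrams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical field names, chosen only for this illustration.
FIELDS = ("first_name", "last_name", "birth_date")


def pair_features(rec_a: dict[str, str], rec_b: dict[str, str]) -> list[float]:
    """One Dice similarity per field, in the order given by FIELDS."""
    return [dice(bigrams(rec_a[f]), bigrams(rec_b[f])) for f in FIELDS]


rec_a = {"first_name": "Anna", "last_name": "Smith", "birth_date": "1990-04-12"}
rec_b = {"first_name": "Ana", "last_name": "Smith", "birth_date": "1990-04-12"}

features = pair_features(rec_a, rec_b)
print(features)  # first value is 8/9, the other two are 1.0
```

Vectors like this, paired with match or non-match labels, are the kind of input a gradient-boosted classifier such as XGBoost could be trained on.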

