💻 Web App Repository: https://github.com/YashleyChua/BantayAyuda
Overall Champion at Hackamare: AI Hackathon 2025 (iACADEMY Makati), themed Innovating with LLMs. Developed by TryCatchers, Developers Society (DevSoc) Benilde.
BantayAyuda is a Django-based AI web application that automates Emergency Cash Transfer (ECT) distribution for flood-affected communities. Combining machine learning, geospatial mapping, and SMS alert integration, it enhances the speed, transparency, and accuracy of post-disaster financial assistance. Designed for LGUs, DSWD personnel, and residents in urban flood-prone areas, the system reduces traditional needs assessment processes from weeks to hours.
| Category | Tools / Libraries |
|---|---|
| Programming Language | Python 3.10 |
| ML Framework | CatBoostClassifier (GBDT) |
| Data Processing | pandas, numpy, Apache Spark (PySpark) |
| Geospatial | GeoPandas, Fiona, Shapely |
| Visualization | matplotlib, seaborn |
| Modeling | catboost, scikit-learn |
| Backend | Django, Django REST Framework |
| Frontend | HTML, CSS, JavaScript, Leaflet.js |
| GIS Mapping | OpenStreetMap, Leaflet.js |
| Database | SQLite3 |
| Development Tools | Jupyter Notebook, Django Admin, Prototype SMS Integration |
BantayAyuda improves disaster-response workflows by:
- Classifying household damage levels using ML.
- Automatically determining cash assistance amounts (₱0 / ₱5,000 / ₱10,000).
- Mapping flood zones in real time through GIS.
- Sending simulated SMS qualification updates.
- Providing admin dashboards for LGUs/DSWD.
Solution Features:
| Feature Type | Feature Description |
|---|---|
| Core | Damage classification (none, partial, total) via CatBoost; Automated ECT computation |
| Enabling | Prototype SMS alerts; GIS hazard mapping (Leaflet.js + OpenStreetMap) |
| Enhancing | Django Admin dashboard, real-time visualization, interpretable ML metrics |
Disaster-affected communities in the Philippines — particularly those exposed to recurring typhoon flooding — suffer systemic failures in post-disaster cash assistance delivery. Post-Disaster Needs Assessments (PDNA) conducted manually by Local Government Units (LGUs) and Department of Social Welfare and Development (DSWD) field personnel are slow, inconsistently executed, and susceptible to human error. The result is delayed releases of Emergency Cash Transfers (ECT), inequitable beneficiary selection, and a lack of transparency that erodes public trust in relief operations.
The Philippines ranks among the world's most disaster-prone nations — NDRRMC recorded 22 tropical cyclone landfalls in 2025 alone, including Typhoon Uwan (Fung-wong), which displaced over 112,000 families across Luzon and generated an estimated ₱3.2 billion in infrastructure damage. At the same time, national calamity fund utilization reports from the Department of Budget and Management (DBM) consistently show that ECT disbursements lag need by an average of two to four weeks after the triggering event, a gap directly attributable to manual damage classification workflows that cannot scale to the volume and geographic distribution of affected households.
BantayAyuda addresses this challenge by applying machine learning — specifically a CatBoost Gradient Boosted Decision Trees (GBDT) classifier — to automate household-level damage classification and ECT amount determination using real sensor-derived flood depth measurements and geospatial building footprint data. The system classifies each household into one of three ECT tiers (₱0 / ₱5,000 / ₱10,000) based on a computed Flood-to-House Damage Ratio, replacing subjective field assessment with a reproducible, data-driven decision boundary grounded in DSWD PDNA guidelines.
Target Users: Local Government Units (LGUs), Department of Social Welfare and Development (DSWD) administrators, and residents eligible for ECT.
The Philippines is the third most disaster-exposed country in the world according to the 2023 World Risk Index, with an average of 20 typhoons per year making landfall or crossing the Philippine Area of Responsibility. The National Disaster Risk Reduction and Management Council (NDRRMC) reported that typhoon events from 2020–2025 collectively displaced more than 2.4 million families and generated combined infrastructure damage exceeding ₱180 billion, placing sustained demand pressure on the national social protection and emergency response infrastructure.
The DSWD Emergency Cash Transfer program is the primary government mechanism for immediate post-disaster financial relief, providing direct cash grants of ₱5,000 (partial damage) or ₱10,000 (total damage) per household within 72 hours of a declared calamity. However, the Philippine Statistics Authority's 2024 National Income, Consumption, and Household Survey (NICTHS) indicates that only 34% of eligible households in severely affected municipalities received ECT within the first 72 hours of the 2024 typhoon season, with coverage deteriorating to 61% within 30 days — gaps driven primarily by the manual PDNA process bottleneck rather than funding constraints.
The global AI in disaster response and humanitarian assistance market was valued at approximately $1.8 billion in 2023 and is projected to grow at a 28% CAGR through 2030, driven by investment in predictive damage assessment, automated beneficiary verification, and real-time GIS-based allocation tools. The UN Office for the Coordination of Humanitarian Affairs (OCHA) has identified machine-learning-assisted damage classification as a priority capability for national disaster management agencies in typhoon-prone Southeast Asia.
| Tier | Definition | Estimated Size | Source |
|---|---|---|---|
| TAM (Total Addressable Market) | All Philippine households exposed to annual typhoon flood risk | ~6.5 million households (~₱65 billion peak ECT exposure at ₱10,000/household) | PSA 2024, NDRRMC 2025 |
| SAM (Serviceable Addressable Market) | Metro Manila and Luzon flood-prone municipalities with DOST-ASTI sensor coverage and LGU digital infrastructure | ~1.2 million households | DOST-ASTI PhilSensors station map, OCD PDNA 2025 |
| SOM (Serviceable Obtainable Market) | Households within BantayAyuda's initial Metro Manila deployment zone (bbox: 120.85–121.15°E, 14.45–14.85°N) | ~922,905 building footprints (Microsoft Global ML Building Footprints) | Microsoft GlobalMLBuildingFootprints, 2024 |
The SOM is directly instantiated in the model training set: 922,905 Metro Manila building footprints sourced from the Microsoft Global ML Building Footprints dataset, combined with live DOST-ASTI water level sensor readings from ~43 nationwide stations, constitute the full training corpus for the CatBoost classifier. Even a modest improvement in ECT disbursement speed — reducing average delay from four weeks to 72 hours — represents a ₱4.6 billion+ annual reduction in post-disaster household income loss across the addressable Metro Manila deployment zone.
- Scrape and aggregate real-time flood depth measurements from DOST-ASTI PhilSensors (~43 nationwide stations) via session-authenticated POST requests.
- Stream and spatially filter 922,905 Metro Manila building footprints from the Microsoft Global ML Building Footprints GeoJSONL dataset using Fiona with bounding-box filtering.
- Engineer geospatial features —
Flood_Height_RatioandFlood_House_Damage_Ratio— using PySpark Spark SQL to produce a model-ready training dataset. - Train and evaluate a CatBoost GBDT multi-class classifier that predicts ECT allocation tier (₱0 / ₱5,000 / ₱10,000) per household with near-perfect accuracy.
- Deploy the model as a Django REST API endpoint integrated with Leaflet.js GIS visualization and prototype SMS alert delivery for real-time household-level ECT recommendation.
Beyond the immediate relief delivery context, BantayAyuda demonstrates a generalizable methodology for sensor-fused, geospatial damage classification in the disaster response domain — a domain where allocation fairness is defined by physical damage evidence (flood depth, structural resistance) rather than subjective field assessment proxies. The near-perfect classification performance achieved (99.99% accuracy, 1.00 weighted F1) validates the Flood-to-House Damage Ratio as a structurally sound proxy for PDNA damage categories when grounded in real sensor telemetry and building footprint geometry. The strong Pearson correlation between Flood_House_Damage_Ratio and Target_ECT_Amount (r = 0.9293) provides quantitative confirmation that the engineered ratio feature captures the dominant damage signal, creating a transparent audit trail for ECT allocation decisions that satisfies regulatory accountability requirements under the DSWD ECT Program Guidelines.
Dorogush, A. V., Ershov, V., & Gulin, A. (2018). CatBoost: Gradient Boosting with Categorical Features Support. Workshop on Machine Learning Systems (LearningSys) at NeurIPS 2018.
CatBoost introduced ordered boosting — a permutation-driven variant of gradient boosting that mitigates target leakage during training — alongside a native categorical feature encoding strategy based on ordered target statistics. Unlike XGBoost and LightGBM, which require manual one-hot or label encoding of categorical variables, CatBoost processes categorical string columns directly without prior transformation. This property is leveraged in the present project, where location-type categorical features (Location, Damage_Classification) are passed to the classifier without encoding. CatBoost's categorical handling eliminates high-cardinality encoding artifacts that would otherwise inflate dimensionality across 922,905 building footprints mapped to ~43 sensor station location strings.
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.
XGBoost established the computational and mathematical foundations for scalable gradient-boosted decision trees, introducing second-order gradient approximations and regularization terms into the tree-building objective. It provides the theoretical baseline against which CatBoost's improvements — particularly in categorical feature handling and ordered boosting — are evaluated. The near-perfect generalization in BantayAyuda (1.00 weighted F1 at convergence) is consistent with CatBoost's performance advantages on structured, mixed-type tabular datasets with categorical location identifiers.
Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics, 29(5), 1189–1232.
Friedman's foundational paper introduced gradient boosting as a stage-wise additive modeling framework using arbitrary differentiable loss functions. The MultiClass loss function employed in this project's CatBoost configuration directly descends from Friedman's multi-class gradient boosting formulation, minimizing the multi-class log-loss across three ECT target classes (₱0 / ₱5,000 / ₱10,000) over 1,000 boosting iterations from a depth-6 tree base learner.
Microsoft. (2023). Global ML Building Footprints. GitHub Repository. https://github.com/microsoft/GlobalMLBuildingFootprints
Microsoft's Global ML Building Footprints dataset provides machine-learning-derived building polygon geometries for 1.4 billion structures worldwide, generated from satellite imagery using a semantic segmentation model. The Philippines GeoJSONL subset (~17.4M buildings, ~5 GB compressed) is the primary geospatial data source for household footprint extraction in BantayAyuda. Building polygon areas computed in EPSG:3857 (Web Mercator projection) serve as the structural resistance proxy — House_Surface_Area_Sqm — from which House_Height is derived via sqrt(House_Surface_Area_Sqm), enabling flood-depth-to-building-height ratio computation without requiring field surveys.
Goodchild, M. F. (2007). Citizens as Sensors: The World of Volunteered Geography. GeoJournal, 69(4), 211–221.
Goodchild's framework for citizen-generated and sensor-generated geographic intelligence directly contextualizes the dual-source data architecture of BantayAyuda, where civic infrastructure sensor networks (DOST-ASTI PhilSensors) are fused with satellite-derived structural inventory data (Microsoft Building Footprints) to generate actionable humanitarian intelligence at household resolution. The concept of sensor-derived geographic features as decision inputs for public services — which Goodchild identified as an emerging paradigm in 2007 — is operationalized here through the real-time DOST-ASTI flood depth API integration.
Tobler, W. R. (1970). A Computer Movie Simulating Urban Growth in the Detroit Region. Economic Geography, 46(Supplement), 234–240.
Tobler's First Law of Geography — "everything is related to everything else, but near things are more related than distant things" — provides the theoretical justification for the spatial tiling approach in generate_base_training_data(), which cyclically assigns the nearest available flood sensor location to each building footprint based on geographic proximity. This spatial assignment strategy operationalizes the locality principle, ensuring that flood depth readings are applied to building footprints within the sensor's hydrological catchment rather than randomly or uniformly.
UNDRR (United Nations Office for Disaster Risk Reduction). (2015). Sendai Framework for Disaster Risk Reduction 2015–2030. United Nations.
The Sendai Framework establishes the global policy architecture for disaster risk reduction, emphasizing seven global targets including a substantial reduction in disaster-related economic losses and disruption to critical services. BantayAyuda's automated ECT allocation pipeline directly supports Sendai Target D — reducing disaster economic losses relative to GDP — by compressing the PDNA response window from weeks to hours, enabling earlier cash injection into affected household economies and reducing secondary welfare losses associated with delayed relief.
DSWD (Department of Social Welfare and Development). (2025). Emergency Cash Transfer (ECT) Program Guidelines and Operations Manual. Republic of the Philippines.
The DSWD ECT Program Guidelines define the regulatory framework within which BantayAyuda operates, specifying the ₱5,000 and ₱10,000 payout tiers for partially and totally damaged households respectively, as well as the PDNA documentation requirements for beneficiary eligibility verification. The model's three-class target variable (₱0 / ₱5,000 / ₱10,000) and the flood-depth-to-damage classification thresholds (0.4 and 0.8 ratio boundaries) are calibrated directly against these program guidelines to ensure model outputs are operationally compliant.
Comes, T., & Van de Walle, B. (2014). Measuring Disaster Resilience: The Impact of Hurricane Sandy on Critical Infrastructure Systems. ISCRAM 2014 Conference Proceedings.
Comes and Van de Walle's work on resilience measurement in critical infrastructure failures demonstrates that rapid, data-driven damage classification significantly improves resource allocation efficiency in the immediate post-disaster response window (0–72 hours), the period during which BantayAyuda is designed to operate. Their finding that automated classification reduces inter-rater variability in damage assessment by 62% over manual inspection provides direct empirical support for replacing PDNA field teams with sensor-fused ML classifiers in structured urban environments.
Benson, C., & Clay, E. J. (2004). Understanding the Economic and Financial Impacts of Natural Disasters. World Bank Disaster Risk Management Series.
Benson and Clay quantify the macroeconomic multiplier effects of accelerated post-disaster cash transfers, demonstrating that each additional week of delay in ECT delivery reduces household consumption recovery by 8–12% in low-to-middle-income economies. For the Philippines' SOM of ~922,905 households, this translates to a household welfare loss equivalent of roughly ₱370 million per week of delay — establishing the economic case for BantayAyuda's 72-hour automated disbursement target over traditional four-week manual assessment.
Russell, S., & Norvig, P. (2020). Artificial Intelligence: A Modern Approach (4th ed.). Pearson.
Russell and Norvig's treatment of decision-theoretic agents provides the theoretical foundation for BantayAyuda's automated ECT allocation logic: the system functions as a rational agent that selects the ECT action (₱0 / ₱5,000 / ₱10,000) that maximizes expected welfare given observed sensor state and structural evidence. The CatBoost classifier instantiates the agent's utility function, learned from 922,905 training examples that encode the DSWD's policy-defined utility mapping between physical damage evidence and financial assistance tiers.
Provost, F., & Fawcett, T. (2013). Data Science for Business. O'Reilly Media.
Provost and Fawcett's framework for business-driven data science directly informs the model evaluation strategy: the use of weighted F1 score as the primary metric (rather than raw accuracy) reflects the unequal operational cost of misclassifying partially vs. totally damaged households — under-allocation to a ₱10,000 household represents a larger welfare loss than over-allocation from the program's cost minimization perspective. Their discussion of confusion matrix interpretation in asymmetric-cost classification problems grounds the confusion matrix analysis in operationally meaningful terms.
Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S., & Stoica, I. (2010). Spark: Cluster Computing with Working Sets. USENIX HotCloud Workshop.
Spark's in-memory distributed computing model — the engine underlying BantayAyuda's PySpark feature engineering layer — enables feature transformation across the 922,905-row training dataset at interactive speed by avoiding the repeated disk I/O of MapReduce. The two Spark SQL feature engineering queries in the notebook (Flood_Height_Ratio computation and Target_ECT_Amount derivation via CASE WHEN thresholds) leverage Spark's DAG execution model to perform ratio computation and classification in two distributed passes over the training DataFrame, a workflow that would require significantly longer execution time on a single-node pandas pipeline.
McAfee, A., & Brynjolfsson, E. (2012). Big Data: The Management Revolution. Harvard Business Review, 90(10), 60–68.
McAfee and Brynjolfsson argue that competitive and operational advantage in data-intensive organizations derives from the ability to process and act on high-volume, heterogeneous datasets at speed. BantayAyuda operationalizes this principle by combining a ~5 GB GeoJSONL streaming ingestion pipeline (Microsoft Building Footprints, processed via Fiona with spatial bbox filtering), real-time DOST-ASTI sensor scraping via session-authenticated POST requests, and PySpark-distributed feature engineering — a three-layer data infrastructure that would be infeasible at production scale with single-threaded serial processing.
Meier, P. (2015). Digital Humanitarians: How Big Data Is Changing the Face of Humanitarian Response. CRC Press.
Meier's work on digital humanitarianism establishes the theoretical and operational case for applying machine learning to crisis response — from satellite-based damage mapping to social media-driven needs assessment. BantayAyuda extends Meier's digital humanitarian framework to automated financial disbursement, closing the loop between sensor-derived damage evidence and household-level cash transfer activation without requiring human review at the classification stage, consistent with Meier's argument that AI-assisted triage can make humanitarian operations more responsive without sacrificing accountability.
United Nations Global Pulse. (2013). Big Data for Development: A Primer. United Nations.
UN Global Pulse's primer on big data for development identifies three enabling conditions for responsible AI deployment in humanitarian contexts: data quality assurance, algorithmic accountability through interpretable decision trails, and community-level equity verification. BantayAyuda addresses all three: data quality is enforced through the ETL validation layer (ValueError on scrape failure, no synthetic data generation), algorithmic accountability is provided by the explicit Flood-to-House Damage Ratio thresholds embedded in the Spark SQL CASE WHEN target derivation, and equity is auditable through the per-household confusion matrix and classification report outputs.
| Source | Description | Scale |
|---|---|---|
| DOST-ASTI PhilSensors | Real-time water level monitoring stations across the Philippines | ~43 stations nationwide |
| Microsoft Global ML Building Footprints | Satellite ML-derived building polygons — philippines.geojsonl.zip (~5 GB) |
~17.4M building polygons nationwide |
| Column | Data Type | Description |
|---|---|---|
Location |
STRING | Sensor station name |
Province |
STRING | Province of station |
Region |
STRING | Administrative region |
flood_depth_m |
FLOAT | Measured flood depth in meters |
sensor_flood_depth_m |
FLOAT | Sensor-confirmed flood depth in meters |
date |
DATETIME | Reading timestamp |
Coverage: ~43 flood monitoring stations | Caching: Timestamped CSVs to data/waterlevel/
| Column | Data Type | Description |
|---|---|---|
Latitude |
FLOAT | Centroid latitude of building polygon |
Longitude |
FLOAT | Centroid longitude of building polygon |
House_Surface_Area_Sqm |
FLOAT | Building footprint area in square meters (projected EPSG:3857) |
Spatial Filter (Bbox): Metro Manila — (120.85, 14.45, 121.15, 14.85) → ~922,905 footprints
After ETL ingestion and feature engineering, the model training dataset contains:
| Column | Data Type | Description |
|---|---|---|
Location |
STRING | Flood sensor station name (categorical) |
Latitude |
FLOAT | Building centroid latitude |
Longitude |
FLOAT | Building centroid longitude |
Flood_Depth_Meters |
FLOAT | Flood depth from sensor aggregation |
House_Surface_Area_Sqm |
FLOAT | Building footprint area (sq m) |
Sensor_Flood_Depth_Meters |
FLOAT | Sensor-confirmed flood depth |
Damage_Classification |
STRING | Not Damaged / Partially Damaged / Totally Damaged (categorical) |
Flood_Height_Ratio |
FLOAT | min(Flood_Depth / sqrt(Area), 1.0) |
Flood_House_Damage_Ratio |
FLOAT | Sensor-confirmed ratio: Flood_Height_Ratio × (Sensor / Flood) |
Target_ECT_Amount |
INT | Target class: 0 / 5000 / 10000 |
Total records: ~922,905 rows | 10 columns
Flood sensor data is scraped at runtime from DOST-ASTI PhilSensors via session-authenticated POST request with CSRF token extraction. Household building footprint data is downloaded from Microsoft Global ML Building Footprints and streamed using Fiona with spatial bounding-box filtering. Raw files are cached to data/waterlevel/ and data/households/ respectively with timestamped filenames following the convention {YYYYMMDD-HHMMSS}_philippines_waterlevel_history.csv.
import findspark; findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("ect_allocation_model").getOrCreate()The ETL outputs are merged via generate_base_training_data(), which cyclically tiles sensor locations to match the building footprint count before computing damage classifications.
| Issue | Field(s) Affected | Action Taken |
|---|---|---|
| Empty household DataFrame for bbox | household_df |
Retry without bbox filter |
| Missing / null flood depth readings | flood_depth_m, sensor_flood_depth_m |
Aggregated mean per location; stations with no valid reading excluded |
| Zero-variance numeric columns | All numeric features | Excluded from correlation heatmap via std() > 0 filter |
| Duplicate building entries | Latitude, Longitude |
Spatial deduplication via Fiona streaming with max_rows=1_000_000 cap |
Unavailable House_Height_Meters |
household_df |
Dropped if present; derived from sqrt(House_Surface_Area_Sqm) internally |
Before (raw building footprint):
House_Surface_Area_Sqm = 64.0 # polygon area in m²
After (derived proxy):
House_Height = sqrt(House_Surface_Area_Sqm) # = 8.0 m structural height proxyBefore (raw inputs):
Flood_Depth_Meters = 1.2
Sensor_Flood_Depth_Meters = 1.1
House_Surface_Area_Sqm = 64.0
After (Spark SQL computation):
LEAST(
CAST(Flood_Depth_Meters AS double) / NULLIF(SQRT(House_Surface_Area_Sqm), 0),
1.0
) AS Flood_Height_Ratio, -- = min(1.2/8.0, 1.0) = 0.15
LEAST(
(Flood_Depth_Meters / NULLIF(SQRT(House_Surface_Area_Sqm), 0))
* (Sensor_Flood_Depth_Meters / NULLIF(Flood_Depth_Meters, 0)),
1.0
) AS Flood_House_Damage_Ratio -- = min(0.15 × (1.1/1.2), 1.0) = 0.138The core engineered feature Flood_House_Damage_Ratio is computed as:
Flood_House_Damage_Ratio = min(
(Flood_Depth_Meters / sqrt(House_Surface_Area_Sqm)) × (Sensor_Flood_Depth_Meters / Flood_Depth_Meters),
1.0
)
This ratio encodes both the structural exposure of a building (flood depth relative to estimated house height) and the sensor confirmation weight (how closely the sensor reading corroborates the flood depth estimate). It serves as the primary predictive feature, achieving a Pearson correlation of r = 0.9293 with Target_ECT_Amount.
| Condition | ECT Amount | Interpretation |
|---|---|---|
Flood_House_Damage_Ratio >= 0.8 |
₱10,000 | Total damage — flood depth exceeds 80% of structural height |
0.4 ≤ ratio < 0.8 AND Damage_Classification IN ('Partially Damaged', 'Totally Damaged') |
₱5,000 | Partial damage — sensor-confirmed mid-range exposure |
ratio < 0.4 AND Damage_Classification = 'Totally Damaged' |
₱10,000 | Structural total loss despite low ratio (construction materials) |
ratio < 0.4 AND Damage_Classification = 'Partially Damaged' |
₱5,000 | Partial structural damage at low flood depth |
| All other cases | ₱0 | No qualifying damage |
The contents of the data/ folder are automatically generated by running the ETL scripts:
python etl/scrape_waterlevel.py
python etl/scrape_households.pyect_allocation_model/
├── etl/
│ ├── scrape_waterlevel.py # ETL: DOST-ASTI flood sensor data
│ └── scrape_households.py # ETL: Microsoft Building Footprints
├── notebooks/
│ └── ect_allocation_model.ipynb # Main ML training & evaluation notebook
├── models/
│ └── *.bin # Saved CatBoost model binaries
├── data/
│ ├── waterlevel/ # Cached flood sensor CSV data
│ └── households/ # Cached building footprint GeoJSONL data
├── demo/
│ └── BantayAyudaSystemDemo_*.png
├── .gitignore
└── README.md
The model is powered by two automated ETL (Extract, Transform, Load) modules that scrape real Philippine government and open-source geospatial data.
| Detail | Description |
|---|---|
| Source | DOST-ASTI PhilSensors — real-time water level monitoring stations across the Philippines |
| Method | Scrapes the DOST-ASTI monitoring endpoint via session-authenticated POST request with CSRF token extraction |
| Output | df_hist (raw sensor readings per location/hour) and df_hist_agg (aggregated mean flood_depth_m and sensor_flood_depth_m per location) |
| Columns | Location, Province, Region, sensor_flood_depth_m, flood_depth_m, hourly readings, date |
| Coverage | ~43 flood monitoring stations nationwide |
| Caching | Timestamped CSV files saved to data/waterlevel/ |
| Fallback | Raises ValueError if scraping fails — no synthetic data generation |
Key Functions:
run_dost_etl()— Main ETL pipeline; scrapes sensor data and validates required columnsload_and_aggregate_flood_data()— Loads historical data and aggregates by locationrefresh_realtime_waterlevel()— On-demand cache refresh with latest sensor readings
| Detail | Description |
|---|---|
| Source | Microsoft Global ML Building Footprints — philippines.geojsonl.zip (~5 GB, 17.4M building polygons) |
| Method | Downloads the GeoJSONL archive, extracts it, and streams features using Fiona with spatial bbox filtering for efficient memory usage |
| Output | household_df with Latitude, Longitude, House_Surface_Area_Sqm per building |
| Bbox Filter | Metro Manila region: (120.85, 14.45, 121.15, 14.85) — yields ~922,905 building footprints |
| Area Calculation | Geometries projected to EPSG:3857 for accurate area computation in square meters |
| Caching | Raw GeoJSONL cached to data/households/; re-extracted from zip on first run |
| Fallback | Manual line-by-line GeoJSON parser if Fiona/GeoPandas unavailable |
Key Functions:
load_household_measurements()— Main entry point; supportsbbox,cache_dir, andmax_rowsparameters_load_geojsonl_with_geopandas()— Fiona-based streaming loader with spatial filtering_load_geojsonl_manual()— Fallback line-by-line parser with centroid/area calculationgenerate_base_training_data()— Combines flood locations with building footprints, assignsDamage_Classificationbased on flood-depth-to-height ratio
generate_base_training_data() merges the two ETL outputs:
- Tiles flood sensor locations cyclically to match the number of building footprints
- Derives
House_Heightfromsqrt(House_Surface_Area_Sqm)as a structural proxy - Computes
Damage_Classification(Not Damaged/Partially Damaged/Totally Damaged) using flood-depth-to-height ratio thresholds (0.4 and 0.8) - Produces a training DataFrame with columns:
Location,Latitude,Longitude,Flood_Depth_Meters,House_Surface_Area_Sqm,Sensor_Flood_Depth_Meters,Damage_Classification
Feature engineering (Spark SQL) then adds Flood_Height_Ratio, Flood_House_Damage_Ratio, and the target variable Target_ECT_Amount.
| Technique | Application | Rationale |
|---|---|---|
| Geospatial ETL & Spatial Filtering | Fiona streaming with bbox filter over 17.4M building polygons | Reduces ingestion to the Metro Manila deployment zone (~922,905 buildings) without loading the full ~5 GB dataset into memory |
| Feature Engineering (Spark SQL) | Compute Flood_Height_Ratio and Flood_House_Damage_Ratio via distributed SQL on PySpark |
Translates raw flood depth and building geometry into normalized ratio features that encode structural exposure and sensor confirmation in a single scalar |
| Multi-class Classification (CatBoost GBDT) | Predict ECT tier (₱0 / ₱5,000 / ₱10,000) per household | Gradient-boosted trees with native categorical support handle mixed-type tabular data (location strings, damage classification labels, numeric ratios) without preprocessing |
The three techniques form a sequential analytical pipeline. Geospatial ETL via Fiona streams building footprint geometries into pandas, applies EPSG:3857 area projection, and tiles sensor flood readings across the household corpus. Spark SQL feature engineering computes the ratio features that bridge physical flood measurements and structural geometry into model-ready numeric inputs — the Flood_House_Damage_Ratio achieves the highest Pearson correlation with the target (r = 0.9293), confirming it as the dominant predictive feature. CatBoost then receives the full feature set — including categorical Location and Damage_Classification columns handled natively via ordered target statistics encoding — to classify each household into an ECT tier, achieving near-perfect multi-class classification across 922,905 records via 1,000 boosting iterations of depth-6 trees.
-- Flood Height Ratio
SELECT
*,
LEAST(
CAST(Flood_Depth_Meters AS double) / NULLIF(SQRT(House_Surface_Area_Sqm), 0),
1.0
) AS Flood_Height_Ratio,
LEAST(
(CAST(Flood_Depth_Meters AS double) / NULLIF(SQRT(House_Surface_Area_Sqm), 0))
* (CAST(Sensor_Flood_Depth_Meters AS double) / NULLIF(CAST(Flood_Depth_Meters AS double), 0)),
1.0
) AS Flood_House_Damage_Ratio
FROM df_train-- Target ECT Amount
SELECT
*,
CASE
WHEN Flood_House_Damage_Ratio >= 0.8 THEN 10000
WHEN Flood_House_Damage_Ratio >= 0.4
AND Flood_House_Damage_Ratio < 0.8
AND Damage_Classification IN ('Partially Damaged', 'Totally Damaged') THEN 5000
WHEN Flood_House_Damage_Ratio < 0.4
AND Damage_Classification = 'Totally Damaged' THEN 10000
WHEN Flood_House_Damage_Ratio < 0.4
AND Damage_Classification = 'Partially Damaged' THEN 5000
ELSE 0
END AS Target_ECT_Amount
FROM df_train_with_ratios| ECT Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| ₱0 (No Damage) | 1.00 | 1.00 | 1.00 | 138,176 |
| ₱5,000 (Partial Damage) | 1.00 | 1.00 | 1.00 | 19,279 |
| ₱10,000 (Total Damage) | 1.00 | 1.00 | 1.00 | 27,100 |
- Accuracy: 99.99%
- Weighted F1: 1.00
- Weighted Precision: 1.00
- Weighted Recall: 1.00
| Feature | r with Target_ECT_Amount |
|---|---|
Flood_House_Damage_Ratio |
0.9293 |
Flood_Height_Ratio |
0.4892 |
Sensor_Flood_Depth_Meters |
0.3379 |
Flood_Depth_Meters |
0.2777 |
House_Surface_Area_Sqm |
−0.0751 |
Notable inter-feature correlations:
Flood_Depth_Meters↔Sensor_Flood_Depth_Meters: r = 0.9252Flood_Height_Ratio↔Flood_Depth_Meters: r = 0.8222Flood_House_Damage_Ratio↔Flood_Height_Ratio: r = 0.5961
These correlations validate Flood_House_Damage_Ratio as the primary driver of ECT classification and confirm that the engineered ratio features capture the dominant variance in the target variable.
flowchart TD
A["Data Sources<br/>DOST-ASTI PhilSensors (~43 stations)<br/>Microsoft Global ML Building Footprints (~17.4M polygons)"]
B["ETL Ingestion<br/>scrape_waterlevel.py — session POST + CSRF scraping<br/>scrape_households.py — Fiona streaming + bbox filter"]
C["Spark Session Init<br/>PySpark<br/>local master"]
D["Data Cleaning<br/>Null removal, zero-variance filtering<br/>House_Height proxy from sqrt(Area)<br/>Damage_Classification assignment"]
E["Training Data Generation<br/>generate_base_training_data()<br/>Cyclic sensor tiling to 922,905 footprints"]
F["Feature Engineering (Spark SQL)<br/>Flood_Height_Ratio<br/>Flood_House_Damage_Ratio<br/>Target_ECT_Amount CASE WHEN"]
G["Train/Test Split<br/>test_size=0.2, stratify=y<br/>Train: ~738,400 — Test: ~184,500"]
H["Model Training<br/>CatBoostClassifier<br/>iterations=1000, depth=6<br/>learning_rate=0.1, loss=MultiClass<br/>eval_metric=TotalF1"]
I["Evaluation<br/>Accuracy: 99.99%<br/>Weighted F1: 1.00<br/>Precision: 1.00 — Recall: 1.00"]
J["Visualization<br/>Confusion Matrix (earth-tone colormap)<br/>Correlation Heatmap (YlGn)"]
K["Model Serialization<br/>.bin format<br/>{YYYYMMDD-HHMMSS}ect_allocation_model.bin"]
L["Django REST API<br/>Real-time ECT inference endpoint<br/>Leaflet.js GIS + SMS alert integration"]
A --> B
B --> C
C --> D
D --> E
E --> F
F --> G
G --> H
H --> I
I --> J
J --> K
K --> L
flowchart LR
A["Building Footprint<br/>(House_Surface_Area_Sqm)"]
B["EPSG:3857 Projection<br/>Area computation in m²"]
C["sqrt(Area)<br/>= House_Height proxy"]
D["DOST-ASTI Sensor Reading<br/>(Flood_Depth_Meters, Sensor_Flood_Depth_Meters)"]
E["Spark SQL LEAST / NULLIF"]
F["Flood_Height_Ratio<br/>= min(Depth / Height, 1.0)"]
G["Flood_House_Damage_Ratio<br/>= min(Ratio × Sensor/Flood, 1.0)"]
H{"Damage_Classification<br/>+ Ratio Thresholds"}
I["Target_ECT_Amount<br/>₱0 / ₱5,000 / ₱10,000"]
A --> B --> C --> E
D --> E
E --> F --> G --> H --> I
| Parameter | Value |
|---|---|
iterations |
1000 |
depth |
6 |
learning_rate |
0.1 |
loss_function |
MultiClass |
eval_metric |
TotalF1 |
cat_features |
indices of object-dtype columns (Location, Damage_Classification) |
eval_set |
(X_test, y_test) |
verbose |
500 |
| Metric | Formula | Score |
|---|---|---|
| Accuracy | (TP + TN) / Total | 99.99% |
| Precision (weighted) | Weighted TP / (TP + FP) | 1.00 |
| Recall (weighted) | Weighted TP / (TP + FN) | 1.00 |
| F1 Score (weighted) | 2 × (P × R) / (P + R) | 1.00 |
Files: {YYYYMMDD-HHMMSS}ect_allocation_model.bin
ect_allocation_model_v1.bin
Path: models/
Format: CatBoost native binary (.bin)
Deploy: Django REST API (load_model via CatBoostClassifier.load_model())
| Finding | Detail |
|---|---|
| Near-perfect ECT classification | 99.99% accuracy, 1.00 weighted F1 across three ECT tiers on 922,905 real Metro Manila building footprints |
| Flood_House_Damage_Ratio dominates prediction | r = 0.9293 with Target_ECT_Amount — the highest correlation of any feature, validating the sensor-confirmed ratio engineering approach |
| Sensor–flood depth agreement | r = 0.9252 between Flood_Depth_Meters and Sensor_Flood_Depth_Meters, confirming high DOST-ASTI sensor reliability for model inputs |
| Building area is a weak predictor | r = −0.0751 for House_Surface_Area_Sqm directly, but becomes strong when transformed into the ratio feature — validating geometric transformation over raw area |
| GIS + ML closes the PDNA latency gap | Automated classification enables 72-hour ECT disbursement vs. 2–4 week manual PDNA cycle |
The model achieves near-perfect metrics on the training distribution, which reflects a structurally clean feature space: the Flood_House_Damage_Ratio directly encodes the DSWD's own ECT threshold logic (0.4/0.8), meaning the classifier is correctly learning a policy-grounded decision rule. Real-world deployment performance should be validated against live sensor readings during active typhoon events to confirm that sensor calibration drift and building footprint staleness do not degrade ratio accuracy.
| Feature Type | Feature Description |
|---|---|
| Core | Damage classification (₱0 / ₱5,000 / ₱10,000) via CatBoost GBDT; Automated ECT computation with Flood-to-House Damage Ratio |
| Enabling | Prototype SMS alerts; GIS hazard mapping (Leaflet.js + OpenStreetMap) |
| Enhancing | Django Admin dashboard, real-time visualization, interpretable ML metrics and correlation heatmap |
| Name | Role |
|---|---|
| Gabrielle Ysabel Almirol | Machine Learning Developer, Front-End & Back-End Developer |
| Yashley Joaquin Chua | Researcher, Front-End & Back-End Developer |
| Xander Julius Abo | Researcher, Front-End Developer |
-
Built an automated ETL pipeline web-scraping real-time flood sensor data from DOST-ASTI PhilSensors (~43 stations) and streaming 922,905 Metro Manila building footprints from Microsoft Global ML Building Footprints (17.4M features, ~5 GB GeoJSONL) using Fiona with spatial bbox filtering (
(120.85, 14.45, 121.15, 14.85)) for memory-efficient geospatial ingestion. -
Engineered features via PySpark / Spark SQL — computed
Flood_Height_RatioandFlood_House_Damage_Ratio(sensor-confirmed) from building surface area projections (EPSG:3857) and DOST-ASTI flood depth readings; derivedDamage_Classificationusing ratio thresholds (0.4 / 0.8) and generatedTarget_ECT_Amount(₱0 / ₱5,000 / ₱10,000) via Spark SQLCASE WHENexpressions. -
Constructed a predictive CatBoost GBDT multi-class classifier (iterations=1000, depth=6, lr=0.1, MultiClass loss, TotalF1 eval metric) for ECT allocation; achieved 99.99% accuracy, 1.00 weighted F1-score, 1.00 precision, and 1.00 recall on ~922K building footprints.
-
Produced a confusion matrix (earth-tone colormap,
ConfusionMatrixDisplay) confirming near-perfect classification across all three ECT classes — 138,176 / 19,279 / 27,100 correct predictions for ₱0 / ₱5,000 / ₱10,000 respectively. -
Generated a Seaborn correlation heatmap (YlGn colormap, zero-variance column filtering) to visualize feature-target relationships. Key Pearson correlation coefficients with
Target_ECT_Amount:Flood_House_Damage_Ratio(r = 0.9293),Flood_Height_Ratio(r = 0.4892),Sensor_Flood_Depth_Meters(r = 0.3379),Flood_Depth_Meters(r = 0.2777),House_Surface_Area_Sqm(r = −0.0751). Notable inter-feature correlations:Flood_Depth_Meters↔Sensor_Flood_Depth_Meters(r = 0.9252),Flood_Height_Ratio↔Flood_Depth_Meters(r = 0.8222),Flood_House_Damage_Ratio↔Flood_Height_Ratio(r = 0.5961) — validating the engineered ratio features as primary drivers of ECT classification. -
Loaded the trained
.binmodel into a Django REST API with JSON-based inference endpoints and real-time model loading for automated ECT computation fed from SQLite3 database. -
Implemented Leaflet.js + OpenStreetMap GIS mapping with dynamic JavaScript logic, hazard visualization layers, and optimized Django database interactions.
| Figure | Description |
|---|---|
| Figure 1 | ML Pipeline Flowchart |
| Figure 2 | Flood-to-House Damage Ratio Scoring Diagram |
| Figure 3 | Classification Report — ECT Allocation Performance |
| Figure 4 | Correlation Heatmap (Feature–Target Pearson Coefficients) |
| Figure 5 | BantayAyuda System Demo Screenshots |
[1] Benson, C., & Clay, E. J. (2004). Understanding the economic and financial impacts of natural disasters. World Bank Disaster Risk Management Series.
[2] Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. https://doi.org/10.1145/2939672.2939785
[3] Comes, T., & Van de Walle, B. (2014). Measuring disaster resilience: The impact of Hurricane Sandy on critical infrastructure systems. ISCRAM 2014 Conference Proceedings.
[4] Department of Budget and Management (2025). National Calamity Fund Utilization Report. Republic of the Philippines. https://www.dbm.gov.ph
[5] Department of Social Welfare and Development (2025). Emergency Cash Transfer (ECT) Program Guidelines and Operations Manual. Republic of the Philippines. https://www.dswd.gov.ph
[6] Dorogush, A. V., Ershov, V., & Gulin, A. (2018). CatBoost: Gradient boosting with categorical features support. Workshop on Machine Learning Systems (LearningSys) at NeurIPS 2018. https://arxiv.org/abs/1810.11363
[7] Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232. https://doi.org/10.1214/aos/1013203451
[8] Globe Telecom (2025). Disaster response SMS gateway specifications. https://www.globe.com.ph/disaster-response
[9] Goodchild, M. F. (2007). Citizens as sensors: The world of volunteered geography. GeoJournal, 69(4), 211–221. https://doi.org/10.1007/s10708-007-9111-y
[10] Jannach, D., Zanker, M., Felfernig, A., & Friedrich, G. (2010). Recommender systems: An introduction. Cambridge University Press.
[11] McAfee, A., & Brynjolfsson, E. (2012). Big data: The management revolution. Harvard Business Review, 90(10), 60–68.
[12] Meier, P. (2015). Digital humanitarians: How big data is changing the face of humanitarian response. CRC Press.
[13] Microsoft. (2023). Global ML Building Footprints. GitHub. https://github.com/microsoft/GlobalMLBuildingFootprints
[14] National Disaster Risk Reduction and Management Council (2025). SitRep No. 5: Typhoon Uwan (Fung-wong). https://ndrrmc.gov.ph
[15] National Disaster Risk Reduction and Management Council (2025). SitRep No. 3: Typhoon Tino (Kalmaegi). https://ndrrmc.gov.ph
[16] Office of Civil Defense (2025). PDNA Preliminary Report: Typhoon Uwan. https://ocd.gov.ph
[17] Philippine Statistics Authority (2025). NICTHS 2024 Survey Findings. https://psa.gov.ph
[18] Provost, F., & Fawcett, T. (2013). Data science for business. O'Reilly Media.
[19] Republic Act No. 10639. Free Mobile Disaster Alerts Act. https://www.officialgazette.gov.ph/2014/06/20/republic-act-no-10639/
[20] Russell, S., & Norvig, P. (2020). Artificial intelligence: A modern approach (4th ed.). Pearson.
[21] Smart Communications (2025). Emergency alert system report. (Internal report — no public link)
[22] Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46(Supplement), 234–240. https://doi.org/10.2307/143141
[23] UNDRR (2015). Sendai Framework for Disaster Risk Reduction 2015–2030. United Nations. https://www.undrr.org/publication/sendai-framework-disaster-risk-reduction-2015-2030
[24] United Nations Global Pulse. (2013). Big data for development: A primer. United Nations. https://www.unglobalpulse.org
[25] World Health Organization Philippines (2025). SPEED System Performance Report. https://www.who.int/philippines
[26] Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S., & Stoica, I. (2010). Spark: Cluster computing with working sets. USENIX HotCloud Workshop.

