
Data Flow

Krish Shah edited this page Mar 28, 2026 · 44 revisions

Overview

[Figure: Data Flow Diagram]

You can regenerate this diagram by pasting the linked code into Excalidraw.

Note: Production is the only environment with the serializer.

Files

Buildings.json

  • buildings.json: The main ESIM building dataset. It is produced by osm_building_to_json.py, which combines the existing building metadata with OpenStreetMap geometry, and then updated by add_fms_id.py to add FMS IDs.

    Intermediary Files

    • query.json: A raw snapshot of CMU building data from the public ArcGIS campus layer. It is generated by arc_gis_query.py and serves as the source for official building names, abbreviations, and IDs.
    • export.osm: A local OpenStreetMap export for the CMU campus area. It is generated by fetch_osm_data.py and used by osm_building_to_json.py to extract building geometry.
    • sign_abbrev_mapping.json: A small lookup file that maps building abbreviations to FMS building IDs. It is generated by sign_abbrev_mapping.py from query.json.
    • building_info_map.json: A simplified lookup keyed by building code, mainly for basic building info like name and default floor. It is generated by generate_building_info_map.py from buildings.json.
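As a rough illustration of how a simplified lookup like building_info_map.json could be derived from buildings.json, here is a minimal sketch. It is not the code in generate_building_info_map.py: the function name `build_info_map`, the field names `name`, `defaultFloor`, and `shape`, and the assumption that buildings.json is keyed by building code are all hypothetical, and the sample values are made up.

```python
import json


def build_info_map(buildings):
    """Reduce full building records to a small lookup keyed by building code."""
    info_map = {}
    for code, record in buildings.items():
        # Keep only the basic fields; geometry and other details are dropped.
        info_map[code] = {
            "name": record["name"],
            "defaultFloor": record.get("defaultFloor"),
        }
    return info_map


# Hypothetical sample shaped like buildings.json; the real schema may differ.
sample_buildings = {
    "GHC": {
        "name": "Gates and Hillman Centers",
        "defaultFloor": "5",
        "shape": [[40.4435, -79.9446], [40.4437, -79.9441]],
    },
}

info_map = build_info_map(sample_buildings)
print(json.dumps(info_map, indent=2))
```

The point of the sketch is only that the lookup is a projection of the main dataset, so it should be regenerated whenever buildings.json changes.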

Osm-outside.json

  • osm-outside.json: Contains every node outside of buildings, keyed by OSM ID, along with each node's neighbors and position. Note that these nodes do not connect to any nodes inside of buildings.
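To make the shape of this data more concrete, the sketch below models a few outside nodes as a dictionary keyed by OSM ID and walks the neighbor links. The field names `position`, `latitude`, `longitude`, and `neighbors`, the node IDs, and the helper `reachable_from` are assumptions for illustration; the actual layout of osm-outside.json may differ.

```python
from collections import deque

# Hypothetical excerpt: OSM node ID -> position and neighboring node IDs.
outside_nodes = {
    "1001": {"position": {"latitude": 40.4430, "longitude": -79.9440}, "neighbors": ["1002"]},
    "1002": {"position": {"latitude": 40.4431, "longitude": -79.9438}, "neighbors": ["1001", "1003"]},
    "1003": {"position": {"latitude": 40.4432, "longitude": -79.9436}, "neighbors": ["1002"]},
    "2001": {"position": {"latitude": 40.4450, "longitude": -79.9400}, "neighbors": []},
}


def reachable_from(nodes, start_id):
    """Return the set of node IDs reachable from start_id via neighbor links (breadth-first)."""
    seen = {start_id}
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        for neighbor_id in nodes[current]["neighbors"]:
            # Skip IDs that are not in this file, e.g. nodes that live elsewhere.
            if neighbor_id in nodes and neighbor_id not in seen:
                seen.add(neighbor_id)
                queue.append(neighbor_id)
    return seen


print(sorted(reachable_from(outside_nodes, "1001")))
```

Because the outside graph does not link to indoor nodes, a traversal like this stays entirely on roads, sidewalks, and other outdoor passages.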

Sources

CMU ArcGIS

  • Provides the official building metadata used for names, abbreviations, and building identifiers.

OpenStreetMap

  • Provides the building geometry that is used to rebuild the final ESIM buildings.json.
  • Provides the nodes and connections for roads, sidewalks, and any other passages outside of buildings, which are stored in osm-outside.json.

Steps

Step 1: Scraping

Scraper code

S3 bucket link: https://minio.scottylabs.org/browser/cmumaps

Data sources:

Step 2: Generation

The generator takes the scraped data and the serialized data as input and does the following:

  • Rescales the SVGs to fit in a 1920x1080 rectangle and converts them into a floorplans.json file.

  • Generates the inside graph from the floorplans.json file.
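The rescaling step can be sketched as follows. This is a minimal illustration, assuming the fit preserves aspect ratio by applying one uniform scale factor and does not translate or center the drawing; the generator's real behavior may differ, and the names `fit_scale` and `rescale_points` are hypothetical.

```python
TARGET_WIDTH = 1920
TARGET_HEIGHT = 1080


def fit_scale(width, height, target_width=TARGET_WIDTH, target_height=TARGET_HEIGHT):
    """Largest uniform scale factor that keeps a width x height drawing inside the target."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # The tighter of the two axis ratios limits the scale.
    return min(target_width / width, target_height / height)


def rescale_points(points, width, height):
    """Apply the fit scale to a list of (x, y) points from the original drawing."""
    scale = fit_scale(width, height)
    return [(x * scale, y * scale) for x, y in points]


# A 3840x1080 drawing is limited by its width, so everything shrinks by half.
corners = [(0, 0), (3840, 0), (3840, 1080), (0, 1080)]
scaled = rescale_points(corners, 3840, 1080)
print(scaled)
```

Taking the minimum of the two ratios guarantees that neither dimension overflows the 1920x1080 rectangle, at the cost of leaving unused space along the other axis.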

Step 3: Deserialization

The JSON files in the S3 bucket are deserialized locally.

Step 4: Visualization

The visualizer is used to place the data in geo-coordinates. It can also be used to add new data, such as connections between rooms and POIs.

Step 5: Serialization

The updated data are serialized to the S3 bucket.

Step 6: Deployment

The S3 bucket files are deserialized in staging and production.
