Skip to content

Commit

Permalink
addressed comments from @CitrusDanWang and @BuggieCoder
Browse files Browse the repository at this point in the history
  • Loading branch information
mbobra committed Sep 20, 2023
1 parent ba0bbab commit c21c809
Show file tree
Hide file tree
Showing 6 changed files with 53,307 additions and 51,340 deletions.
2,867 changes: 2,867 additions & 0 deletions Forecast_2023.csv

Large diffs are not rendered by default.

2,867 changes: 0 additions & 2,867 deletions Predictions_2023.csv

This file was deleted.

19 changes: 15 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,20 @@
# aae-dsa-water
## Advanced Analytics and Evaluation Data Science Accelerator - Water Project

The [Division of Drinking Water](https://www.waterboards.ca.gov/drinking_water/programs/) (DDW) at the [California State Water Resources Control Board](https://www.waterboards.ca.gov/) regulates 2800 Community Water Systems (CWS) throughout the state. Some of these CWS risk running out of water during the dry summer season. To solve this problem, we created a machine learning model that predicts which CWS face the highest risk of running out of water. The model is intended to run in production on a monthly basis, producing predictions for at-risk CWS within the subsequent ninety days.
The [Division of Drinking Water](https://www.waterboards.ca.gov/drinking_water/programs/) (DDW) at the [California State Water Resources Control Board](https://www.waterboards.ca.gov/) regulates 2866 Community Water Systems (CWS) throughout the state. Some of these CWS risk running out of water during the dry summer season. To address this problem, we created a machine learning model that forecasts which CWS face the highest risk of running out of water. The model is intended to run in production on a monthly basis, producing forecasts for at-risk CWS within the subsequent ninety days.

## Contents
* `water.ipynb`: A Jupyter notebook with code to generate the machine learning model.
* `environment.yml`: To use `water.ipynb` notebooks together with all the requisite Python packages, create a new conda environment called `water` that uses this environment file by running this command: `conda env create -f environment.yml`. This environment file will install OSX-specific packages, and will reproduce the original environment exactly.
* `environment_cross_compatible.yml`: To create an enviroment file on a Linux or Windows machine, use this cross-compatible file. This will not reproduce the environment exactly.
This repository includes data, code, and environment files.
* **Data** - The repository /data includes all the data used for this model.
* **Code** - The Jupyter notebook titled `water.ipynb` includes all code to generate the machine learning model.
* **Environment Files** - The environment files include all the Python packages necessary to run the code. To reproduce the original environment exactly, use the OSX-specific environment file called `environment.yml`. To use a platform-agnostic environment file, use `environment_cross_compatible.yml`.
* **Forecast** - The file `Forecast_2023.csv` includes all the CWS forecasted to experience some form of drought impact.

## Running the Model
The following steps describe exactly how to use the data, code, and envrionment file together to build and run the model. This will generate a .csv file of drought-impacted CWSs ranked by probability (see the column `Expected`).

1. First, install conda. Conda is a tool that creates environments, or a collection of version-specific and OS-specific Python packages. This allows us to reproduce our work exactly. Imagine you use a different verson of a Python package than someone else, or on a different environment -- this might produce differing results from the original developer. Therefore, it is best practice to use the same environment as the developer. Follow [these installation instructions](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html).
2. Create the environment using the command `conda env create -f environment.yml`. This environment file will install OSX-specific packages, and will reproduce the original environment exactly. To create an enviroment file on a Linux or Windows machine, use the platform-agnostic envrionment file called `environment_cross_compatible.yml`.
3. Open Jupyter Lab by typing `jupyter-lab` in the terminal.
4. Navigate to the list of files and open `water.ipynb`.
5. Under the tab Run, select 'Run all cells'. This will produce a .csv file called `Forecast_2023.csv`.
Loading

0 comments on commit c21c809

Please sign in to comment.