Various tools for QuantAQ sensor data analysis. See notebooks/ for the older version of this repo, which contains Jupyter notebooks for visual data analysis.
This pipeline automates pulling and cleaning data from the QuantAQ network of sensors.
- These instructions assume you are running Python 3.8 or later.
- Run pip install -r requirements.txt to install the Python dependencies.
- Get an API key for QuantAQ. You can ask Scott Hersey for his API key (there have been issues in the past when the key's owner is not an admin of the sensor network you're trying to pull data from). Copy the key into a file called token.txt in the root of this repository.
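Here is a minimal sketch of how a Python script could read the key back from token.txt. It assumes the file holds only the key, plus an optional trailing newline. load_api_key is a hypothetical helper for illustration, not this repository's own loader:

```python
from pathlib import Path


def load_api_key(path="token.txt"):
    """Read the QuantAQ API key from a text file (hypothetical helper).

    Assumes the file contains only the key, possibly followed by whitespace
    or a newline, which is stripped.
    """
    token_file = Path(path)
    if not token_file.is_file():
        raise FileNotFoundError(
            f"{token_file} not found; copy your QuantAQ API key into it first"
        )
    key = token_file.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{token_file} is empty")
    return key
```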
Check out the demo.py script for example usage!
Note that requesting data from QuantAQ is slow! It takes on the order of 2-3 minutes per sensor, per day of data.
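To plan a large pull, you can turn the 2-3 minute figure into a back-of-the-envelope estimate. This is a rough sketch: estimate_pull_minutes is a hypothetical helper, and real timings will depend on the QuantAQ API and your connection.

```python
def estimate_pull_minutes(n_sensors, n_days, minutes_per_sensor_day=(2, 3)):
    """Rough (low, high) estimate, in minutes, of how long a QuantAQ pull takes.

    Multiplies the number of sensor-days by the quoted 2-3 minutes per
    sensor, per day of data.
    """
    low, high = minutes_per_sensor_day
    sensor_days = n_sensors * n_days
    return sensor_days * low, sensor_days * high


# Example: 4 sensors over a 14-day window.
low, high = estimate_pull_minutes(4, 14)
print(f"Expect roughly {low}-{high} minutes ({low / 60:.1f}-{high / 60:.1f} hours)")
```

For example, 4 sensors over 14 days is 56 sensor-days, or about 112 to 168 minutes.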
By default, whenever you run the pipeline (by calling either the from_csv() or the from_api() method), the method returns the cleaned data as a pandas DataFrame and also saves it locally as a pickle file. The pickle files are named as follows:
# For DataFrames that were not smoothed:
qaq_cleaned_data/<sensor_id>/<start_year>_<start_month>_<start_day>_<end_year>_<end_month>_<end_day>.pckl
# For DataFrames that were smoothed (unrealistically high sensor readings have been removed from the DataFrame):
qaq_cleaned_data/<sensor_id>/<start_year>_<start_month>_<start_day>_<end_year>_<end_month>_<end_day>_smoothed.pckl
For a DataFrame saved by from_api(), the start and end dates (year, month, day) come from the datetime objects passed to the QuantAQ API call. For a DataFrame saved by from_csv(), they come from the first and last timestamps that appear in the DataFrame. Dates are not zero-padded. For example, a smoothed DataFrame from sensor SN000-046 whose timestamps run from March 1, 2021 to March 10, 2021 would be saved to:
qaq_cleaned_data/SN000-046/2021_3_1_2021_3_10_smoothed.pckl
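You can reproduce the naming scheme in a few lines, for example to find and reload a cached result without re-running the pipeline. This is a sketch of the scheme described above, not the pipeline's own code. cleaned_data_path, date_range_from_timestamps and load_cleaned are hypothetical names. The timestamp handling assumes a datetime-like pandas Series or index, and the cleaned data's actual timestamp column may be named differently.

```python
from datetime import date
from pathlib import Path

import pandas as pd


def cleaned_data_path(sensor_id, start, end, smoothed=False, root="qaq_cleaned_data"):
    """Build the pickle path for a cleaned DataFrame (hypothetical helper).

    Dates are written without zero-padding (e.g. 2021_3_1) to match the
    naming scheme. `start` and `end` may be date, datetime, or pd.Timestamp.
    """
    name = f"{start.year}_{start.month}_{start.day}_{end.year}_{end.month}_{end.day}"
    if smoothed:
        name += "_smoothed"
    return Path(root) / sensor_id / f"{name}.pckl"


def date_range_from_timestamps(timestamps):
    """Return (earliest, latest) timestamps, as from_csv() dates are chosen.

    Assumes `timestamps` is a datetime-like pandas Series or DatetimeIndex.
    """
    ts = pd.to_datetime(pd.Series(timestamps))
    return ts.min(), ts.max()


def load_cleaned(sensor_id, start, end, smoothed=False, root="qaq_cleaned_data"):
    """Load a previously saved cleaned DataFrame from its pickle file."""
    return pd.read_pickle(cleaned_data_path(sensor_id, start, end, smoothed, root))


# Example: the smoothed SN000-046 file covering March 1-10, 2021.
path = cleaned_data_path("SN000-046", date(2021, 3, 1), date(2021, 3, 10), smoothed=True)
print(path.as_posix())  # qaq_cleaned_data/SN000-046/2021_3_1_2021_3_10_smoothed.pckl
```

Only unpickle files you created yourself or otherwise trust, because loading a pickle can execute arbitrary code.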
To visualize DataFrame results with the R package OpenAir, you will need to install some R dependencies; specifically, every method in dataviz.py requires R. This involves installing R, installing OpenAir and its dependencies, and installing the rpy2 Python package, which lets Python code call R.
Mac:
- R installation: link
- OpenAir installation: Open the R application. In the menu bar, open the Packages & Data dropdown menu and click Package Installer. This should open a window. Make sure CRAN binaries is selected in the first dropdown menu, then type "openair" into the search bar. It might prompt you to select a region; I picked US [IW], then pressed Get List. I selected the first result. At the bottom, make sure that Install Dependencies is checked, then click Install Selected. NOTE: I originally did these steps in a much more roundabout way, so I'd be happy to know if someone succeeds in replicating this process.
- Run export R_HOME=/Library/Frameworks/R.framework/Resources
- Run pip install rpy2 to install the rpy2 package.
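rpy2 needs R_HOME to point at a working R installation, so a quick pre-flight check can save you from some confusing import errors. This is a minimal sketch that assumes the R executable lives in the bin folder under R_HOME, which is the case for the Mac and Ubuntu paths in these instructions. check_r_home is a hypothetical helper, and it only checks files on disk without starting R.

```python
import os
import sys
from pathlib import Path


def check_r_home(r_home=None):
    """Return a list of problems with the R_HOME setting (empty if it looks OK).

    Hypothetical pre-flight check to run before importing rpy2. Uses the
    R_HOME environment variable unless a path is passed explicitly.
    """
    r_home = r_home if r_home is not None else os.environ.get("R_HOME")
    if not r_home:
        return ["R_HOME is not set; export it before running the Python code"]
    root = Path(r_home)
    if not root.is_dir():
        return [f"R_HOME={r_home} is not a directory"]
    exe_name = "R.exe" if sys.platform.startswith("win") else "R"
    if not (root / "bin" / exe_name).exists():
        return [f"no {exe_name} in {root / 'bin'}; is R_HOME the R installation root?"]
    return []


for problem in check_r_home():
    print("R setup:", problem)
```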
Ubuntu:
- Install R and gdebi:
sudo apt update
sudo apt -y install r-base gdebi-core
- Get the most recent RStudio version from here. I selected the .deb file for Ubuntu 18.
- Install RStudio with gdebi (use the file name of the version you downloaded):
sudo gdebi rstudio-1.2.5019-amd64.deb
- Add the line deb http://security.ubuntu.com/ubuntu xenial-security main to /etc/apt/sources.list
- Install libssl1.0.0:
sudo apt update
sudo apt install libssl1.0.0
- I then deleted the line I added to sources.list for good measure. This is probably not necessary and may mess things up the next time I update, but that's a choice I made.
- Open the RStudio application.
- Download the OpenAir package and its dependencies from RStudio: Tools > Install Packages > Type "openair" into the search bar > make sure "install dependencies:" is checked > Install
- Run export R_HOME=/usr/lib/R
- Run pip install rpy2 to install the rpy2 package.
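If exporting R_HOME in every new shell is inconvenient, you can also set the variable from Python before rpy2 is loaded. This is a sketch under that assumption. It uses importr and isinstalled from rpy2.robjects.packages. configure_r_home and load_openair are hypothetical helpers, and the default path is the Ubuntu one above.

```python
import os


def configure_r_home(default="/usr/lib/R"):
    """Set R_HOME for this Python process unless it is already set.

    rpy2 reads R_HOME when its R interface is first loaded, so call this
    before importing anything from rpy2. An existing R_HOME value wins.
    """
    os.environ.setdefault("R_HOME", default)
    return os.environ["R_HOME"]


def load_openair():
    """Return the openair R package via rpy2, or None if it can't be loaded."""
    configure_r_home()
    try:
        # Importing rpy2.robjects starts the embedded R session.
        from rpy2.robjects.packages import importr, isinstalled
    except ImportError:
        print("rpy2 is not installed; run: pip install rpy2")
        return None
    except Exception as exc:  # e.g. rpy2 is installed but R is not found at R_HOME
        print(f"rpy2 could not start R: {exc}")
        return None
    if not isinstalled("openair"):
        print("openair is not installed in this R installation")
        return None
    return importr("openair")
```

You could then call openair = load_openair() and, when the result is not None, call openair's R functions as attributes (for example openair.windRose). Converting a pandas DataFrame into an R data frame is a separate step that this sketch does not show.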
Windows:
NOTE: I haven't been able to test these Windows instructions myself, so feel free to update them if you run into any difficulty.
- Install Windows RStudio:
- Get the most recent RStudio version from here.
- Run the installer
- Open RStudio application
- Download OpenAir package and dependencies from RStudio: Tools > Install Packages > Type "openair" into the search bar > make sure "install dependencies:" is checked > Install
- You might need to set the R_HOME environment variable before running the Python code.
- Run pip install rpy2 to install the rpy2 package.
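On Windows, R installs each version into its own folder. By default these are under C:\Program Files\R (for example R-4.3.1), so R_HOME has to point at one specific version. This is a sketch that picks the newest version, assuming that default location. find_windows_r_home is a hypothetical helper, and you should pass a different base path if R was installed elsewhere.

```python
import os
import re
from pathlib import Path


def find_windows_r_home(base=r"C:\Program Files\R"):
    """Return the newest R-x.y.z folder under `base`, or None if none is found.

    Versions are compared numerically, so R-4.10.0 sorts after R-4.9.0.
    Folders that don't match the R-x.y.z pattern are ignored.
    """
    base_dir = Path(base)
    if not base_dir.is_dir():
        return None
    candidates = []
    for child in base_dir.iterdir():
        match = re.fullmatch(r"R-(\d+)\.(\d+)\.(\d+)", child.name)
        if child.is_dir() and match:
            version = tuple(int(part) for part in match.groups())
            candidates.append((version, child))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


r_home = find_windows_r_home()
if r_home is not None and "R_HOME" not in os.environ:
    # Only affects this Python process.
    os.environ["R_HOME"] = str(r_home)
```

Setting os.environ only affects the current Python process. To make R_HOME permanent, set it in the Windows environment variable settings or with setx.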
Refer to the demo_main.py script for example usage!