baskargroup/SEARS-Data-Pull

SEARS SDK

This SDK provides Python code that helps data scientists query MongoDB and bulk-download data and files directly from the SEARS backend for aggregated analysis. Case studies 6.1 and 6.2 from our main paper were conducted using this SDK.

Main SEARS platform

Please refer to our main SEARS platform repository here.

Steps to pull data

  1. Copy the .env file to the root directory of the project. Replace the connection string with your own MongoDB Atlas connection string, and update the AWS S3 parameters to match your AWS settings.
  2. Install all requirements using pip3 install -r requirements.txt
  3. Set search_criteria and output_file_name in mongo_connect.py, then run python3 mongo_connect.py to download the matching data from MongoDB to a CSV file.
  4. Run python3 AWS_Download.py to download files from AWS S3 to the local directory ./file_fetch/. All files related to experiments that meet the search criteria will be downloaded.
  5. Run your ML model on the downloaded data and files.
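
The pull steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of mongo_connect.py or AWS_Download.py: the helper names (load_env, fetch_documents, write_csv, download_keys), the database and collection arguments, and the idea of passing a list of S3 object keys are all assumptions made for the example. Only search_criteria, output_file_name, and ./file_fetch/ come from the steps above. The pymongo and boto3 imports are deferred so the file-handling helpers can be tried without either library installed.

```python
import csv
import os
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def fetch_documents(uri, db_name, collection_name, search_criteria):
    """Return every document matching search_criteria (requires pymongo)."""
    from pymongo import MongoClient  # third-party; imported lazily

    client = MongoClient(uri)
    try:
        return list(client[db_name][collection_name].find(search_criteria))
    finally:
        client.close()


def write_csv(documents, output_file_name):
    """Write a list of dicts to CSV; columns are the union of all keys."""
    columns = []
    for doc in documents:
        for key in doc:
            if key not in columns:
                columns.append(key)
    with open(output_file_name, "w", newline="", encoding="utf-8") as handle:
        # restval fills in an empty cell when a document lacks a column.
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(documents)
    return columns


def download_keys(bucket, keys, dest_dir="./file_fetch"):
    """Download the given S3 object keys into dest_dir (requires boto3)."""
    import boto3  # third-party; imported lazily

    os.makedirs(dest_dir, exist_ok=True)
    s3 = boto3.client("s3")
    for key in keys:
        target = os.path.join(dest_dir, os.path.basename(key))
        s3.download_file(bucket, key, target)
```

In this sketch, a run would call load_env() to read the connection string, pass it with a search_criteria dict to fetch_documents, and hand the result to write_csv together with output_file_name. Nested fields and MongoDB ObjectId values are written as their string representations, which may or may not match what the actual script produces.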

Process to automate the upload of experiment data to MongoDB

Steps

  1. Locate the ./uploads folder in the root directory of the project. This folder holds the data to be uploaded to MongoDB.
  2. Place the data for an experiment in ./uploads. The data must be a JSON file.
  3. Run python3 auto_upload.py. The program uploads the data to the MongoDB collection productData.
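
As a rough illustration of the upload steps above, the sketch below reads every JSON file in ./uploads and inserts the contents into productData. It is not the code in auto_upload.py: the function names, the uri and db_name parameters, and the choice to accept either a single JSON object or an array of objects per file are assumptions made for the example.

```python
import json
from pathlib import Path


def load_experiments(folder="./uploads"):
    """Read every *.json file in folder and return a list of documents."""
    documents = []
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Assumption: a file holds one experiment (object) or several (array).
        records = data if isinstance(data, list) else [data]
        documents.extend(records)
    return documents


def upload_experiments(uri, db_name, folder="./uploads",
                       collection_name="productData"):
    """Insert all experiments found in folder; return the number inserted."""
    from pymongo import MongoClient  # third-party; imported lazily

    documents = load_experiments(folder)
    if not documents:
        # insert_many rejects an empty list, so stop early.
        return 0
    client = MongoClient(uri)
    try:
        result = client[db_name][collection_name].insert_many(documents)
        return len(result.inserted_ids)
    finally:
        client.close()
```

This sketch does not move or delete files after a successful upload, so running it twice would insert the same experiments twice. Whether auto_upload.py guards against that is not stated here.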
