This SDK provides Python code that helps data scientists query MongoDB and bulk download data and files directly from the SEARS backend for aggregated analysis. Case studies 6.1 and 6.2 from our main paper were conducted using this SDK.
Please refer to our main SEARS platform repository here.
- Copy the `.env` file to the root directory of the project. Update the connection string to use your own MongoDB Atlas connection string. Also update the AWS S3 parameters as per your AWS settings.
- Install all requirements using `pip3 install -r requirements.txt`.
- Run `python3 mongo_connect.py` to download data from MongoDB to a CSV file. Set `search_criteria` and `output_file_name` in the program file.
- Run `python3 AWS_Download.py` to download files from AWS S3 to the local directory `./file_fetch/`. All files related to experiments meeting the search criteria will be downloaded.
- Run your ML model on the downloaded data and files.
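
The MongoDB-to-CSV step above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual contents of `mongo_connect.py`: the function names `fetch_documents` and `write_csv`, the database name argument, and the choice of collection are hypothetical, and only `search_criteria` and `output_file_name` come from the steps above.

```python
import csv


def fetch_documents(connection_string, db_name, collection_name, search_criteria):
    """Return all documents matching search_criteria as a list of dicts."""
    # Imported lazily so write_csv below can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(connection_string)
    try:
        # db_name and collection_name are assumptions; use your own values.
        return list(client[db_name][collection_name].find(search_criteria))
    finally:
        client.close()


def write_csv(documents, output_file_name):
    """Write a list of dicts to CSV and return the number of rows written.

    The header is the union of all keys, in first-seen order, because
    MongoDB documents in one collection need not share the same fields.
    """
    fieldnames = []
    for doc in documents:
        for key in doc:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(output_file_name, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(documents)
    return len(documents)
```

With a connection string from your `.env` file, a call such as `write_csv(fetch_documents(conn, db, coll, search_criteria), output_file_name)` would produce the CSV. Nested fields are written using their Python string representation, so you may want to flatten them before analysis.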
# Steps
- Notice the folder `./uploads` in the root directory of the project. This folder is used to upload data to MongoDB.
- Drop data for an experiment in the folder `./uploads`. The data should be in the form of a JSON file.
- Run the program `python3 auto_upload.py` to upload the data to MongoDB. The program will automatically upload the data to the MongoDB collection `productData`.