This repository hosts the ApisTox dataset, intended for data analysis and machine learning applications in ecotoxicology and agrochemistry.
The paper is freely available (open access) in Scientific Data, and a preprint is available on arXiv.
The dataset and code are released under the CC-BY-NC-4.0 license.
The final dataset file is outputs/dataset_final.csv. For dataset splits, see
the outputs/splits directory.
Raw input data is in the raw_data directory. Other datasets from this area are
in the other_sources directory (we do not recommend using them).
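As a quick check that the final dataset file is readable, the following minimal sketch loads a CSV with the Python standard library only and reports its size and header. The helper name load_dataset is hypothetical and not part of this repository, and the sketch makes no assumptions about the column names in outputs/dataset_final.csv; it simply prints whatever header the file has.

```python
import csv
from pathlib import Path


def load_dataset(path):
    """Read a CSV file and return (column names, list of row dicts).

    Hypothetical helper for illustration; not part of the repository code.
    All values are returned as strings, exactly as stored in the file.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = reader.fieldnames or []
    return columns, rows


if __name__ == "__main__":
    # Path taken from the description above; run from the repository root.
    dataset_path = Path("outputs/dataset_final.csv")
    if dataset_path.exists():
        columns, rows = load_dataset(dataset_path)
        print(f"{len(rows)} rows, columns: {columns}")
    else:
        print(f"{dataset_path} not found; run this from the repository root")
```

The same helper should work for files in outputs/splits, assuming they are also stored as CSV.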
Set up a virtual environment:
- Poetry (recommended): run make install or poetry install --no-root
- venv: run pip install -r requirements.txt
Scripts:
- recreate dataset: python create_dataset.py
- split dataset: python create_dataset_splits.py
- create analyses and plots: python analyze_dataset.py
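To run the three scripts in one go, a small driver along the following lines could be used. This is only a sketch: the script names come from the list above, but the run order (recreate, then split, then analyze) is an assumption, and run_pipeline and build_commands are hypothetical helpers that do not exist in the repository.

```python
import subprocess
import sys

# Script names are taken from the list above; the order is an assumption.
PIPELINE = [
    "create_dataset.py",
    "create_dataset_splits.py",
    "analyze_dataset.py",
]


def build_commands(scripts, python=sys.executable):
    """Return one argv list per script, using the given interpreter."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in turn; check=True raises on the first failure."""
    for command in build_commands(scripts):
        runner(command, check=True)
```

Calling run_pipeline() from the repository root, inside the activated virtual environment, would then execute the scripts one after another and stop as soon as one of them exits with a non-zero status.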