A collaborative BioAI Hackathon project applying machine learning to poultry microbiome data for predictive health and production trait modeling.
- Mxolisi Nene (Team Lead)
- Karthikeyan Govindan
- Rajarshi Mondal
Use publicly available poultry microbiome datasets (16S rRNA / Metagenomics) to identify microbial biomarkers associated with growth, gut health, and disease resistance using AI techniques.
- Data preprocessing & normalization
- Microbiome taxonomic profiling using:
- Shotgun metagenomics: Kraken2
- 16S rRNA amplicon sequencing: DADA2
- ML techniques:
- Clustering
- Classification
- Feature selection
- Results interpretation & visualization
- AI-informed microbiome health prediction model
- Presentation & report for BioAI Hackathon 2025
MIT
flowchart TD
A[Download SRA] --> B[QC & Trim]
B --> C{Shotgun?}
C -->|Yes| D[Kraken2 Profiling]
C -->|No| E[DADA2 16S Processing]
D --> F[Generate Taxa Table]
E --> G[Generate Feature Table]
F --> H[Model Training]
G --> H
H --> I[Report]
This script automates the process of:
- Downloading sequencing data from ENA using a BioProject ID
- Performing adapter trimming using
fastp - Generating a MultiQC report for quality control
To run the script, use the following command:
./data_retrieval.sh [BioProject_ID]./data_retrieval.sh PRJNA707106This script automates the essential preprocessing steps for amplicon sequencing data, including:
✂️ Quality filtering and primer trimming to clean raw reads
🔗 Merging paired-end reads for accurate sequence reconstruction
🚫 Removing chimeras to ensure data integrity
🎯 Generating the final Amplicon Sequence Variant (ASV) table ready for taxonomic classification
Taxonomy is assigned using the SILVA reference database:
silva_nr99_v138.1_wSpecies_train_set.fa.gz
Outputs include taxonomic tables at the Genus and Species levels
Results are saved as CSV files, paired with metadata for easy integration
The processed data is perfectly formatted to jump straight into:
Machine learning workflows
Visualization and reporting