Team2_AI_Microbiome_Poultry

A collaborative BioAI Hackathon project applying machine learning to poultry microbiome data for predictive health and production trait modeling.

📌 Team

Mxolisi Nene (Team Lead)
Karthikeyan Govindan
Rajarshi Mondal

🎯 Goal

Use publicly available poultry microbiome datasets (16S rRNA amplicon and shotgun metagenomic sequencing) to identify microbial biomarkers associated with growth, gut health, and disease resistance using machine learning.

🛠️ Methodology

Data preprocessing & normalization
Microbiome taxonomic profiling using:
- Shotgun metagenomics: Kraken2
- 16S rRNA amplicon sequencing: DADA2
ML techniques:
- Clustering
- Classification
- Feature selection
Results interpretation & visualization
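
The normalization method is not specified above. As one common option for compositional count data, the sketch below applies a centered log-ratio (CLR) transform to each sample before any clustering or classification. The function name, the pseudocount of 0.5, and the example counts are illustrative assumptions, not part of the project code.

```python
import math


def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    A pseudocount (an assumed value here) is added so that zero counts
    do not produce log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log makes the values in each sample sum to zero.
    return [value - mean_log for value in logs]


# Hypothetical counts for three taxa in two samples.
samples = {
    "sample_A": [120, 30, 0],
    "sample_B": [10, 10, 10],
}

clr_table = {name: clr_transform(row) for name, row in samples.items()}
```

CLR values express each taxon relative to the geometric mean of its sample, which removes the effect of differing sequencing depth between samples.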

📦 Deliverables

AI-informed microbiome health prediction model
Presentation & report for BioAI Hackathon 2025

📄 License

MIT

📊 Workflow Diagram

flowchart TD
  A[Download SRA] --> B[QC & Trim]
  B --> C{Shotgun?}
  C -->|Yes| D[Kraken2 Profiling]
  C -->|No| E[DADA2 16S Processing]
  D --> F[Generate Taxa Table]
  E --> G[Generate Feature Table]
  F --> H[Model Training]
  G --> H
  H --> I[Report]

📥 Data Retrieval and Preprocessing

📜 Script: `data_retrieval.sh`

This script automates the process of:

Downloading sequencing data from ENA using a BioProject ID
Performing adapter trimming using fastp
Generating a MultiQC report for quality control
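
The steps above could be sketched as follows. This is not the content of `data_retrieval.sh`; it is a minimal Python illustration that assumes the ENA Portal API file report is used to list FASTQ files and that fastp is run once per paired-end run. The helper names, output file naming, and the sample TSV row are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(bioproject_id):
    """Build the ENA file report URL listing FASTQ files for a BioProject."""
    params = {
        "accession": bioproject_id,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def parse_fastq_urls(tsv_text):
    """Map each run accession to its list of FASTQ URLs.

    ENA lists paths without a scheme, separated by ';'.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    runs = {}
    for row in reader:
        paths = [p for p in row["fastq_ftp"].split(";") if p]
        runs[row["run_accession"]] = ["ftp://" + p for p in paths]
    return runs


def fastp_command(read1, read2, out_dir, run):
    """Argument list for trimming one paired-end run with fastp.

    Reports are named *.fastp.json so MultiQC can pick them up.
    """
    return [
        "fastp",
        "-i", read1,
        "-I", read2,
        "-o", f"{out_dir}/{run}_1.trimmed.fastq.gz",
        "-O", f"{out_dir}/{run}_2.trimmed.fastq.gz",
        "--json", f"{out_dir}/{run}.fastp.json",
        "--html", f"{out_dir}/{run}.fastp.html",
    ]


# Made-up file report row, used only to show the parsing step.
SAMPLE_TSV = (
    "run_accession\tfastq_ftp\n"
    "SRR0000001\tftp.sra.ebi.ac.uk/example/SRR0000001_1.fastq.gz;"
    "ftp.sra.ebi.ac.uk/example/SRR0000001_2.fastq.gz\n"
)

runs = parse_fastq_urls(SAMPLE_TSV)
```

After trimming, running `multiqc` on the output directory would aggregate the fastp JSON reports into a single QC report.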

▶️ Usage

To run the script, use the following command:

./data_retrieval.sh [BioProject_ID]

🔍 Example

./data_retrieval.sh PRJNA707106

⚙️ Data Processing Pipeline Using DADA2 in R

📜 Script: `dada2_script.R`

This script automates the essential preprocessing steps for amplicon sequencing data, including:

✂️ Quality filtering and primer trimming to clean raw reads

🔗 Merging paired-end reads for accurate sequence reconstruction

🚫 Removing chimeras to ensure data integrity

🎯 Generating the final Amplicon Sequence Variant (ASV) table ready for taxonomic classification

🧬 Taxonomic Annotation

Taxonomy is assigned using the SILVA reference database (release 138.1, DADA2-formatted training set):
`silva_nr99_v138.1_wSpecies_train_set.fa.gz`

Outputs include taxonomic tables at the genus and species levels

Results are saved as CSV files, paired with metadata for easy integration
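
As a rough illustration of how a saved abundance table might be paired with sample metadata, the sketch below joins two CSV tables on a shared sample identifier. The column names (`sample_id`, `group`), the taxa, and the values are hypothetical; the actual output files may be laid out differently.

```python
import csv
import io


def join_on_sample(abundance_csv, metadata_csv, key="sample_id"):
    """Attach metadata columns to each abundance row that shares the key.

    Samples without a metadata entry are skipped.
    """
    metadata = {
        row[key]: row for row in csv.DictReader(io.StringIO(metadata_csv))
    }
    joined = []
    for row in csv.DictReader(io.StringIO(abundance_csv)):
        sample_meta = metadata.get(row[key])
        if sample_meta is None:
            continue
        joined.append({**sample_meta, **row})
    return joined


# Hypothetical genus-level counts and metadata for illustration only.
ABUNDANCE = (
    "sample_id,Lactobacillus,Bacteroides\n"
    "S1,120,30\n"
    "S2,10,45\n"
    "S3,5,5\n"
)
METADATA = (
    "sample_id,group\n"
    "S1,control\n"
    "S2,treated\n"
)

joined = join_on_sample(ABUNDANCE, METADATA)
```

Here `S3` is dropped because it has no metadata row; values stay as strings and would need converting to numbers before modeling.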

🚀 Ready for Downstream Analysis

The resulting tables can be used directly for:

Machine learning workflows

Visualization and reporting
