datascout

This repo queries and gathers data from several resources, which can then be used to run bulk automated annotation of microbial genomes. The data types are:

OrthoDB and UniProt orthology data;
Rfam (non-coding RNA data);
Transcriptomic data from ENA.

Getting this repo

git clone git@github.com:manuelcarbajo/datascout

Configuration

Refreshing environment

This project uses nextflow-24.04.3

Initialising and running the environment

After downloading the datascout repo, define a PROJECT_DIR variable holding the path to the datascout git repo:

export PROJECT_DIR="/path/to/your/Ensembl-datascout"
cd ${PROJECT_DIR}

Define the USER, PWD, SERVER and PORT of your MySQL ncbi_tax and rfam databases in "${PROJECT_DIR}/conf/ncbi_db.conf" and "${PROJECT_DIR}/conf/rfam_db.conf", following the structure of the template in that same folder: mysql://USER:PWD@mysql-ncbi-SERVER:PORT/ncbi_taxonomy_db.
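A quick sanity check on the connection string can catch a missing port or password before the pipeline starts. The sketch below is not part of datascout: `is_mysql_url` is a hypothetical helper, and it only assumes the mysql://USER:PWD@SERVER:PORT/DB_NAME shape shown in the template above.

```shell
# Hypothetical helper (not part of datascout): succeed only if the argument
# has the form mysql://USER:PWD@HOST:PORT/DB_NAME with a numeric port.
is_mysql_url() {
    printf '%s\n' "$1" |
        grep -Eq '^mysql://[^:@/]+:[^@]*@[^:/@]+:[0-9]+/[^/]+$'
}

# Example call with made-up credentials and host name.
if is_mysql_url 'mysql://user:secret@mysql-ncbi-host:3306/ncbi_taxonomy_db'; then
    echo "connection string looks well formed"
fi
```

The check is purely syntactic; it does not try to connect to the database.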

Test the configuration with:

nextflow run main.nf -profile slurm,test,singularity

As input, use a CSV file listing the genomes to annotate, one genome per line, following the template in "${PROJECT_DIR}/assets/test_data/genomes_test_list.csv":

GENOME_NAME TAX_ID ENA_ACCESSION
#Example:
toxoplasma_gondii_ME49,508771,GCA_000006565.2
tripanosoma_cruzi,5693,GCA_003719455.1
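Malformed rows are easier to catch before launching the pipeline. The following is a minimal sketch, not part of datascout: `validate_genomes_csv` is a hypothetical helper, and it assumes that header and comment lines start with `GENOME_NAME` or `#`, that TAX_ID is an integer, and that accessions look like the `GCA_<digits>.<version>` examples above.

```shell
# Hypothetical helper (not part of datascout): validate a genomes list.
# Assumptions: three comma-separated fields per data line, numeric TAX_ID,
# accession shaped like GCA_000006565.2.
validate_genomes_csv() {
    awk -F',' '
        /^[ \t]*$/ || /^#/ || /^GENOME_NAME/ { next }   # skip blank, comment, header
        NF != 3 {
            printf "line %d: expected 3 fields, got %d\n", NR, NF
            bad = 1
            next
        }
        $2 !~ /^[0-9]+$/ {
            printf "line %d: TAX_ID is not numeric: %s\n", NR, $2
            bad = 1
        }
        $3 !~ /^GCA_[0-9]+\.[0-9]+$/ {
            printf "line %d: unexpected accession: %s\n", NR, $3
            bad = 1
        }
        END { exit bad }    # non-zero exit status if any line failed
    ' "$1"
}
```

It could be called as `validate_genomes_csv /path/to/your_genomes_list.csv`; the function prints one message per problem and returns a non-zero status if any line fails.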

Define and export the following variables:

export INPUT_CSV="/path/to/your/input-file-dir/your_genomes_list.csv"
export OUTPUT_PATH="/path/to/your/output-dir/genome_annotations"

export ORTHODB_FOLDER="/path/to/your/static-storage-dir/orthodb_dir"
export ASSEMBLIES_DIR="/path/to/your/static-storage-dir/assemblies_dir"
export ENA_CSV_DIR="/path/to/your/static-storage-dir/ena_csv_dir"
export FASTQ_DIR="/path/to/your/static-storage-dir/rna_fastq_dir"
export UNIPROT_DIR="/path/to/your/static-storage-dir/uniprot_dir"
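Because the run command relies on all seven variables, a pre-flight check can help avoid launching with an empty argument. This is a sketch only: `check_datascout_env` is a hypothetical helper that is not shipped with the repo, and it checks nothing beyond whether each variable is set and non-empty.

```shell
# Hypothetical pre-flight check (not part of datascout): report every
# variable from the list above that is unset or empty.
check_datascout_env() {
    missing=0
    for name in INPUT_CSV OUTPUT_PATH ORTHODB_FOLDER ASSEMBLIES_DIR \
                ENA_CSV_DIR FASTQ_DIR UNIPROT_DIR; do
        eval "value=\${$name:-}"    # POSIX-compatible indirect lookup
        if [ -z "$value" ]; then
            echo "missing: $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_datascout_env && echo "environment OK"` after the exports prints the confirmation only when all variables are defined.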

To run the pipeline execute:

nextflow run main.nf \
    --csv_file $INPUT_CSV \
    --outdir $OUTPUT_PATH \
    --orthodb_dir $ORTHODB_FOLDER \
    --assemblies_dir $ASSEMBLIES_DIR \
    --rna_fastq_dir $FASTQ_DIR \
    --uniprot_dir $UNIPROT_DIR \
    --ena_csv_dir $ENA_CSV_DIR \
    -profile slurm
