52 commits
6353679
Add run_dbcan screening
HaidYi Jul 1, 2025
15f2ef5
fix missing gffs
HaidYi Jul 1, 2025
d5df4a1
split dbcan results by meta.id
HaidYi Jul 1, 2025
f049e2f
rm constraints of annotation tool
HaidYi Jul 1, 2025
8289bdb
add test config for rundbcan
HaidYi Jul 1, 2025
d8af5e9
add test profile for rundbcan in ci
HaidYi Jul 1, 2025
0a5e505
add dbcan in the refs
HaidYi Jul 2, 2025
01a573a
Suggestions from code review
Jul 10, 2025
5c5ec66
rm duplicate outputs
HaidYi Jul 15, 2025
9fd005c
add manual dbCAN database download
HaidYi Jul 15, 2025
ea4b852
rename DBCAN to CAZYME
HaidYi Jul 15, 2025
62623a5
add gff column in samplesheet
HaidYi Jul 15, 2025
0cad8f9
change run_dbcan_screening to run_cazyme_screening
HaidYi Jul 15, 2025
b76e3a2
add missing identifier
HaidYi Jul 17, 2025
0f5863a
add missing identifier
HaidYi Jul 17, 2025
f2d79d5
add missing conda
HaidYi Jul 17, 2025
625ced4
fix typo
HaidYi Jul 17, 2025
58273f1
re-organize the outdir structure of cazyme screening
HaidYi Jul 17, 2025
a638f32
add citation
HaidYi Jul 26, 2025
a5d692b
add cazyme_skip_dbcan param
HaidYi Jul 26, 2025
da9d4a4
fix missing ','
HaidYi Jul 26, 2025
9f3af6c
add gff type parameter for dbcan
HaidYi Aug 25, 2025
15645fb
mv hard-coded gff type to params
HaidYi Aug 25, 2025
448145b
Merge remote-tracking branch 'origin/dev' into rundbcan
jfy133 Aug 27, 2025
8e48936
fix typo
HaidYi Aug 27, 2025
101a159
fix format
HaidYi Aug 27, 2025
9c14e24
fix lint issue
HaidYi Aug 27, 2025
0899e10
Merge branch 'dev' into rundbcan
HaidYi Aug 27, 2025
63c8b04
Fix snapshot
jfy133 Aug 28, 2025
03b1030
Fix RO crate
jfy133 Aug 28, 2025
cce04b2
only list top view
HaidYi Sep 18, 2025
ddd51c1
Update docs/output.md
HaidYi Sep 18, 2025
b31feb6
Update docs/output.md
HaidYi Sep 18, 2025
36c22d3
Update docs/output.md
HaidYi Sep 18, 2025
59385f9
Update docs/output.md
HaidYi Sep 18, 2025
2a6544e
Update docs/output.md
HaidYi Sep 18, 2025
2dbe952
Update docs/output.md
HaidYi Sep 18, 2025
01fb374
add a column: gff_type in samplesheet
HaidYi Sep 23, 2025
13b82ab
rm dbcan_gff_type parameter
HaidYi Sep 23, 2025
3af937f
add option for using local dbcan db
HaidYi Sep 23, 2025
c28f049
filter samples for dbcan cgc/substrate if no gff_type provided in sam…
HaidYi Sep 23, 2025
796b96d
add cazyme to toolCitationText
HaidYi Sep 25, 2025
6de2005
Update docs/output.md
HaidYi Sep 25, 2025
5f9b432
update the profile name
HaidYi Sep 27, 2025
e114734
add cazyme_screening to default test
HaidYi Sep 28, 2025
6ee4dd7
add test_cazyme_pyrodigal test
HaidYi Sep 28, 2025
18ba885
add cazyme_dbcan_db to params
HaidYi Sep 28, 2025
161d37d
fix bug
HaidYi Sep 28, 2025
d505ea6
add gff_type in meta for cazyme screening
HaidYi Sep 28, 2025
69d5133
Merge branch 'dev' into rundbcan
jfy133 Oct 8, 2025
f5ed73e
[automated] Fix code linting
nf-core-bot Oct 8, 2025
c3c09c9
Merge branch 'dev' into rundbcan
jfy133 Nov 5, 2025
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -102,6 +102,10 @@

> Alcock, B. P., Huynh, W., Chalil, R., Smith, K. W., Raphenya, A. R., Wlodarski, M. A., Edalatmand, A., Petkau, A., Syed, S. A., Tsang, K. K., Baker, S. J. C., Dave, M., McCarthy, M. C., Mukiri, K. M., Nasir, J. A., Golbon, B., Imtiaz, H., Jiang, X., Kaur, K., Kwong, M., Liang, Z. C., Niu, K. C., Shan, P., Yang, J. Y. J., Gray, K. L., Hoad, G. R., Jia, B., Bhando, T., Carfrae, L. A., Farha, M. A., French, S., Gordzevich, R., Rachwalski, K., Tu, M. M., Bordeleau, E., Dooley, D., Griffiths, E., Zubyk, H. L., Brown, E. D., Maguire, F., Beiko, R. G., Hsiao, W. W. L., Brinkman F. S. L., Van Domselaar, G., McArthur, A. G. (2023). CARD 2023: expanded curation, support for machine learning, and resistome prediction at the Comprehensive Antibiotic Resistance Database. Nucleic acids research, 51(D1):D690-D699. [DOI: 10.1093/nar/gkac920](https://doi.org/10.1093/nar/gkac920)

- [dbCAN](https://doi.org/10.1093/nar/gkad328)

> Jinfang Zheng, Qiwei Ge, Yuchen Yan, Xinpeng Zhang, Le Huang, Yanbin Yin, dbCAN3: automated carbohydrate-active enzyme and substrate annotation, Nucleic Acids Research, Volume 51, Issue W1, 5 July 2023, Pages W115–W121. [DOI:10.1093/nar/gkad328](https://doi.org/10.1093/nar/gkad328)

- [SeqKit](https://bioinf.shenwei.me/seqkit/)

> Shen, W., Sipos, B., & Zhao, L. (2024). SeqKit2: A Swiss army knife for sequence and alignment processing. iMeta, e191. [https://doi.org/10.1002/imt2.191](https://doi.org/10.1002/imt2.191)
5 changes: 3 additions & 2 deletions README.md
@@ -40,8 +40,9 @@ The nf-core/funcscan AWS full test dataset are contigs generated by the MGnify s
5. Screening contigs for antimicrobial peptide-like sequences with [`ampir`](https://cran.r-project.org/web/packages/ampir/index.html), [`Macrel`](https://github.com/BigDataBiology/macrel), [`HMMER`](http://hmmer.org/), [`AMPlify`](https://github.com/bcgsc/AMPlify)
6. Screening contigs for antibiotic resistant gene-like sequences with [`ABRicate`](https://github.com/tseemann/abricate), [`AMRFinderPlus`](https://github.com/ncbi/amr), [`fARGene`](https://github.com/fannyhb/fargene), [`RGI`](https://card.mcmaster.ca/analyze/rgi), [`DeepARG`](https://bench.cs.vt.edu/deeparg). [`argNorm`](https://github.com/BigDataBiology/argNorm) is used to map the outputs of `DeepARG`, `AMRFinderPlus`, and `ABRicate` to the [`Antibiotic Resistance Ontology`](https://www.ebi.ac.uk/ols4/ontologies/aro) for consistent ARG classification terms.
7. Screening contigs for biosynthetic gene cluster-like sequences with [`antiSMASH`](https://antismash.secondarymetabolites.org), [`DeepBGC`](https://github.com/Merck/deepbgc), [`GECCO`](https://gecco.embl.de/), [`HMMER`](http://hmmer.org/)
8. Creating aggregated reports for all samples across the workflows with [`AMPcombi`](https://github.com/Darcy220606/AMPcombi) for AMPs, [`hAMRonization`](https://github.com/pha4ge/hAMRonization) for ARGs, and [`comBGC`](https://raw.githubusercontent.com/nf-core/funcscan/master/bin/comBGC.py) for BGCs
9. Software version and methods text reporting with [`MultiQC`](http://multiqc.info/)
8. Screening contigs for carbohydrate-active enzymes (CAZymes), CAZyme gene clusters and substrates with [`run_dbcan`](https://github.com/bcb-unl/run_dbcan)
9. Creating aggregated reports for all samples across the workflows with [`AMPcombi`](https://github.com/Darcy220606/AMPcombi) for AMPs, [`hAMRonization`](https://github.com/pha4ge/hAMRonization) for ARGs, and [`comBGC`](https://raw.githubusercontent.com/nf-core/funcscan/master/bin/comBGC.py) for BGCs
10. Software version and methods text reporting with [`MultiQC`](http://multiqc.info/)

![funcscan metro workflow](docs/images/funcscan_metro_workflow.png)

2 changes: 1 addition & 1 deletion assets/samplesheet.csv
@@ -1,4 +1,4 @@
sample,fasta,protein,gbk
sample,fasta,protein,gbk,gff
sample_1,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_1.fasta.gz,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_1.faa,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_1.gbk
sample_2,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_2.fasta.gz,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_2.faa.gz,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs_prokka_2.gbk.gz
sample_3,https://raw.githubusercontent.com/nf-core/test-datasets/funcscan/wastewater_metagenome_contigs.fasta
16 changes: 15 additions & 1 deletion assets/schema_input.json
@@ -33,12 +33,26 @@
"exists": true,
"pattern": "^\\S+\\.(gbk|gbff)(\\.gz)?$",
"errorMessage": "Input file for feature annotations has incorrect file format. File must end in `.gbk`, `.gbk.gz` or `.gbff`, or `.gbff.gz`"
},
"gff": {
"type": "string",
"format": "file-path",
"exists": true,
"pattern": "^\\S+\\.(gff|gff3)(\\.gz)?$",
"errorMessage": "Input file for feature annotations has incorrect file format. File must end in `.gff`, `.gff.gz` or `.gff3`, or `.gff3.gz`"
},
"gff_type": {
"type": "string",
"enum": ["NCBI_prok", "prodigal", "NCBI_euk", "JGI"],
"errorMessage": "GFF type must be one of: NCBI_prok, prodigal, NCBI_euk, or JGI",
"meta": ["gff_type"]
}
},
"required": ["sample", "fasta"],
"dependentRequired": {
"protein": ["gbk"],
"gbk": ["protein"]
"gbk": ["protein"],
"gff": ["protein"]
}
},
"uniqueItems": true
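
The new `gff` and `gff_type` rules can be tried out on a single samplesheet row with a small Python sketch. This is only an illustration of the constraints shown in this hunk, not how the pipeline validates input. The helper `check_row` is hypothetical, the `exists` file check is ignored, and empty cells are assumed to count as absent.

```python
import re

# Pattern and enum copied from the `gff` and `gff_type` entries in assets/schema_input.json.
GFF_PATTERN = re.compile(r"^\S+\.(gff|gff3)(\.gz)?$")
GFF_TYPES = {"NCBI_prok", "prodigal", "NCBI_euk", "JGI"}
# `dependentRequired` as it stands after this change.
DEPENDENT_REQUIRED = {"protein": ["gbk"], "gbk": ["protein"], "gff": ["protein"]}


def check_row(row):
    """Return a list of problems for one samplesheet row (hypothetical helper)."""
    problems = []
    for column in ("sample", "fasta"):
        if not row.get(column):
            problems.append(f"missing required column: {column}")
    gff = row.get("gff")
    if gff and not GFF_PATTERN.match(gff):
        problems.append("gff must end in .gff, .gff.gz, .gff3 or .gff3.gz")
    gff_type = row.get("gff_type")
    if gff_type and gff_type not in GFF_TYPES:
        problems.append("gff_type must be one of: " + ", ".join(sorted(GFF_TYPES)))
    # A filled column requires every column listed for it to be filled too.
    for column, needed in DEPENDENT_REQUIRED.items():
        if row.get(column):
            for other in needed:
                if not row.get(other):
                    problems.append(f"{column} requires {other}")
    return problems


row = {"sample": "sample_1", "fasta": "contigs.fasta", "protein": "contigs.faa",
       "gff": "contigs.gff", "gff_type": "prodigal"}
print(check_row(row))
```

Under these rules, a row that supplies `protein` and `gff` but no `gbk` is still reported, because `protein` continues to require `gbk`.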
35 changes: 35 additions & 0 deletions conf/modules.config
@@ -740,4 +740,39 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: RUNDBCAN_DATABASE {
publishDir = [
path: { "${params.outdir}/databases/dbcan/" },
mode: params.publish_dir_mode,
enabled: params.save_db,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: RUNDBCAN_CAZYMEANNOTATION {
publishDir = [
path: { "${params.outdir}/cazyme/dbcan/cazyme_annotation/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: RUNDBCAN_EASYCGC {
publishDir = [
path: { "${params.outdir}/cazyme/dbcan/cgc/${meta.id}" },
mode: params.publish_dir_mode,
pattern: "*_{cgc.gff,cgc_standard_out.tsv,diamond.out.tc,TF_hmm_results.tsv,STP_hmm_results.tsv}",
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: RUNDBCAN_EASYSUBSTRATE {
publishDir = [
path: { "${params.outdir}/cazyme/dbcan/substrate/${meta.id}" },
mode: params.publish_dir_mode,
pattern: "*_{total_cgc_info.tsv,substrate_prediction.tsv,synteny_pdf}",
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}
}
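
To see where a given file would end up under these `publishDir` blocks, the rules can be mirrored in a few lines of Python. This is a rough sketch of the `path`, `pattern` and `saveAs` settings above, not Nextflow's own logic. The function `publish_path` is hypothetical, and the glob `*_{a,b}` is approximated as "file name ends with `_a` or `_b`".

```python
from pathlib import PurePosixPath

# Sub-directories taken from the publishDir paths in conf/modules.config.
PUBLISH_SUBDIRS = {
    "RUNDBCAN_CAZYMEANNOTATION": "cazyme/dbcan/cazyme_annotation",
    "RUNDBCAN_EASYCGC": "cazyme/dbcan/cgc",
    "RUNDBCAN_EASYSUBSTRATE": "cazyme/dbcan/substrate",
}

# Alternatives from the `pattern` globs; processes without a pattern publish everything.
PUBLISH_SUFFIXES = {
    "RUNDBCAN_EASYCGC": ("cgc.gff", "cgc_standard_out.tsv", "diamond.out.tc",
                         "TF_hmm_results.tsv", "STP_hmm_results.tsv"),
    "RUNDBCAN_EASYSUBSTRATE": ("total_cgc_info.tsv", "substrate_prediction.tsv",
                               "synteny_pdf"),
}


def publish_path(outdir, process, sample_id, filename):
    """Return the published path for a file, or None if it is not published."""
    # saveAs: versions.yml is never published.
    if filename == "versions.yml":
        return None
    # pattern: only file names ending in one of the listed alternatives are kept.
    suffixes = PUBLISH_SUFFIXES.get(process)
    if suffixes and not any(filename.endswith("_" + s) for s in suffixes):
        return None
    return str(PurePosixPath(outdir) / PUBLISH_SUBDIRS[process] / sample_id / filename)


print(publish_path("results", "RUNDBCAN_EASYCGC", "sample_1", "sample_1_cgc.gff"))
```

`RUNDBCAN_DATABASE` is left out of the sketch because it is only published when `params.save_db` is set.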
2 changes: 2 additions & 0 deletions conf/test.config
@@ -33,4 +33,6 @@ params {
run_amp_screening = true
amp_run_hmmsearch = true
amp_hmmsearch_models = params.pipelines_testdata_base_path + 'funcscan/hmms/mybacteriocin.hmm'

run_cazyme_screening = true
}
34 changes: 34 additions & 0 deletions conf/test_cazyme_pyrodigal.config
@@ -0,0 +1,34 @@
/*
Member

The test should be for all cazyme screening tools, so I would rename accordingly for 'future proofing'

Author

This is a good idea for leaving a placeholder for other cazyme screening developers.

Member

I'm not sure I follow...

Basically, what I mean is that this should be test_cazyme_pyrodigal, not test_dbcan_pyrodigal!

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Use as follows:
nextflow run nf-core/funcscan -profile test_cazyme_pyrodigal,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/

process {
resourceLimits = [
cpus: 4,
memory: '15.GB',
time: '1.h'
]
}

params {
config_profile_name = 'CAZyme Pyrodigal test profile'
config_profile_description = 'Minimal test dataset to check CAZyme workflow function'

// Input data
input = params.pipelines_testdata_base_path + 'funcscan/samplesheet_reduced.csv'

annotation_tool = 'pyrodigal'

run_arg_screening = false
run_amp_screening = false
run_bgc_screening = false
run_cazyme_screening = true
}
37 changes: 37 additions & 0 deletions conf/test_preannotated_dbcan.config
Member

This file should be test_preannotated_cazyme

@@ -0,0 +1,37 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Use as follows:
nextflow run nf-core/funcscan -profile test_preannotated_dbcan,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/

process {
resourceLimits = [
cpus: 4,
memory: '15.GB',
time: '1.h'
]
}

params {
config_profile_name = 'CAZyme test profile - preannotated input'
config_profile_description = 'Minimal test dataset to check CAZyme workflow function'

// Input data
input = params.pipelines_testdata_base_path + 'funcscan/samplesheet_preannotated.csv'

annotation_tool = 'pyrodigal'

run_arg_screening = false
run_amp_screening = false
run_bgc_screening = false
run_cazyme_screening = true

dbcan_skip_cgc = true // Skip cgc annotation as .gbk (not .gff) is provided in samplesheet
Member

We should probably add gff files!

You can generate them from a normal funcscan run and make a PR against the funcscan branch of nf-core/test-datasets that adds the files and an updated samplesheet for the next funcscan version.

Author

Yes, currently the cazyme screening can only use the .gff files in the pipeline. To use the pre-annotated one, I generated the .gff files from pyrodigal. The PR can be found at nf-core/test-datasets#1683.

Member

Can this be updated now you have the file?

dbcan_skip_substrate = true // Skip substrate annotation as .gbk (not .gff) is provided in samplesheet
}
56 changes: 46 additions & 10 deletions docs/output.md
@@ -7,10 +7,11 @@ The output of nf-core/funcscan provides reports for each of the functional group
- **antibiotic resistance genes** (tools: [ABRicate](https://github.com/tseemann/abricate), [AMRFinderPlus](https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/AMRFinder), [DeepARG](https://bitbucket.org/gusphdproj/deeparg-ss/src/master), [fARGene](https://github.com/fannyhb/fargene), [RGI](https://card.mcmaster.ca/analyze/rgi) – summarised by [hAMRonization](https://github.com/pha4ge/hAMRonization). Results from ABRicate, AMRFinderPlus, and DeepARG are normalised to [ARO](https://obofoundry.org/ontology/aro.html) by [argNorm](https://github.com/BigDataBiology/argNorm).)
- **antimicrobial peptides** (tools: [Macrel](https://github.com/BigDataBiology/macrel), [AMPlify](https://github.com/bcgsc/AMPlify), [ampir](https://ampir.marine-omics.net), [hmmsearch](http://hmmer.org) – summarised by [AMPcombi](https://github.com/Darcy220606/AMPcombi))
- **biosynthetic gene clusters** (tools: [antiSMASH](https://docs.antismash.secondarymetabolites.org), [DeepBGC](https://github.com/Merck/deepbgc), [GECCO](https://gecco.embl.de), [hmmsearch](http://hmmer.org) – summarised by [comBGC](#combgc))
- **carbohydrate-active enzymes (CAZymes)**, CAZyme gene clusters and substrates (tools: [run_dbcan](https://github.com/bcb-unl/run_dbcan))

As a general workflow, we recommend to first look at the summary reports ([ARGs](#hamronization), [AMPs](#ampcombi), [BGCs](#combgc)), to get a general overview of what hits have been found across all the tools of each functional group. After which, you can explore the specific output directories of each tool to get more detailed information about each result. The tool-specific output directories also includes the output from the functional annotation steps of either [prokka](https://github.com/tseemann/prokka), [pyrodigal](https://github.com/althonos/pyrodigal), [prodigal](https://github.com/hyattpd/Prodigal), or [Bakta](https://github.com/oschwengers/bakta) if the `--save_annotations` flag was set. Additionally, taxonomic classifications from [MMseqs2](https://github.com/soedinglab/MMseqs2) are saved if the `--taxa_classification_mmseqs_db_savetmp` and `--taxa_classification_mmseqs_taxonomy_savetmp` flags are set.

Similarly, all downloaded databases are saved (i.e. from [MMseqs2](https://github.com/soedinglab/MMseqs2), [antiSMASH](https://docs.antismash.secondarymetabolites.org), [AMRFinderPlus](https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/AMRFinder), [Bakta](https://github.com/oschwengers/bakta), [DeepARG](https://bitbucket.org/gusphdproj/deeparg-ss/src/master), [RGI](https://github.com/arpcard/rgi), and/or [AMPcombi](https://github.com/Darcy220606/AMPcombi)) into the output directory `<outdir>/databases/` if the `--save_db` flag was set.
Similarly, all downloaded databases are saved (i.e. from [MMseqs2](https://github.com/soedinglab/MMseqs2), [antiSMASH](https://docs.antismash.secondarymetabolites.org), [AMRFinderPlus](https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/AMRFinder), [Bakta](https://github.com/oschwengers/bakta), [DeepARG](https://bitbucket.org/gusphdproj/deeparg-ss/src/master), [RGI](https://github.com/arpcard/rgi), [AMPcombi](https://github.com/Darcy220606/AMPcombi), and/or [run_dbcan](https://github.com/bcb-unl/run_dbcan)) into the output directory `<outdir>/databases/` if the `--save_db` flag was set.

Furthermore, for reproducibility, versions of all software used in the run is presented in a [MultiQC](http://multiqc.info) report.

@@ -41,6 +42,8 @@ results/
| ├── deepbgc/
| ├── gecco/
| └── hmmsearch/
├── cazyme/
| └── dbcan/
├── databases/
├── multiqc/
├── pipeline_info/
@@ -63,11 +66,11 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes p

Input contig QC with:

- [SeqKit](https://bioinf.shenwei.me/seqkit/) (default) - for separating into long- and short- categories
- [SeqKit](https://bioinf.shenwei.me/seqkit/) (default) – for separating into long- and short- categories

Taxonomy classification of nucleotide sequences with:

- [MMseqs2](https://github.com/soedinglab/MMseqs2) (default) - for contig taxonomic classification using 2bLCA.
- [MMseqs2](https://github.com/soedinglab/MMseqs2) (default) – for contig taxonomic classification using 2bLCA.

ORF prediction and annotation with any of:

@@ -98,18 +101,22 @@ Antimicrobial Peptides (AMPs):
Biosynthetic Gene Clusters (BGCs):

- [antiSMASH](#antismash) – biosynthetic gene cluster detection.
- [deepBGC](#deepbgc) - biosynthetic gene cluster detection, using a deep learning model.
- [deepBGC](#deepbgc) – biosynthetic gene cluster detection, using a deep learning model.
- [GECCO](#gecco) – biosynthetic gene cluster detection, using Conditional Random Fields (CRFs).
- [hmmsearch](#hmmsearch) – biosynthetic gene cluster detection, based on hidden Markov models.

Carbohydrate-active enzymes (CAZymes):

- [run_dbcan](https://github.com/bcb-unl/run_dbcan) – carbohydrate-active enzyme (CAZyme), CAZyme gene clusters and substrate detection.

Output Summaries:

- [AMPcombi](#ampcombi) – summary report of antimicrobial peptide gene output from various detection tools.
- [hAMRonization](#hamronization) – summary of antimicrobial resistance gene output from various detection tools.
- [argNorm](#argNorm) - Normalize ARG annotations from [ABRicate](#abricate), [AMRFinderPlus](#amrfinderplus), and [DeepARG](#deeparg) to the ARO
- [comBGC](#combgc) – summary of biosynthetic gene cluster output from various detection tools.
- [MultiQC](#multiqc) – report of all software and versions used in the pipeline.
- [Pipeline information](#pipeline-information) – report metrics generated during the workflow execution.
- [AMPcombi](#ampcombi) – summary report of antimicrobial peptide gene output from various detection tools
- [hAMRonization](#hamronization) – summary of antimicrobial resistance gene output from various detection tools
- [argNorm](#argNorm) – Normalize ARG annotations from [ABRicate](#abricate), [AMRFinderPlus](#amrfinderplus), and [DeepARG](#deeparg) to the ARO
- [comBGC](#combgc) – summary of biosynthetic gene cluster output from various detection tools
- [MultiQC](#multiqc) – report of all software and versions used in the pipeline
- [Pipeline information](#pipeline-information) – report metrics generated during the workflow execution

## Tool details

@@ -466,6 +473,35 @@ Note that filtered FASTA is only used for BGC workflow for run-time optimisation

[GECCO](https://gecco.embl.de) is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).

### CAZyme annotation tools

#### run_dbcan

<details markdown="1">
<summary>Output files</summary>

- `cazyme/`
  - `dbcan/`
    - `cazyme_annotation/`
      - `<sample.id>_overview.tsv`: TSV file containing the results of dbCAN CAZyme annotation
      - `<sample.id>_dbCAN_hmm_results.tsv`: TSV file containing the detailed dbCAN HMM results for CAZyme annotation
      - `<sample.id>_dbCANsub_hmm_results.tsv`: TSV file containing the detailed dbCAN subfamily results for CAZyme annotation
      - `<sample.id>_diamond.out`: TSV file containing the detailed dbCAN diamond results for CAZyme annotation
    - `cgc/`
      - `<sample.id>_cgc.gff`: GFF file containing the CAZyme gene clusters (CGCs) identified by dbCAN. This file is generated from the dbCAN annotation and contains the locations of CAZyme gene clusters in the genome
      - `<sample.id>_cgc_standard_out.tsv`: Standard output file from dbCAN for CAZyme gene clusters (CGCs) in a tabular format. This file summarizes the CAZyme gene clusters identified in the genome
      - `<sample.id>_diamond.out.tc`: TSV file containing the diamond output for transporter annotation
      - `<sample.id>_TF_hmm_results.tsv`: TSV file containing the results of transcription factor screening
      - `<sample.id>_STP_hmm_results.tsv`: TSV file containing the results of signal transduction protein (STP) annotation
    - `substrate/`
      - `<sample.id>_total_cgc_info.tsv`: TSV file summarizing the total additional genes in the genome
      - `<sample.id>_substrate_prediction.tsv`: TSV file containing the substrate predictions based on the CGC annotations from dbCAN
      - `<sample.id>_synteny_pdf/`: Directory containing one or more PDF files showing the syntenic regions of the CGCs in DNA sequence as identified by dbCAN

</details>

[run_dbcan](https://github.com/bcb-unl/run_dbcan) is an automated tool for carbohydrate-active enzyme (CAZyme), CAZyme gene cluster and substrate annotation.
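
After a run, the file list above can be compared against what is actually on disk. The sketch below is one possible check, not part of the pipeline. The helper `missing_outputs` is hypothetical, and it assumes the per-sample sub-directory (`<step>/<sample.id>/`) used by the `publishDir` paths in `conf/modules.config`.

```python
from pathlib import Path

# File name endings per step, taken from the output list above.
EXPECTED = {
    "cazyme_annotation": ["overview.tsv", "dbCAN_hmm_results.tsv",
                          "dbCANsub_hmm_results.tsv", "diamond.out"],
    "cgc": ["cgc.gff", "cgc_standard_out.tsv", "diamond.out.tc",
            "TF_hmm_results.tsv", "STP_hmm_results.tsv"],
    "substrate": ["total_cgc_info.tsv", "substrate_prediction.tsv", "synteny_pdf"],
}


def missing_outputs(outdir, sample_id):
    """List expected run_dbcan outputs (relative to outdir) that do not exist."""
    base = Path(outdir) / "cazyme" / "dbcan"
    missing = []
    for step, suffixes in EXPECTED.items():
        for suffix in suffixes:
            # Assumed layout: <outdir>/cazyme/dbcan/<step>/<sample.id>/<sample.id>_<suffix>
            path = base / step / sample_id / f"{sample_id}_{suffix}"
            if not path.exists():
                missing.append(path.relative_to(outdir).as_posix())
    return missing


if __name__ == "__main__":
    for entry in missing_outputs("results", "sample_1"):
        print("missing:", entry)
```

Missing `cgc/` or `substrate/` files are not necessarily an error: the test configs in this PR skip those steps when no GFF is available for a sample.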

### Summary tools

[AMPcombi](#ampcombi), [hAMRonization](#hamronization), [comBGC](#combgc), [MultiQC](#multiqc), [pipeline information](#pipeline-information), [argNorm](#argnorm).