7 changes: 7 additions & 0 deletions modules/nf-core/genomeuploader/environment.yml
@@ -0,0 +1,7 @@
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json
channels:
  - conda-forge
  - bioconda
dependencies:
  - "bioconda::genome-uploader=2.5.0"
62 changes: 62 additions & 0 deletions modules/nf-core/genomeuploader/main.nf
@@ -0,0 +1,62 @@
process GENOMEUPLOADER {
    tag "$meta.id"
    label 'process_single'

    secret secrets.ENA_WEBIN ? "ENA_WEBIN" : ""
    secret secrets.ENA_WEBIN_PASSWORD ? "ENA_WEBIN_PASSWORD" : ""

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/genome-uploader:2.5.0--pyhdfd78af_0':
        'biocontainers/genome-uploader:2.5.0--pyhdfd78af_0' }"

    input:
    tuple val(meta), path(metadata_tsv)
    path(fastas, stageAs: "genomes/*")
Review comment (Contributor):
would rename to fasta


    output:
    tuple val(meta), path("upload_output/*") , emit: upload_output_dir
Review comment (Contributor):
Maybe rename to emit: results
This is the pattern I mostly see when outputting whole folders

    tuple val(meta), path("upload_output/MAG_upload/submission.xml") , emit: submission
    tuple val(meta), path("upload_output/MAG_upload/registered_MAGs_${prefix}.tsv") , emit: registered_mags
    tuple val(meta), path("upload_output/MAG_upload/genome_samples.xml") , emit: genome_samples
    tuple val(meta), path("upload_output/MAG_upload/manifests_${prefix}/*.manifest"), emit: manifests
    path "versions.yml" , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    prefix = task.ext.prefix ?: "${meta.id}"
    """
    genome_upload \\
        $args \\
        --upload_study "${meta.study_accession}" \\
Review comment (Contributor):
I like the use of meta here, but we will need a second opinion if that's preferred over additional input params

        --centre_name "${meta.center_name}" \\
        --genome_info ${metadata_tsv} \\
        --out upload_output

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        genome_upload: \$( genome_upload --version | sed 's/genome_uploader //' )
        ena-webin-cli: \$( ena-webin-cli -version )
    END_VERSIONS
    """

    stub:
    def args = task.ext.args ?: ''
    prefix = task.ext.prefix ?: "${meta.id}"
    """
    mkdir -p upload_output/MAG_upload/manifests_${prefix}
    touch upload_output/MAG_upload/submission.xml
    touch upload_output/MAG_upload/registered_MAGs_${prefix}.tsv
    touch upload_output/MAG_upload/genome_samples.xml
    touch upload_output/MAG_upload/manifests_${prefix}/test_mag.manifest

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        genome_upload: \$( genome_upload --version | sed 's/genome_uploader //' )
        ena-webin-cli: \$( ena-webin-cli -version )
    END_VERSIONS
    """
}
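
Note on the version capture in the `versions.yml` heredoc: the `sed` expression strips a `genome_uploader ` prefix from whatever `genome_upload --version` prints. The sketch below illustrates that step in isolation. It assumes (not verified against the tool) that the command prints a single line of the form `genome_uploader <version>`; the `extract_version` helper is hypothetical and only wraps the same `sed` call.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the sed expression used in main.nf.
# Assumption: the tool prints "genome_uploader <version>" on one line.
extract_version() {
    printf '%s\n' "$1" | sed 's/genome_uploader //'
}

# Simulated --version output (an assumed format, not captured from the tool).
extract_version "genome_uploader 2.5.0"
```

If the real output uses a different prefix, `sed` finds no match and passes the line through unchanged, so the full string would end up in `versions.yml`.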
109 changes: 109 additions & 0 deletions modules/nf-core/genomeuploader/meta.yml
@@ -0,0 +1,109 @@
# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json
name: genomeuploader
description: Upload genome bins and MAGs in FASTA format to ENA (European Nucleotide Archive)
keywords:
  - archiving
  - ena
  - mags
  - bins
  - upload
tools:
  - genomeuploader:
      description: Python script to upload bins and MAGs in fasta format to ENA (European Nucleotide Archive).
      homepage: https://github.com/EBI-Metagenomics/genome_uploader
      documentation: https://github.com/EBI-Metagenomics/genome_uploader
      tool_dev_url: https://github.com/EBI-Metagenomics/genome_uploader
      licence: ["Apache-2.0"]
      identifier: ""

input:
  - - meta:
        type: map
        description: |
          Groovy Map containing sample information, the study_accession to upload the data, and the center name
          e.g. `[ id:'test', study_accession:'ERP159782', center_name:'nf-core' ]`
    - metadata_tsv:
        type: file
        description: TSV file containing metadata for genomes/bins to upload
        pattern: "*.tsv"
        ontologies:
          - edam: http://edamontology.org/format_3475 # TSV
  - fastas:
      type: file
      description: FASTA files containing genome/bin sequences to upload
      pattern: "*.{fasta,fa,fna,fasta.gz,fa.gz,fna.gz}"
      ontologies:
        - edam: http://edamontology.org/format_1929 # FASTA
        - edam: http://edamontology.org/format_3989 # GZIP format

output:
  upload_output_dir:
    - - meta:
          type: map
          description: |
            Groovy Map containing sample information
            e.g. `[ id:'test', study_accession:'ERP159782', center_name:'nf-core' ]`
      - upload_output/*:
Review comment (Contributor), suggested change:
-      - upload_output/*:
+      - upload_output:

          type: directory
          description: Directory containing all upload outputs
          pattern: "upload_output/*"
Review comment (Contributor), suggested change:
-          pattern: "upload_output/*"
+          pattern: "*"

          ontologies: []
  submission:
    - - meta:
          type: map
          description: |
            Groovy Map containing sample information
            e.g. `[ id:'test', study_accession:'ERP159782', center_name:'nf-core' ]`
      - upload_output/MAG_upload/submission.xml:
          type: file
          description: ENA submission XML file
          pattern: "upload_output/MAG_upload/submission.xml"
Review comment (Contributor), suggested change:
-          pattern: "upload_output/MAG_upload/submission.xml"
+          pattern: "submission.xml"

          ontologies:
            - edam: http://edamontology.org/format_2332 # XML
  registered_mags:
    - - meta:
          type: map
          description: |
            Groovy Map containing sample information
            e.g. `[ id:'test', study_accession:'ERP159782', center_name:'nf-core' ]`
      - upload_output/MAG_upload/registered_MAGs_${prefix}.tsv:
          type: file
          description: TSV file mapping genome names to ENA accession numbers
          pattern: "upload_output/MAG_upload/registered_MAGs_*.tsv"
Review comment (Contributor), suggested change:
-          pattern: "upload_output/MAG_upload/registered_MAGs_*.tsv"
+          pattern: "registered_MAGs_*.tsv"

          ontologies:
            - edam: http://edamontology.org/format_3475 # TSV
  genome_samples:
    - - meta:
          type: map
          description: |
            Groovy Map containing sample information
            e.g. `[ id:'test', study_accession:'ERP159782', center_name:'nf-core' ]`
      - upload_output/MAG_upload/genome_samples.xml:
          type: file
          description: ENA genome samples XML file
          pattern: "upload_output/MAG_upload/genome_samples.xml"
Review comment (Contributor), suggested change:
-          pattern: "upload_output/MAG_upload/genome_samples.xml"
+          pattern: "genome_samples.xml"

          ontologies:
            - edam: http://edamontology.org/format_2332 # XML
  manifests:
    - - meta:
          type: map
          description: |
            Groovy Map containing sample information
            e.g. `[ id:'test', study_accession:'ERP159782', center_name:'nf-core' ]`
      - upload_output/MAG_upload/manifests_${prefix}/*.manifest:
          type: file
          description: ENA manifest files for genome upload
          pattern: "upload_output/MAG_upload/manifests_*/*.manifest"
Review comment (Contributor), suggested change:
-          pattern: "upload_output/MAG_upload/manifests_*/*.manifest"
+          pattern: "*.manifest"

          ontologies: []
  versions:
    - versions.yml:
        type: file
        description: File containing software versions
        pattern: "versions.yml"
        ontologies:
          - edam: http://edamontology.org/format_3750 # YAML

authors:
  - "@mberacochea"
maintainers:
  - "@mberacochea"
100 changes: 100 additions & 0 deletions modules/nf-core/genomeuploader/tests/main.nf.test
@@ -0,0 +1,100 @@
nextflow_process {

    name "Test Process GENOMEUPLOADER"
    script "../main.nf"
    config "./nextflow.config"
    process "GENOMEUPLOADER"

    tag "modules"
    tag "modules_nfcore"
    tag "genomeuploader"

    test("genome - fasta - gz") {

        when {
Review comment (Contributor), suggested change:
-        when {
+        when {
+            params {
+                module_args = '--mags'
+            }

            process {
                """
                // This module uses a TSV as input, which contains the paths to the genomes/bins to upload.
                // That is why it has a second input that accepts the FASTA files (MAGs and bins) to upload,
                // and why the path is genomes/<name> in the manifest.
                def metadata_content = [
                    ["genome_name", "genome_path", "accessions", "assembly_software", "binning_software", "binning_parameters", "stats_generation_software", "completeness", "contamination", "genome_coverage", "metagenome", "co-assembly", "broad_environment", "local_environment", "environmental_medium", "rRNA_presence", "NCBI_lineage"].join("\t"),
                    ["test_mag", "genomes/GCA_002688505.1_ASM268850v1_genomic.fna.gz", "ERR4647712", "SPAdes_4.1.0", "nf-core/mag", "default", "CheckM2_1.1.0", "90.0", "1.0", "10.0", "chicken gut metagenome", "False", "chicken gut", "chicken gut mucosa", "chicken gut mucosa", "True", "d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;s__Lactobacillus_crispatus"].join("\t")
                ].join("\\n")

                def metadata_file = file('genomes_metadata.tsv')
                metadata_file.text = metadata_content

                input[0] = [
                    [
                        id: 'test',
                        study_accession: 'ERP159782',
                        center_name: 'nf-core'
                    ],
                    metadata_file
                ]
                input[1] = file('https://github.com/nf-core/test-datasets/raw/refs/heads/magmap/testdata/GCA_002688505.1_ASM268850v1_genomic.fna.gz', checkIfExists: true)
Review comment (Contributor):
This file will probably need to be deposited in nf-core test-datasets as well

                """
            }
        }

        then {
            assertAll(
                { assert process.success },
                { assert snapshot(
                    process.out.submission,
                    process.out.versions,
                    // Check registered_MAGs contains expected genome name (starts with test_mag)
                    file(process.out.registered_mags.get(0).get(1)).readLines()[0].split('\t')[0].startsWith('test_mag'),
                    // Check genome_samples.xml contains expected elements
                    file(process.out.genome_samples.get(0).get(1)).readLines().any { it.contains('<SAMPLE') },
                    file(process.out.genome_samples.get(0).get(1)).readLines().any { it.contains('alias="test_mag"') },
                    // Check manifest file exists and has content
                    file(process.out.manifests.get(0).get(1)).readLines().size() > 0,
                    file(process.out.manifests.get(0).get(1)).readLines().any { it.contains('STUDY') }
                    ).match() }
            )
        }

    }

    test("genome - fasta - gz - stub") {

        options "-stub"

        when {
            process {
                """
                // This module uses a TSV as input, which contains the paths to the genomes/bins to upload.
                // That is why it has a second input that accepts the FASTA files (MAGs and bins) to upload,
                // and why the path is genomes/<name> in the manifest.
                def metadata_content = [
                    ["genome_name", "genome_path", "accessions", "assembly_software", "binning_software", "binning_parameters", "stats_generation_software", "completeness", "contamination", "genome_coverage", "metagenome", "co-assembly", "broad_environment", "local_environment", "environmental_medium", "rRNA_presence", "NCBI_lineage"].join("\t"),
                    ["test_mag", "genomes/GCA_002688505.1_ASM268850v1_genomic.fna.gz", "ERR4647712", "SPAdes_4.1.0", "nf-core/mag", "default", "CheckM2_1.1.0", "90.0", "1.0", "10.0", "chicken gut metagenome", "False", "chicken gut", "chicken gut mucosa", "chicken gut mucosa", "True", "d__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lactobacillus;s__Lactobacillus_crispatus"].join("\t")
                ].join("\\n")

                def metadata_file = file('genomes_metadata.tsv')
                metadata_file.text = metadata_content

                input[0] = [
                    [
                        id: 'test',
                        study_accession: 'ERP159782',
                        center_name: 'nf-core'
                    ],
                    metadata_file
                ]
                input[1] = file('https://github.com/nf-core/test-datasets/raw/refs/heads/magmap/testdata/GCA_002688505.1_ASM268850v1_genomic.fna.gz', checkIfExists: true)
                """
            }
        }

        then {
            assertAll(
                { assert process.success },
                { assert snapshot(process.out).match() }
Review comment (Contributor) on lines +94 to +95, suggested change:
-                { assert process.success },
-                { assert snapshot(process.out).match() }
+                { assert process.success },
+                { assert snapshot(
+                    process.out.upload_output_dir,
+                    process.out.registered_mags,
+                    process.out.genome_samples,
+                    process.out.manifests,
+                    process.out.versions.collect{ path(it).yaml }
+                    ).match() }

            )
        }
    }

}
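
Note on the test metadata: both tests build the TSV by hand, so a header and a row with different field counts would be easy to introduce. Below is a minimal sketch of a consistency check that could be run on such a file, assuming tab-separated fields with no embedded tabs. `check_tsv_columns` is a hypothetical helper, and the two-column file is a reduced stand-in for the 17-column one built in the test.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every line of a TSV has the same
# number of tab-separated fields as its header line.
check_tsv_columns() {
    awk -F '\t' 'NR == 1 { n = NF } NF != n { bad = 1 } END { exit bad }' "$1"
}

# Reduced two-column example (the file built in the test has 17 columns).
tsv="$(mktemp)"
printf 'genome_name\tgenome_path\n' > "$tsv"
printf 'test_mag\tgenomes/GCA_002688505.1_ASM268850v1_genomic.fna.gz\n' >> "$tsv"

if check_tsv_columns "$tsv"; then
    echo "column counts match"
fi
rm -f "$tsv"
```

The check only compares field counts; it does not validate column names or values.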
29 changes: 29 additions & 0 deletions modules/nf-core/genomeuploader/tests/main.nf.test.snap
@@ -0,0 +1,29 @@
{
Review comment (Contributor):
I don't see the stub test in the snapshot

    "GCA_002688505.1_ASM268850v1 - fasta - gz": {
        "content": [
            [
                [
                    {
                        "id": "test",
                        "study_accession": "ERP159782",
                        "center_name": "nf-core"
                    },
                    "submission.xml:md5,705c441ca687726152f8540dab4bb322"
                ]
            ],
            [
                "versions.yml:md5,36d7baf63c1face41d2fc0edd6263944"
            ],
            true,
            true,
            false,
            true,
            true
        ],
        "meta": {
            "nf-test": "0.9.2",
            "nextflow": "25.04.7"
        },
        "timestamp": "2025-10-08T18:33:31.006282"
    }
}
5 changes: 5 additions & 0 deletions modules/nf-core/genomeuploader/tests/nextflow.config
@@ -0,0 +1,5 @@
process {
    withName: 'GENOMEUPLOADER' {
        ext.args = '--mags'
Review comment (Contributor), suggested change:
-        ext.args = '--mags'
+        ext.args = params.module_args

    }
}