Filtering faulty circularized genomes using conspecific MAGs.
Reconstructing circularized genomes is the ultimate goal of prokaryotic genome assembly. Accurate long-read sequencing technology now enables the assembly of circularized genomes from highly complex metagenomic samples. However, prokaryotic genomes tend to contain many repetitive sequences, which often cause faulty assembly closing and thereby produce circularized genomes with significant gaps. cMAGfilter filters out circularized metagenome-assembled genomes (cMAGs) with such gaps using their conspecific MAGs. For a given cMAG and its conspecific MAGs, it first identifies core contigs, the contigs shared by most of the conspecific MAGs. Next, it calculates the core contig retrieval rate, the fraction of core contigs that can be found in the cMAG, and uses this rate to decide whether to filter out the cMAG.
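The two steps described above can be illustrated with a minimal Python sketch. This is not cMAGfilter's actual implementation: the function names, the input structures, and the `min_fraction=0.5` cutoff used to interpret "most of the conspecific MAGs" are all assumptions made for illustration, and the alignment step itself (done with nucmer in the real tool) is replaced by precomputed sets.

```python
def select_core_contigs(found_in, n_mags, min_fraction=0.5):
    """Pick contigs found in more than `min_fraction` of the conspecific MAGs.

    found_in maps a contig name to the set of MAG names in which it was found.
    The 0.5 cutoff is an assumed reading of "most", not the tool's setting.
    """
    return sorted(
        contig
        for contig, mags in found_in.items()
        if len(mags) / n_mags > min_fraction
    )


def retrieval_rate(core_contigs, aligned_to_cmag):
    """Fraction of core contigs that were aligned to the cMAG."""
    if not core_contigs:
        raise ValueError("no core contigs to evaluate")
    hits = sum(1 for contig in core_contigs if contig in aligned_to_cmag)
    return hits / len(core_contigs)


# Hypothetical toy data: four conspecific MAGs and three contigs.
found_in = {
    "contig_a": {"MAG1", "MAG2", "MAG3", "MAG4"},
    "contig_b": {"MAG1", "MAG2", "MAG3"},
    "contig_c": {"MAG1"},
}
core = select_core_contigs(found_in, n_mags=4)
rate = retrieval_rate(core, aligned_to_cmag={"contig_a"})
print(core, rate)
```

In this toy case only `contig_a` and `contig_b` qualify as core contigs, and the cMAG recovers one of the two.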

cMAGfilter requires Python>=3.6 and the mummer4 package. You can install mummer4 from its tarball or from bioconda. Make sure the mummer4 executables are in your PATH, or specify their location with the -nuc parameter.
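Before installing, it can help to confirm that nucmer (the mummer4 aligner whose results appear in the output directories) is visible on PATH. The following is a small sketch using only the Python standard library; the helper name `find_tool` is hypothetical and is not part of cMAGfilter.

```python
import shutil


def find_tool(name="nucmer"):
    """Return the absolute path of `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("nucmer")
    if path is None:
        print("nucmer not found on PATH; install mummer4 or pass its location with -nuc")
    else:
        print(f"nucmer found at {path}")
```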
git clone https://github.com/netbiolab/cMAGfilter.git
cd cMAGfilter
python3 setup.py install --user
python3 cMAGfilter.py examples/input/circular_contigs/Akkermansia_muciniphila.fna examples/input/conspecific_MAGs/Akkermansia_muciniphila examples/output/Akkermansia_muciniphila
You can find example output files in 'examples/output/Mesosutterella_multiformis'.
- all_by_all_alignment_results/: This directory contains all-by-all nucmer alignment results between conspecific MAGs.
- [circular-contig]_align_back_results/: This directory contains nucmer alignment results of the core contigs against the circular contig.
- conspecific_genomes.contig_report.tsv: contains information on whether the contigs of a conspecific MAG are found in the other conspecific MAGs.
- [circular-contig]_core_contigs_alignment.core_contig_stat.tsv: contains the list of core contigs and their alignment results against the circular contig.
- core_contigs.fna: FASTA sequence file of the core contigs.
- [circular-contig]_core_contigs_alignment.summary.tsv: This is the final result file, with the following columns:
1. circular contig name
2. circular contig alignment length
3. circular contig length
4. circular contig alignment coverage (2. / 3.)
5. core contig alignment length
6. core contig length
7. core contig alignment coverage (5. / 6.)
8. aligned core contig count
9. core contig count
10. core contig retrieval rate (8. / 9.): In the paper, we considered a circular contig genuine if its core contig retrieval rate is higher than 0.95. Therefore, in this example, we filter out the Mesosutterella_multiformis circular contig, as its core contig retrieval rate is 0.867.
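The filtering decision from the last column can be sketched as follows. This is an illustration of the rule stated above (retrieval rate = aligned core contig count / core contig count, genuine if higher than 0.95), not code taken from cMAGfilter. The function name is hypothetical, and the counts 13 and 15 are chosen only because their ratio rounds to 0.867; they are not read from the example output.

```python
def is_genuine(aligned_core_count, core_count, threshold=0.95):
    """Apply the paper's rule: keep the circular contig only if its
    core contig retrieval rate is strictly higher than `threshold`."""
    if core_count <= 0:
        raise ValueError("core contig count must be positive")
    rate = aligned_core_count / core_count
    return rate, rate > threshold


# Illustrative counts only (13 of 15 core contigs aligned).
rate, keep = is_genuine(13, 15)
print(f"retrieval rate = {rate:.3f}, genuine = {keep}")
```

With these illustrative counts the circular contig falls below the 0.95 cutoff and would be filtered out.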
All 110 HiFi circular contigs and their conspecific MAGs used in the paper are available from the link (6.6GB).
CY Kim, J Ma, I Lee, HiFi Metagenomic Sequencing Enables Assembly of Accurate and Complete Genomes from Human Gut Microbiota, bioRxiv preprint, Feb. 2022