MRGV provides 109,778 high-confidence viral genomes representing 28,824 species-level vOTUs, together with more than 1.3 million non-redundant viral protein sequences, over 46% of which are annotated with structure-informed PHROG assignments.

All MRGV data can be browsed and downloaded at https://www.decodebiome.org/MRGV/

Kim, H.J. et al. (2026). Incorporating viral genome binning in a mouse gut virome catalog enables accurate age prediction. In preparation.
| Data | Description | Link |
|---|---|---|
| MRGV_Representative_Metadata.tsv | Metadata for 28,824 representative vMAGs | Click to download (8.5MB) |
| MRGV_METADATA_ALL_GENOMES.csv | Metadata for all 109,778 vMAGs | Click to download (28.0MB) |

| Data | Description | Link |
|---|---|---|
| MRGV_Representative_Genomes.tar.gz | 28,824 Representative vMAGs | Click to download (475.8MB) |
| MRGV_All_Genomes.tar.gz | All 109,778 vMAGs | Click to download (1.3GB) |
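
The genome archives can be inspected without unpacking them to disk. The sketch below streams a `.tar.gz` with Python's standard `tarfile` module and reports, for each FASTA member, the number of contigs and the total length in bp (a vMAG produced by binning may contain several contigs). It assumes each vMAG is stored as one FASTA file ending in `.fa`, `.fna` or `.fasta`; the actual archive layout is not documented here, and `summarize_archive` is a hypothetical helper.

```python
import io
import tarfile
from typing import Dict, IO, Optional, Tuple

# Assumed FASTA file extensions inside the archive (not documented by MRGV).
FASTA_SUFFIXES = (".fa", ".fna", ".fasta")


def summarize_archive(path: Optional[str] = None,
                      fileobj: Optional[IO[bytes]] = None) -> Dict[str, Tuple[int, int]]:
    """Return {member_name: (n_contigs, total_bp)} for every FASTA file in a .tar.gz."""
    summary: Dict[str, Tuple[int, int]] = {}
    with tarfile.open(name=path, mode="r:gz", fileobj=fileobj) as tar:
        for member in tar:
            # Skip directories, links and non-FASTA files.
            if not member.isfile() or not member.name.endswith(FASTA_SUFFIXES):
                continue
            handle = tar.extractfile(member)
            if handle is None:
                continue
            n_contigs = 0
            total_bp = 0
            with io.TextIOWrapper(handle, encoding="utf-8") as text:
                for raw in text:
                    line = raw.strip()
                    if line.startswith(">"):
                        n_contigs += 1          # each header starts a new contig
                    elif line:
                        total_bp += len(line)   # sequence lines add to genome length
            summary[member.name] = (n_contigs, total_bp)
    return summary
```

For example, `summarize_archive("MRGV_Representative_Genomes.tar.gz")` would be expected to return one entry per representative vMAG if the one-file-per-genome assumption holds.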

| Data | Description | Link |
|---|---|---|
| MRGV_PC_ID100.tar.gz | 1,376,499 CDS with metadata, clustered at 100% amino acid identity | Click to download (223.4MB) |
| MRGV_PC_ID90 DB.tar.gz | 954,585 CDS with metadata, clustered at 90% amino acid identity | Click to download (147.5MB) |
| MRGV_PC_ID70 DB.tar.gz | 746,733 CDS with metadata, clustered at 70% amino acid identity | Click to download (115.9MB) |
| MRGV_PC_ID50 DB.tar.gz | 652,176 CDS with metadata, clustered at 50% amino acid identity | Click to download (102.0MB) |
| MRGV_PC_ID30 DB.tar.gz | 625,774 CDS with metadata, clustered at 30% amino acid identity | Click to download (97.3MB) |
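
The identity cutoffs in the protein-cluster table correspond naturally to MMseqs2's `--min-seq-id` option, and `LinClust.py` in the script list runs Linclust. The following is a hedged sketch, not the pipeline's actual configuration: it builds an `mmseqs easy-linclust` call for one cutoff and counts clusters in the resulting `<prefix>_cluster.tsv`, where each line holds a representative and a member. Coverage settings, file names and the helpers `linclust_command` and `count_clusters` are assumptions.

```python
import csv
from typing import Iterable, List


def linclust_command(proteins_fasta: str, out_prefix: str, tmp_dir: str,
                     min_seq_id: float) -> List[str]:
    """Build an `mmseqs easy-linclust` call for one identity cutoff (0-1 scale).

    Only --min-seq-id is set; coverage options used by MRGV are not documented here.
    """
    if not 0.0 <= min_seq_id <= 1.0:
        raise ValueError("min_seq_id must be a fraction between 0 and 1")
    return ["mmseqs", "easy-linclust", proteins_fasta, out_prefix, tmp_dir,
            "--min-seq-id", str(min_seq_id)]


def count_clusters(cluster_tsv_lines: Iterable[str]) -> int:
    """Count distinct representatives (first column) in an MMseqs2 cluster TSV."""
    representatives = set()
    for row in csv.reader(cluster_tsv_lines, delimiter="\t"):
        if row:  # ignore empty lines
            representatives.add(row[0])
    return len(representatives)
```

A run would then look like `subprocess.run(linclust_command("proteins.faa", "PC_ID90", "tmp", 0.9), check=True)` followed by `count_clusters(open("PC_ID90_cluster.tsv"))`; lower cutoffs merge more sequences, which is consistent with the shrinking counts in the table.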

| Data | Description | Link |
|---|---|---|
| MRGV_Repr_Kraken2DB.tar.gz | Kraken2 DB for 28,824 representative vMAGs | Click to download (408.9MB) |
| MRGV_All_Variant_kraken2DB.tar.gz | Kraken2 DB for all 109,778 vMAGs | Click to download (426.8MB) |
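
After extracting a Kraken2 DB, reads can be classified with the standard Kraken2 CLI, for example `kraken2 --db <extracted_db_dir> --threads 8 --paired --report sample.k2report --output sample.kraken R1.fastq R2.fastq` (directory and file names are placeholders). The sketch below parses the resulting report into clade read counts per taxon at one rank. It reads the rank code and name from the last columns so it works with and without `--report-minimizer-data`; which rank the MRGV taxonomy assigns to vOTUs is an assumption (species, `S`, by default), and `clade_counts` is a hypothetical helper.

```python
from typing import Dict, Iterable


def clade_counts(report_lines: Iterable[str], rank: str = "S") -> Dict[str, int]:
    """Map taxon name -> clade read count for Kraken2 report rows at `rank`."""
    counts: Dict[str, int] = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # Column 2 is the clade read count in both report layouts; the rank code,
        # taxid and (indented) name are always the last three columns.
        clade_reads = int(fields[1])
        rank_code, name = fields[-3], fields[-1].strip()
        if rank_code == rank:  # exact match, so sub-ranks such as "S1" are excluded
            counts[name] = counts.get(name, 0) + clade_reads
    return counts
```

Collecting `clade_counts` across samples gives a simple per-sample abundance table at the chosen rank.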
- HumanDecontamination.py : Removing human reads using Bowtie2
- Trimmomatic.py : Trimming adapters and filtering low-quality reads using Trimmomatic
- MEGAHIT.py : Running MEGAHIT for read assembly
- MetaSPAdes.py : Running metaSPAdes for read assembly
- DeepVirFinder.py : Running DeepVirFinder and filtering high-confidence viral contigs
- Phigaro.py : Running Phigaro to predict prophages from assemblies
- VIBRANT.py : Running VIBRANT to predict viral contigs and lifestyle
- Vclust.py : Running Vclust (UCLUST algorithm) for sample-wise deduplication of viral contigs predicted by DeepVirFinder, Phigaro and VIBRANT
- GeNomad.py : Running geNomad on the deduplicated viral contigs for revalidation
- VirRep.py : Running VirRep on the deduplicated viral contigs for revalidation
- GetCoverage.py : Computing sample-wise read coverage profiles using Bowtie2
- GetMetabat2Depth.py : Generating MetaBAT2 depth tables
- GenerateCovTable.py : Generating the vRhyme coverage table from the MetaBAT2 depth table
- MetaBat2.py : Running MetaBAT2 for binning of viral contigs
- Semibin2.py : Running SemiBin2 for binning of viral contigs
- vRhyme.py : Running vRhyme for binning of viral contigs
- BinConsolidate.py : Consolidating bins from MetaBAT2, SemiBin2 and vRhyme within each sample
- Pharokka.py : Running Pharokka to generate initial GenBank annotations
- PholdPredict.py : Running phold predict to predict 3Di tokens with the ProstT5 model
- PholdCompare.py : Running phold compare to find structural hits with Foldseek
- LinClust.py : Running Linclust in MMseqs2 to generate protein clusters
- Minimap2.py : Running minimap2 to align short reads to viral genomes
- CoverM.py : Running CoverM to calculate alignment coverage
- UPGMA.rs : Performing UPGMA clustering of genomes based on taxonomic rank delineation criteria
- KendallTau.py : Computing Kendall's tau and p-values, and generating a Kendall distance matrix
- Uniqueness.py : Calculating uniqueness from the distance matrix, with and without cage mates
- Maaslin2.R : Running MaAsLin2 to identify significantly differentially abundant viral taxa
- XGBoostRegressor.py : Running an XGBoost regressor to predict mouse age from the viral genus abundance table
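
`KendallTau.py` and `Uniqueness.py` are described only briefly above. The sketch below shows one plausible reading, with every definition an assumption rather than the scripts' confirmed behavior: Kendall's tau-b between two samples' abundance profiles (the tie-corrected variant that `scipy.stats.kendalltau` returns by default), converted to a distance as (1 - tau)/2, and uniqueness taken as each mouse's minimum distance to any other mouse, a definition used in some gut-microbiome studies. The optional `cage` mapping (hypothetical) excludes cage mates from the comparison; p-values are omitted.

```python
import math
from itertools import combinations
from typing import Dict, Optional, Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Tie-corrected Kendall's tau-b by brute force over all O(n^2) pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the difference: -1, 0 or 1
        dy = (y[i] > y[j]) - (y[i] < y[j])
        ties_x += dx == 0
        ties_y += dy == 0
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def kendall_distance_matrix(profiles: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Assumed distance d = (1 - tau_b) / 2, which lies in [0, 1]."""
    dist: Dict[str, Dict[str, float]] = {name: {name: 0.0} for name in profiles}
    for a, b in combinations(profiles, 2):
        d = (1.0 - kendall_tau_b(profiles[a], profiles[b])) / 2.0
        dist[a][b] = dist[b][a] = d
    return dist


def uniqueness(dist: Dict[str, Dict[str, float]],
               cage: Optional[Dict[str, str]] = None) -> Dict[str, float]:
    """Minimum distance from each sample to any other; cage mates skipped if `cage` is given."""
    result: Dict[str, float] = {}
    for a, row in dist.items():
        others = [d for b, d in row.items()
                  if b != a and (cage is None or cage[b] != cage[a])]
        result[a] = min(others) if others else float("nan")
    return result
```

Comparing `uniqueness(dist)` with `uniqueness(dist, cage=...)` shows how much of a mouse's apparent uniqueness disappears once its cage mates are removed from the reference set.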
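
`UPGMA.rs` is written in Rust; the Python sketch below only illustrates the general technique. It performs average-linkage (UPGMA) agglomeration on a pairwise genome distance matrix and stops once the closest pair of clusters is farther apart than a rank-specific threshold. Because UPGMA merge distances never decrease, this is equivalent to cutting the UPGMA dendrogram at that merge distance. The distance measure and the threshold values MRGV uses for each rank are not given here, so `threshold` is a placeholder and `upgma_clusters` is a hypothetical helper.

```python
from typing import List, Sequence


def upgma_clusters(dist: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Average-linkage agglomeration that stops when the closest clusters exceed `threshold`.

    Brute-force O(n^3) version for illustration; `dist` is a symmetric matrix.
    """
    clusters: List[List[int]] = [[i] for i in range(len(dist))]

    def average_distance(a: List[int], b: List[int]) -> float:
        # UPGMA linkage: mean of all pairwise distances between the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best_d, best_p, best_q = float("inf"), -1, -1
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = average_distance(clusters[p], clusters[q])
                if d < best_d:
                    best_d, best_p, best_q = d, p, q
        if best_d > threshold:
            break  # every remaining pair is too distant to share this rank
        clusters[best_p] = clusters[best_p] + clusters[best_q]
        del clusters[best_q]  # best_q > best_p, so best_p's index is unaffected
    return [sorted(c) for c in clusters]
```

Running it once per rank with progressively larger thresholds would, under these assumptions, yield nested groupings in which each higher-rank cluster is a union of lower-rank clusters.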