PerfringeTyper is a bioinformatics tool designed to classify Clostridium perfringens toxinotypes and perform clade assignment based on genomic data. It uses reference toxin gene sequences to identify the presence of toxin genes in input genomic assemblies (contigs) and assigns a toxinotype based on predefined rules (Rood et al. 2018). Clade assignment is based on the methodology implemented by AuriClass, which assigns the input to one of five phylogenetic clades (Clades I to V) that have been observed previously within Clostridium perfringens (Geier et al. 2021 & Gulliver et al. 2023).
This tool is particularly useful for researchers studying Clostridium perfringens and its associated toxins, enabling rapid, automated toxinotype classification and clade assignment.
- Input: Accepts a tab-separated table of contig file paths (Isolate_ID, contigs_path).
- Output:
  - A detailed toxin profile for each sample, listing detected toxin genes (typing and non-typing toxins).
  - A simplified toxinotype classification for each sample (e.g. Type A, Type B).
  - A simplified clade assignment for each sample (e.g. Clade I, Clade II).
- Customizable Thresholds:
  - Identity and coverage thresholds for toxin gene alignments can be adjusted.
  - K-mer size and sketch size for Mash-based clade assignment can be adjusted.
- Multithreading: Supports parallel processing for faster analysis.
- Default Reference Data:
  - A default set of toxin gene sequences and toxinotype classification rules.
  - A default Mash database and config file for clade assignment.
- Input Parsing:
  - The tool reads the sample table and processes each contig file listed.
- Alignment:
  - Uses `minimap2` to align contigs against the reference toxin genes.
- Gene Detection:
  - Filters alignments based on identity and coverage thresholds.
- Toxinotype Classification:
  - Matches detected genes to the predefined toxinotype rules in the configuration file.
- Clade Assignment:
  - Uses `mash dist` to find the distance between each sample and a database of clade representatives and outgroups.
- Output Generation:
  - Produces detailed and simplified output tables.
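As a rough illustration of the Gene Detection step, the sketch below filters minimap2 PAF records by identity and coverage. It is a hypothetical reimplementation, not PerfringeTyper's own code. It assumes the contigs are the query and the toxin genes the target (as the Alignment step describes), that each target name in the reference FASTA is exactly the gene name, that identity is residue matches divided by alignment block length (PAF columns 10 and 11), and that coverage is the fraction of the gene spanned by the alignment. The per-gene identity values are those listed in the notes on thresholds; the name `detect_genes` is illustrative.

```python
# Hypothetical sketch of toxin gene detection from minimap2 PAF output.
# Assumes contigs are the query and reference toxin genes the target, and
# that each target name in the reference FASTA is exactly the gene name.

DEFAULT_IDENTITY = 90.0  # % identity for genes without a specific override
GENE_IDENTITY = {"cpa": 70.0, "colA": 80.0, "pfoA": 80.0, "tpeL": 80.0}


def detect_genes(paf_lines, coverage_threshold=90.0):
    """Return the set of genes with at least one alignment passing both thresholds."""
    found = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed records
        gene = fields[5]  # PAF column 6: target sequence name
        target_len, target_start, target_end = map(int, fields[6:9])
        matches, block_len = int(fields[9]), int(fields[10])
        identity = 100.0 * matches / block_len
        coverage = 100.0 * (target_end - target_start) / target_len
        if (identity >= GENE_IDENTITY.get(gene, DEFAULT_IDENTITY)
                and coverage >= coverage_threshold):
            found.add(gene)
    return found


# Two made-up PAF records: a full-length cpa hit and a partial netB hit.
paf = [
    "contig_1\t50000\t100\t1300\t+\tcpa\t1200\t0\t1200\t1150\t1200\t60",
    "contig_2\t40000\t10\t500\t-\tnetB\t969\t0\t490\t480\t490\t60",
]
print(detect_genes(paf))  # {'cpa'}: the netB hit covers only ~51% of the gene
```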
- Python 3.8 or higher
- Required libraries and packages:
  - Biopython
  - pandas
  - minimap2
  - mash
PerfringeTyper can be installed from the environment.yml file using conda/mamba:

```bash
mamba env create -f environment.yml
```
The tool includes default reference data located in the data/ directory:
- Toxin Genes: `data/perf_toxins_nuc.fasta`
- Toxinotype Rules: `data/toxintypes.yml`
- Clade Mash Database: `data/perfringens-clade-database.msh`
- Clade Config File: `data/clade_config.csv`
You can replace these files with custom data if needed, or point to alternative files with the `-n`, `-r`, `-m` and `-c` options.
```
Usage: PerfringeTyper.py [-h] -i INPUT_TABLE -o OUTDIR [-t THREADS] [-n TOXIN_NUCL] [-r TOXIN_RULES] [-d IDENTITY] [-v COVERAGE] [-m MASH_DATABASE] [-c CLADE_CONFIG] [-k KMER_THRESHOLD] [-s SKETCH_THRESHOLD] [-x TOXIN_TYPING] [-a CLADE_ASSIGNMENT] [-f KEEP_ALL_FILES]

Description:
  Toxin typing and clade assignment tool for Clostridium perfringens

Required Options:
  -i, --input_table        Path to the input table containing isolate IDs and path to contigs (FASTA)
  -o, --outdir             Output directory

Parallelism:
  -t, --threads            Number of threads used (default: 1)

Toxin Typing Options:
  -n, --toxin_nucl         Path to the reference genes FASTA file (default: PerfringeTyper/data/perf_toxins_nuc.fasta)
  -r, --toxin_rules        Path to the toxinotype rules YAML file (default: PerfringeTyper/data/toxintypes.yml)
  -d, --identity           Identity threshold (%) for detecting toxin genes (default: 90)
  -v, --coverage           Coverage threshold (%) for detecting toxin genes (default: 90)

Clade Assignment Options:
  -m, --mash_database      Path to mash database containing clade representatives (default: PerfringeTyper/data/perfringens-clade-database.msh)
  -c, --clade_config       Path to clade config file for clade representatives (default: PerfringeTyper/data/clade_config.csv)
  -k, --kmer_threshold     K-mer threshold for mash sketch (default: 29)
  -s, --sketch_threshold   Sketch size for mash sketch (default: 10000)

Other:
  -x, --toxin_typing       Only perform toxin typing
  -a, --clade_assignment   Only perform clade assignment
  -f, --keep_all_files     Keep minimap2 (.paf) and mash (.tsv) files in output directory for troubleshooting
  -h, --help               Show this help message and exit
```
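To show how `-k`, `-s` and `-t` could map onto Mash, here is a hypothetical helper that builds two commands for one sample: `mash sketch` with `-k` (k-mer size), `-s` (sketch size), `-p` (threads) and `-o` (output prefix), then `mash dist` with the clade database as reference and the sample sketch as query. This is an illustrative assumption about the workflow, not PerfringeTyper's actual invocation; the helper name and the `_sketch` output prefix are made up.

```python
import shlex


def mash_commands(contigs, sample_id, database, kmer=29, sketch=10000, threads=1):
    """Return (sketch_cmd, dist_cmd) argument lists for one sample (hypothetical)."""
    prefix = f"{sample_id}_sketch"  # illustrative naming only
    sketch_cmd = ["mash", "sketch", "-k", str(kmer), "-s", str(sketch),
                  "-p", str(threads), "-o", prefix, contigs]
    # mash sketch appends ".msh" to the -o prefix; dist takes <reference> <query>
    dist_cmd = ["mash", "dist", "-p", str(threads), database, prefix + ".msh"]
    return sketch_cmd, dist_cmd


sketch_cmd, dist_cmd = mash_commands(
    "/path/to/sample1_contigs.fasta", "Sample1",
    "PerfringeTyper/data/perfringens-clade-database.msh",
)
print(shlex.join(sketch_cmd))
# mash sketch -k 29 -s 10000 -p 1 -o Sample1_sketch /path/to/sample1_contigs.fasta
print(shlex.join(dist_cmd))
```

Each argument list could then be executed with `subprocess.run(cmd, check=True)`, capturing the `mash dist` standard output for the clade step.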
Input:

Suppose you have a sample table called `contigs.tab` with the following tab-separated content:

```
Isolate_ID	contigs
Sample1	/path/to/sample1_contigs.fasta
Sample2	/path/to/sample2_contigs.fasta
```

Run the tool as follows:

```bash
python PerfringeTyper.py -i contigs.tab -o results/
```
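A minimal sketch of reading such a sample table with Python's standard `csv` module, assuming a header row and tab-separated columns as in the `contigs.tab` example. The function name `read_sample_table` is hypothetical; PerfringeTyper itself may parse the table differently (for example with pandas).

```python
import csv
import io


def read_sample_table(handle):
    """Yield (isolate_id, contigs_path) pairs from a tab-separated sample table.

    Assumes the first row is a header (e.g. Isolate_ID, contigs) and skips blank lines.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # discard the header row
    for line_no, row in enumerate(reader, start=2):
        if not row or not row[0].strip():
            continue  # blank line
        if len(row) < 2:
            raise ValueError(f"line {line_no}: expected isolate ID and contigs path")
        yield row[0].strip(), row[1].strip()


example = (
    "Isolate_ID\tcontigs\n"
    "Sample1\t/path/to/sample1_contigs.fasta\n"
    "Sample2\t/path/to/sample2_contigs.fasta\n"
)
for isolate_id, contigs_path in read_sample_table(io.StringIO(example)):
    print(isolate_id, contigs_path)
```

With a real file, open it with `open("contigs.tab", newline="")` and pass the handle to the function.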
Output: three output files are produced.
toxin_profiles.csv: A detailed table listing the presence/absence of each toxin gene for each sample.
| Sample | Toxinotype | Genes_Found | cpe | colA | cpa |
|---|---|---|---|---|---|
| Sample1 | F | cpe,cpa | 1 | 0 | 1 |
| Sample2 | A | colA,cpa | 0 | 1 | 1 |
toxinotypes.csv: A simplified table with the toxinotype classification for each sample.
| Sample | Toxinotype |
|---|---|
| Sample1 | F |
| Sample2 | A |
clade_assignments.csv: A simplified table with the clade assignment for each sample.
| Sample | Clade_Assignment |
|---|---|
| Sample1 | Clade II |
| Sample2 | Clade V |
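The Toxinotype column can be thought of as a rule lookup over the detected typing genes. PerfringeTyper reads its rules from `data/toxintypes.yml`, whose format is not shown here, so the dictionary below is a hypothetical encoding of the Rood et al. (2018) scheme in which each type lists typing genes that must be present and ones that must be absent. The gene names (`cpb`, `etx`, `iap` and `netB` in particular) are assumptions about how the reference FASTA labels them, and returning `Unk` for gene combinations that match no rule goes beyond the documented no-genes case.

```python
# Hypothetical encoding of the Rood et al. (2018) toxinotype scheme:
# type -> (typing genes that must be present, typing genes that must be absent).
# Genes in neither set (e.g. cpe for types C, D and E) may be present or absent;
# non-typing genes such as colA appear in no rule and are therefore ignored.
RULES = {
    "A": ({"cpa"}, {"cpb", "etx", "iap", "cpe", "netB"}),
    "B": ({"cpa", "cpb", "etx"}, {"iap", "cpe", "netB"}),
    "C": ({"cpa", "cpb"}, {"etx", "iap", "netB"}),
    "D": ({"cpa", "etx"}, {"cpb", "iap", "netB"}),
    "E": ({"cpa", "iap"}, {"cpb", "etx", "netB"}),
    "F": ({"cpa", "cpe"}, {"cpb", "etx", "iap", "netB"}),
    "G": ({"cpa", "netB"}, {"cpb", "etx", "iap", "cpe"}),
}


def toxinotype(genes):
    """Return the toxinotype letter for a collection of genes, or "Unk" if no rule matches."""
    genes = set(genes)
    for letter, (required, absent) in RULES.items():
        if required <= genes and not (absent & genes):
            return letter
    return "Unk"


# Gene lists taken from the toxin_profiles.csv example above.
profiles = {"Sample1": ["cpe", "cpa"], "Sample2": ["colA", "cpa"]}
for sample, genes in profiles.items():
    print(sample, toxinotype(genes))
# Sample1 F
# Sample2 A
```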
Notes:

- Different sequence identity thresholds are set for some genes. This is because chromosomal genes are far more diverse than plasmid-borne genes (for example, lowering the netB and netF thresholds to 80% results in cross-hits).

| Toxin Gene | Sequence Identity |
|---|---|
| Default | 90% |
| cpa | 70% |
| colA | 80% |
| pfoA | 80% |
| tpeL | 80% |

- If no toxin genes are detected for a sample, it is classified as Unknown (`Type Unk`).
- For clade assignment, if the closest match is one of the outgroup isolates, the sample is classified as `Outgroup`.
- For clade assignment, if the closest match is one of the clade representatives but the Mash distance is greater than 0.055, the sample is classified as `No Clade Assignment`. This may occur when the isolate is closely related to Clostridium perfringens but a representative outgroup is not present in the Mash database.
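The clade rules in these notes can be sketched as a nearest-neighbour call over `mash dist` output, whose tab-separated columns are reference ID, query ID, Mash distance, p-value and shared hashes. In this hypothetical version, `clade_of` stands in for `data/clade_config.csv` (its real layout is not described here), the reference IDs and distances are invented, and returning `No Clade Assignment` when there are no hits at all is an assumption.

```python
MAX_CLADE_DISTANCE = 0.055  # threshold stated in the notes


def assign_clade(mash_lines, clade_of):
    """Return a clade label for one query from its mash dist output lines.

    clade_of maps each reference ID to a clade label or "Outgroup"
    (a hypothetical stand-in for data/clade_config.csv).
    """
    best_ref, best_dist = None, float("inf")
    for line in mash_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed records
        ref_id, distance = fields[0], float(fields[2])
        if distance < best_dist:
            best_ref, best_dist = ref_id, distance
    if best_ref is None:
        return "No Clade Assignment"  # assumption: no hits at all
    label = clade_of[best_ref]
    if label == "Outgroup":
        return "Outgroup"  # closest match is an outgroup isolate
    if best_dist > MAX_CLADE_DISTANCE:
        return "No Clade Assignment"  # closest representative is too distant
    return label


# Invented reference IDs and distances, for illustration only.
clade_of = {"rep_I.fasta": "Clade I", "rep_II.fasta": "Clade II", "out_1.fasta": "Outgroup"}
lines = [
    "rep_I.fasta\tSample1.fasta\t0.0410\t0\t1796/10000",
    "rep_II.fasta\tSample1.fasta\t0.0140\t0\t5000/10000",
    "out_1.fasta\tSample1.fasta\t0.1500\t0\t65/10000",
]
print(assign_clade(lines, clade_of))  # Clade II
```

Checking for an outgroup before applying the distance cutoff follows the order of the notes: an outgroup nearest neighbour is reported as `Outgroup` regardless of distance.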
Geier, R. R., Rehberger, T. G., & Smith, A. H. (2021). Comparative genomics of Clostridium perfringens reveals patterns of host-associated phylogenetic clades and virulence factors. Frontiers in Microbiology, 12, 649953.
Gulliver, E. L., Adams, V., Marcelino, V. R., Gould, J., Rutten, E. L., Powell, D. R., Young, R. B., D’Adamo, G. L., Hemphill, J., Solari, S. M., Revitt-Mills, S. A., Munn, S., Jirapanjawat, T., Greening, C., Boer, J. C., Flanagan, K. L., Kaldhusdal, M., Plebanski, M., Gibney, K. B., … Forster, S. C. (2023). Extensive genome analysis identifies novel plasmid families in Clostridium perfringens. Microbial Genomics, 9(4).
Rood, J. I., Adams, V., Lacey, J., Lyras, D., McClane, B. A., Melville, S. B., Moore, R. J., Popoff, M. R., Sarker, M. R., Songer, J. G., Uzal, F. A., & Van Immerseel, F. (2018). Expansion of the Clostridium perfringens toxin-based typing scheme. Anaerobe, 53, 5–10.
This tool is open-source and distributed under the GNU GENERAL PUBLIC LICENSE.