PGAP2

Pan-Genome Analysis Pipeline 2

Quick start

Basic usage

The input directory contains all the genome and annotation files.

PGAP2 supports multiple input formats: GFF files in the same format as those output by Prokka, GFF files with their corresponding genome FASTA files in separate files, GenBank flat files (GBFF), or just genome FASTA files (with --reannot required).

Different formats of input files can be mixed in one input directory. PGAP2 will recognize and process them based on their prefixes and suffixes.

pgap2 main -i inputdir/ -o outputdir/
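How PGAP2 matches files by prefix and suffix is not spelled out here, so the following is only a rough sketch of the idea, not PGAP2's actual logic. The suffix table, the function names `group_inputs` and `needs_reannot`, and the sample file names are all assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import Path

# Assumed suffix-to-format mapping; the suffixes PGAP2 really accepts may differ.
SUFFIX_TO_FORMAT = {
    ".gff": "gff", ".gff3": "gff",
    ".gbk": "genbank", ".gbff": "genbank",
    ".fa": "fasta", ".fna": "fasta", ".fasta": "fasta",
}


def group_inputs(filenames):
    """Group file names by shared prefix and record the format of each file."""
    groups = defaultdict(dict)
    for name in filenames:
        path = Path(name)
        fmt = SUFFIX_TO_FORMAT.get(path.suffix.lower())
        if fmt is None:
            continue  # unknown suffix: skip the file
        groups[path.stem][fmt] = path.name
    return dict(groups)


def needs_reannot(formats):
    """A genome supplied only as FASTA has no annotation, so --reannot applies."""
    return set(formats) == {"fasta"}


# Hypothetical directory listing mixing several input formats.
files = ["strainA.gff", "strainA.fna", "strainB.gbff", "strainC.fasta", "notes.txt"]
for prefix, formats in sorted(group_inputs(files).items()):
    print(prefix, sorted(formats), "needs --reannot:", needs_reannot(formats))
```

In this sketch, a GFF file and a FASTA file sharing the prefix `strainA` are treated as one genome, while `strainC`, which has only a FASTA file, is flagged as needing `--reannot`.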

Preprocessing

Quality checks and visualization are conducted by PGAP2 during the preprocessing step. PGAP2 generates an interactive HTML file and corresponding vector figures to help users understand their input data. The input data and pre-alignment results are stored as a pickle file for quick restarting of the same calculation step.

pgap2 prep -i inputdir/ -o outputdir/

Postprocessing

PGAP2 also runs the postprocessing pipeline. The postprocessing module integrates several submodules, such as statistical analysis, single-copy tree building, population clustering, and Tajima's D test. Regardless of which submodule you want to use, you can always run it as follows:

pgap2 post [submodule] [options] -i inputdir/ -o outputdir/

Here, inputdir is the output directory of the main module.

PGAP2 also supports statistical analysis using a PAV file independently:

pgap2 post profile --pav your_pav_file -o outputdir/
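For readers new to PAV (presence/absence variation) data, the sketch below shows the kind of summary such a matrix allows. It is an illustration under stated assumptions, not a description of what the profile submodule computes: the in-memory layout (gene cluster name mapped to one 0/1 flag per genome), the function name `summarize_pav`, and the three category names are all hypothetical. The PAV file format that PGAP2 expects is not covered here.

```python
def summarize_pav(pav):
    """Count gene clusters by how many genomes carry them.

    pav maps a gene-cluster name to a list of 0/1 flags, one per genome.
    """
    n_genomes = len(next(iter(pav.values())))
    summary = {"core": 0, "accessory": 0, "unique": 0}
    for flags in pav.values():
        present = sum(flags)
        if present == n_genomes:
            summary["core"] += 1        # found in every genome
        elif present == 1:
            summary["unique"] += 1      # found in exactly one genome
        elif present > 1:
            summary["accessory"] += 1   # found in some, but not all, genomes
    return summary


# Hypothetical matrix: three gene clusters across three genomes.
example = {
    "cluster_1": [1, 1, 1],
    "cluster_2": [1, 0, 1],
    "cluster_3": [0, 0, 1],
}
print(summarize_pav(example))
```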

Installation

The best way to install the full version of the PGAP2 package is with conda:

conda create -n pgap2 -c conda-forge -c bioconda -c defaults pgap2

Alternatively, it is often faster to use the mamba solver:

conda create -n pgap2 -c conda-forge mamba
conda activate pgap2 
mamba install -c conda-forge -c bioconda -c defaults pgap2

If you only need a specific function, such as partitioning, and do not want to install all the extra software required by the full version of PGAP2, you can install just the PGAP2 package:

pip install pgap2

Or from source:

git clone https://github.com/bucongfan/PGAP2

Then install, by yourself, only the extra software needed for that specific function.

The dependencies of PGAP2 are listed below. PGAP2 checks whether each one is available on your PATH or in the pgap2/dependencies folder.

Preprocessing

One of the following clustering tools:
- cd-hit
- MMseqs2
One of the following alignment tools:
- diamond
- blast+

Main

One of the following clustering tools:
- cd-hit
- MMseqs2
mcl
One of the following alignment tools:
- diamond
- blast+
When using --retrieve to retrieve missing gene loci:
- miniprot
- seqtk
When using --reannot to re-annotate your genomes:
- prodigal

Postprocessing

One of the following MSA tools:
- muscle
- mafft
- tcoffee
ClipKIT
One phylogenetic tree construction tool
ClonalFrameML
maskrc-svg
fastbaps
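The dependency lookup described above (PATH first, then the pgap2/dependencies folder) can be approximated with Python's standard `shutil.which`. This is a minimal sketch, not PGAP2's own code: the executable names in `ALTERNATIVES` and the helper names `find_tool`, `first_available`, and `report` are assumptions, and PGAP2 may search for different names.

```python
import shutil

# Assumed executable names for the "one of" groups; PGAP2 may use others.
ALTERNATIVES = {
    "clustering": ["cd-hit", "mmseqs"],
    "alignment": ["diamond", "blastp"],
}


def find_tool(name, extra_dir=None):
    """Return the path of an executable on PATH or in extra_dir, else None."""
    found = shutil.which(name)
    if found is None and extra_dir is not None:
        # Fall back to a bundled folder such as pgap2/dependencies.
        found = shutil.which(name, path=str(extra_dir))
    return found


def first_available(candidates, extra_dir=None):
    """Return the first candidate that can be located, or None."""
    for name in candidates:
        if find_tool(name, extra_dir) is not None:
            return name
    return None


def report(extra_dir=None):
    """Map each role to the tool that would be used, or None if none is found."""
    return {
        role: first_available(names, extra_dir)
        for role, names in ALTERNATIVES.items()
    }


print(report(extra_dir="pgap2/dependencies"))
```

A `None` value in the printed dictionary means that no tool for that role was found in either location.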

Visualization in Preprocessing and Postprocessing modules

PGAP2 calls the Rscript executable found through your environment variables (PATH). Its R library should include the following packages:

ggpubr
ggrepel
dplyr
tidyr
patchwork
optparse
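One way to confirm that these packages are visible to Rscript before running PGAP2 is sketched below. It is a convenience check written for this README, not part of PGAP2; the helper names `r_check_expression` and `missing_r_packages` are hypothetical, and the check relies only on R's standard `installed.packages()` function.

```python
import shutil
import subprocess

R_PACKAGES = ["ggpubr", "ggrepel", "dplyr", "tidyr", "patchwork", "optparse"]


def r_check_expression(packages):
    """Build an R one-liner that prints every package that is not installed."""
    quoted = ", ".join(f'"{name}"' for name in packages)
    return (
        f"missing <- setdiff(c({quoted}), rownames(installed.packages())); "
        'cat(missing, sep = "\\n")'
    )


def missing_r_packages(packages=R_PACKAGES):
    """Return the packages Rscript cannot find, or None if Rscript is absent."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    result = subprocess.run(
        [rscript, "-e", r_check_expression(packages)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()
```

Calling `missing_r_packages()` returns an empty list when every package is installed, a list of names when some are missing, and `None` when Rscript itself is not on the PATH.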

Detailed documentation

Please refer to the documentation on the wiki.
