
bucongfan/PGAP2


PGAP2

Pan-Genome Analysis Pipeline 2

Quick start

Basic usage

The input directory should contain all the genome and annotation files.

PGAP2 supports several input formats: GFF files in the format produced by Prokka, GFF files accompanied by separate genome FASTA files, GenBank flat files (GBFF), or genome FASTA files alone (which require --reannot).

Input files of different formats can be mixed in one input directory. PGAP2 recognizes and processes each file according to its prefix and suffix.

pgap2 main -i inputdir/ -o outputdir/
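The prefix/suffix recognition described above can be illustrated with a short Python sketch. It groups files that share a prefix and decides how each genome would be handled. The suffix table, the function names and the mode labels are hypothetical and are used only for illustration. PGAP2's actual detection rules may accept different suffixes and behave differently.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical suffix table; PGAP2's real recognition rules may differ.
SUFFIX_TO_FORMAT = {
    ".gff": "gff", ".gff3": "gff",
    ".gbff": "genbank", ".gbk": "genbank", ".gb": "genbank",
    ".fa": "fasta", ".fna": "fasta", ".fasta": "fasta",
}

def group_inputs(filenames):
    """Group files by their shared prefix (stem), mapping each prefix to {format: filename}."""
    genomes = defaultdict(dict)
    for name in filenames:
        path = Path(name)
        fmt = SUFFIX_TO_FORMAT.get(path.suffix.lower())
        if fmt is None:
            continue  # skip files whose suffix is not recognised
        genomes[path.stem][fmt] = path.name
    return dict(genomes)

def input_mode(files):
    """Pick a (hypothetical) handling mode for one genome from its {format: filename} map."""
    if "genbank" in files:
        return "genbank"
    if "gff" in files and "fasta" in files:
        return "gff+fasta"
    if "gff" in files:
        return "gff"  # assumes a Prokka-style GFF with an embedded ##FASTA section
    return "fasta-only (requires --reannot)"

files = ["strainA.gff", "strainB.gbff", "strainC.gff", "strainC.fna",
         "strainD.fasta", "README.txt"]
for prefix, fmts in sorted(group_inputs(files).items()):
    print(prefix, input_mode(fmts))
```

Here README.txt is ignored because its suffix is not in the table. The two strainC files are paired because they share the prefix strainC.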

Preprocessing

During preprocessing, PGAP2 performs quality checks and visualization. It generates an interactive HTML report and corresponding vector figures to help users understand their input data. The input data and pre-alignment results are saved as a pickle file, so the same calculation step can be restarted quickly.

pgap2 prep -i inputdir/ -o outputdir/
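Saving intermediate results to a pickle file so that a step can be restarted is a common checkpoint pattern. The Python sketch below shows the general idea. The function name load_or_compute and the file name preprocess.pkl are hypothetical. This is not PGAP2's own code or output layout. As with any pickle file, only load checkpoints you trust.

```python
import pickle
import tempfile
from pathlib import Path

def load_or_compute(cache_file, compute):
    """Return the pickled result in cache_file if it exists; otherwise compute, save and return it."""
    cache = Path(cache_file)
    if cache.exists():
        with cache.open("rb") as fh:
            return pickle.load(fh)  # restart: reuse the earlier result
    result = compute()
    cache.parent.mkdir(parents=True, exist_ok=True)
    with cache.open("wb") as fh:
        pickle.dump(result, fh)
    return result

calls = []

def expensive_step():
    calls.append(1)  # record each real computation
    return {"genomes": 3, "alignments": ["a", "b"]}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "outputdir" / "preprocess.pkl"  # hypothetical file name
    first = load_or_compute(target, expensive_step)
    second = load_or_compute(target, expensive_step)  # read back from the pickle

print(first == second, len(calls))
```

The second call reads the pickle instead of recomputing, so the expensive step runs only once.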

Postprocessing

The postprocessing module integrates several submodules, such as statistical analysis, single-copy tree building, population clustering, and Tajima's D test. Whichever submodule you use, the command takes the same form:

pgap2 post [submodule] [options] -i inputdir/ -o outputdir/

Here, inputdir is the output directory of the main module.
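Tajima's D, one of the submodules listed above, compares two estimates of genetic diversity. The first is the mean number of pairwise differences (pi). The second is the number of segregating sites (S) divided by a1 = sum of 1/i for i = 1..n-1. The Python sketch below implements the standard textbook formula for a small, gap-free alignment. It illustrates the statistic only and is not PGAP2's implementation. It ignores gaps, missing data and any windowing or normalization PGAP2 may apply.

```python
import math
from itertools import combinations

def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences (gaps and missing data not handled)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    # S: number of segregating (polymorphic) sites
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        return float("nan")  # D is undefined without polymorphism
    # pi: mean number of differences over all pairs of sequences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Standard constants from Tajima's variance formula
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

# An excess of rare variants (here a single singleton) pushes D below zero.
print(tajimas_d(["AAAA", "AAAA", "AAAA", "AAAT"]))
```

Negative values indicate an excess of rare variants. Positive values indicate an excess of intermediate-frequency variants, as in an alignment split into two equally common haplotypes.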

PGAP2 also supports standalone statistical analysis of a presence/absence variation (PAV) file:

pgap2 post profile --pav your_pav_file -o outputdir/
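The exact PAV layout that pgap2 post profile expects is not described here. The Python sketch below assumes a hypothetical tab-separated table:

  • a header row;
  • gene cluster IDs in the first column;
  • one column per genome, holding 0 for absence or a positive count for presence.

It uses a simple three-way split into core, accessory and unique clusters. The categories and thresholds PGAP2 actually reports may differ.

```python
import csv
import io
from collections import Counter

def classify_pav(handle):
    """Count clusters as core, accessory or unique from a tab-separated PAV table (assumed layout)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    n_genomes = len(header) - 1  # first column holds the cluster ID
    counts = Counter()
    for row in reader:
        present = sum(1 for value in row[1:] if int(value) > 0)
        if present == n_genomes:
            counts["core"] += 1       # present in every genome
        elif present == 1:
            counts["unique"] += 1     # present in exactly one genome
        elif present > 1:
            counts["accessory"] += 1  # present in some but not all genomes
    return n_genomes, counts

example = (
    "cluster\tg1\tg2\tg3\n"
    "c1\t1\t1\t1\n"
    "c2\t1\t0\t1\n"
    "c3\t0\t0\t1\n"
    "c4\t1\t2\t1\n"
)
n, counts = classify_pav(io.StringIO(example))
print(n, dict(counts))
```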

Installation

The recommended way to install the full version of PGAP2 is with conda:

conda create -n pgap2 -c conda-forge -c bioconda -c defaults pgap2

Alternatively, the mamba solver is often faster:

conda create -n pgap2 -c conda-forge mamba
conda activate pgap2 
mamba install -c conda-forge -c bioconda -c defaults pgap2

If you only need a specific function, such as partitioning, and do not want to install all the extra software required by the full version, you can install PGAP2 alone:

pip install pgap2

Or obtain it from the source code:

git clone https://github.com/bucongfan/PGAP2

Then install any additional software needed for the specific function you plan to use.

The dependencies of PGAP2 are listed below. PGAP2 looks for each of them in your environment PATH or in the pgap2/dependencies folder.
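The lookup described above (environment PATH first, then the pgap2/dependencies folder) can be sketched with Python's shutil.which. The function name find_dependency and the tool names in the demonstration are hypothetical. This is an illustration of the search order, not PGAP2's actual dependency check.

```python
import shutil
import tempfile
from pathlib import Path

def find_dependency(name, dependencies_dir):
    """Return the path of executable `name`: PATH first, then dependencies_dir; None if absent."""
    return shutil.which(name) or shutil.which(name, path=str(dependencies_dir))

# Demonstration with a hypothetical tool placed in a stand-in dependencies folder.
with tempfile.TemporaryDirectory() as deps:
    tool = Path(deps) / "hypothetical_tool"
    tool.write_text("#!/bin/sh\nexit 0\n")
    tool.chmod(0o755)  # shutil.which only accepts executable files
    found = find_dependency("hypothetical_tool", deps)
    missing = find_dependency("surely_not_installed_tool", deps)

print(found, missing)
```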

Preprocessing

Main

Postprocessing

Visualization in the Preprocessing and Postprocessing modules

PGAP2 calls the Rscript found in your environment PATH. The following R libraries must be installed:

  • ggpubr
  • ggrepel
  • dplyr
  • tidyr
  • patchwork
  • optparse
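Before running the visualization steps, you can check that Rscript and the libraries listed above are available. The Python sketch below is a hypothetical helper, not part of PGAP2. It asks R which packages fail to load, using R's requireNamespace. It returns None when Rscript is not on PATH.

```python
import shutil
import subprocess

# R libraries listed above for the visualization steps.
R_PACKAGES = ("ggpubr", "ggrepel", "dplyr", "tidyr", "patchwork", "optparse")

def r_check_expression(packages):
    """Build an R expression that prints each package that cannot be loaded, one per line."""
    quoted = ", ".join(f'"{p}"' for p in packages)
    return (
        f"pkgs <- c({quoted}); "
        "ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE); "
        'cat(pkgs[!ok], sep = "\\n")'
    )

def missing_r_packages(packages=R_PACKAGES):
    """Return the packages Rscript cannot load, or None if Rscript is not on PATH."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    result = subprocess.run(
        [rscript, "-e", r_check_expression(packages)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()

if __name__ == "__main__":
    missing = missing_r_packages()
    if missing is None:
        print("Rscript not found on PATH")
    elif missing:
        print("Missing R packages:", ", ".join(missing))
    else:
        print("All required R packages are installed")
```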

Detailed documentation

Please refer to the documentation on the wiki.