-
Notifications
You must be signed in to change notification settings - Fork 15
Home
Welcome to the PANOPLY wiki!
PANOPLY is a cloud-based platform for applying statistical and machine learning algorithms to transform multi-omic data from cancer samples into biologically meaningful and interpretable results. PANOPLY leverages Terra, and is designed to be flexible, automated, reproducible, scalable, and secure. A wide array of algorithms for proteogenomic data analysis have been implemented in PANOPLY.
Documentation and Tutorial can be found in the Sidebar.
- To run PANOPLY, you need a Terra account. To create an account follow the steps here. A Google account is required to create a Terra account.
- Once you have an account, you need to create a cloud billing account and project. Here, you will find step-by-step instructions to set up billing, including a free credits program for new users to try out Terra with $300 in Google Cloud credits.
Input data consists of a ZIP compressed file containing the following data tables:
- Datasets for preconfigured piplelines:
- At least one proteomics dataset (global proteome, phosphoproteome, acetylome, ubiquitylome) --- after database searching and quantification;
- Genomics data --- CNA and RNA data (both are required), normalized and filtered as needed;
- Dataset(s) for running individual analysis modules:
- At least one dataset: either proteomics (global proteome, phosphoproteome, acetylome, ubiquitylome--after database searching and quantification) or genomics (RNA, CNA--normalized and filtered as needed) data;
-
Annotation
table with at leastSample.ID
andType
columns, in addition to an arbitrary number of annotations for the samples.
- A
groups
file that lists a subset of annotations (one per line) to be used for association and enrichment analysis; - A default
parameter
file is used, unless one is specified in the input (inyaml
format); - Pathway databases for PTM-SEA (PTM-SigDB v1.9.0) and GSEA (v6.2 hallmark pathways) are automatically included, and can be over-ridden by providing appropriate
gmt
input files.
File | Format | Specification |
---|---|---|
annotation |
csv |
The annotation table must include Sample.ID and Type columns. Sample.ID s must be unique and cannot have duplicates. If replicate samples are present, they must have unique Sample.ID s, but can be identified as replicates by using an additional Participant annotation column with identical id's for replicate samples. The annotation table input to the panoply_parse_sm_table module must include Experiment (TMT or iTRAQ plex number) and Channel (TMT or iTRAQ channel id). |
groups |
csv |
Lists annotation columns (one per line) present in the annotation table for use in association and enrichment analysis in various PANOPLY modules. If a groups file is not specified in the input, one will be interactively created by the PANOPLY-startup-notebook using the annotations present in the annotation table. |
proteomics data | gct v1.3 |
Can be proteome, phosphoproteome, acetylome, ubiquitylome. Data matrix (resulting from MS database search) must have protein/PTM-site as rows and samples as columns, with entries containing abundance values (usually log2 ratio to a common reference). At least one proteomics data type must be specified. PANOPLY provides an option to normalize proteomics data, if needed. |
genomics data | gct v1.3 |
Both CNA (normalized log-ratio, derived from WXS, WGS or combination) and RNA expression (log-transformed and normalized, derived from RNAseq) data are required. These data must be normalized prior to input in PANOPLY. To include mutation data, samples with mutations in relevant genes must be identified and an annotation column added for each gene in the annotation table---maf or vcf cannot be directly input. |
parameters | yaml |
The master-parameters.yaml file (also available in the PANOPLY Production Workspaces on Terra) contains default parameter settings for all PANOPLY modules, and is used as a default, if no input parameters file is specified. |
pathway databases | gmt |
If no pathway databases are specified in the input, PTM-SigDB v1.9.0 is used for PTM-SEA, and the Cancer Hallmark Pathways v6.2 is used for GSEA and ssGSEA. |
- All proteomics and genomics data tables must have sample ids (column names) conforming to the
Sample.ID
s used in theannotation
table. Ideally, the sample annotations included in theannotation
table would be present in the GCT v1.3 data files as column annotations. - All genomics data tables must be appropriately normalized/filtered prior to use in PANOPLY. Proteomics data can be optionally normalized/filtered in PANOPLY.
PANOPLY modules are grouped into Data Preparation
, Data Analysis
, Report
and Support
categories. There is also documentation on navigating results. In addition, a step-by-step tutorial is included. The documentation main pages are listed below, and can also be accessed using the sidebar for this Wiki.
- Home
- PANOPLY Tutorial
- Data Preparation Modules
- Data Analysis Modules
-
Report Modules
- panoply_association_report
- panoply_blacksheep_report
- panoply_cna_correlation_report
- panoply_cons_clust_report
- panoply_immune_analysis_report
- panoply_mimp_report
- panoply_nmf_report
- panoply_normalize_ms_data_report
- panoply_rna_protein_correlation_report
- panoply_sampleqc_report
- panoply_sankey_report
- panoply_ssgsea_report
- Support Modules
- Navigating Results
- PANOPLY without Terra
- Customizing PANOPLY
- Pipelines