Querying Variant Data
There are two main ways to query data loaded into OpenCGA Storage from the command line, each with its own Command Line Interface (CLI):
- opencga.sh: a top-level CLI for querying data using OpenCGA Catalog.
- opencga-storage.sh: a low-level CLI for querying data using variant attributes such as region, gene, annotation or genotypes.
Both CLIs offer similar functionality and accept similar parameters for querying by variant attributes such as region, annotation or stats. The main difference is that the top-level CLI can use OpenCGA Catalog, which allows more complex queries such as querying by family or sample annotations.
Both can be found in the $OPENCGA_HOME/bin folder.
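Before running any query it can help to confirm that both scripts are present and executable. The following is a minimal sketch, not part of OpenCGA: the function name `check_clis` is hypothetical, and it only assumes the two script names and the bin folder mentioned above.

```shell
# Hypothetical helper (not shipped with OpenCGA): check that both CLIs
# exist and are executable under <home>/bin.
# Usage: check_clis [OPENCGA_HOME]   (falls back to $OPENCGA_HOME, then ".")
check_clis() {
    home="${1:-${OPENCGA_HOME:-.}}"
    rc=0
    for cli in opencga.sh opencga-storage.sh; do
        if [ -x "$home/bin/$cli" ]; then
            echo "found:   $home/bin/$cli"
        else
            echo "missing: $home/bin/$cli" >&2
            rc=1
        fi
    done
    # Non-zero when at least one script is missing or not executable.
    return "$rc"
}

# Example:
#   check_clis "$OPENCGA_HOME" && echo "both CLIs are available"
```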
In version v0.6.0, opencga-storage.sh is the most complete way of querying data. It allows you to query by:
- genomic regions, gene and SNP IDs
- variant annotation such as consequence types, conservation scores, PolyPhen, SIFT or population frequencies
- sample genotypes
- variant stats in the study
- some basic aggregations such as ranks, group-by or counts
All these filters can be combined. Some query modifiers are also implemented:
- skip and limit
- count
To see all the parameters, run the following from the $OPENCGA_HOME folder:
./bin/opencga-storage.sh fetch-variants -h
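The help output is the authoritative source for option names, including those of the skip, limit and count modifiers. As a small sketch, the help text can be piped through a filter to show only the relevant lines. The helper name `help_grep` is hypothetical, and the format of the help text is not assumed beyond being plain text.

```shell
# Hypothetical helper: keep only the lines of a help text that mention
# one of the given keywords (case-insensitive, extended regex).
# Reads the help text on stdin, so it does not depend on OpenCGA itself.
help_grep() {
    grep -i -E -- "$1"
}

# Example, run from $OPENCGA_HOME:
#   ./bin/opencga-storage.sh fetch-variants -h 2>&1 | help_grep 'skip|limit|count'
```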
NOTE: for security reasons, you need to log in to OpenCGA before using this CLI with a standard OpenCGA installation. This guarantees that you only access the data you have permission to see. To log in, execute:
./bin/opencga.sh users login -u USER -p PASSWORD
A session token will be stored in your home directory and used internally.
- To filter by region and SIFT:
./bin/opencga-storage.sh fetch-variants --database opencga_gel_100K_GENOMES_PROJECT --return-study 2 --region 22:1500000-20000000 --protein-substitution "sift<0.2"
- To count the number of variants in a specific region:
./bin/opencga-storage.sh fetch-variants --database DATABASE_NAME --return-study 2 --region 22:1500000-2000000 --protein-substitution "sift<0.2" --count
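Because the count query and the fetch query share the same filters, the two examples can be wrapped in one small function. This is only a sketch: it uses nothing beyond the flags shown in the examples above, while the function name `fetch_variants` and the variables DATABASE, STUDY, REGION, SIFT and DRY_RUN are hypothetical names introduced here. By default it prints the command instead of running it.

```shell
# Hypothetical wrapper around the fetch-variants examples above.
# DATABASE, STUDY, REGION, SIFT and DRY_RUN are illustrative variable names,
# not OpenCGA settings; the defaults are the placeholder values from this page.
fetch_variants() {
    # Prepend the fixed part of the command; "$@" keeps any extra flags
    # (for example --count) at the end.
    set -- ./bin/opencga-storage.sh fetch-variants \
        --database "${DATABASE:-DATABASE_NAME}" \
        --return-study "${STUDY:-2}" \
        --region "${REGION:-22:1500000-2000000}" \
        --protein-substitution "${SIFT:-sift<0.2}" \
        "$@"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Print each argument in single quotes so the filter stays quoted.
        for arg in "$@"; do printf "'%s' " "$arg"; done
        printf '\n'
    else
        "$@"
    fi
}

# Count first, then fetch, reusing the same filters.
fetch_variants --count
fetch_variants
```

Setting DRY_RUN=0 runs the command from the $OPENCGA_HOME folder instead of printing it. Counting first is a cheap way to check how many variants a filter matches before fetching them all.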
OpenCGA is an open source project and it is freely available.