Driver Analysis #13

Open
3 tasks
nicola-calonaci opened this issue Jul 10, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@nicola-calonaci
Collaborator

Driver Analysis is a subworkflow that performs:

  • Global selection analysis (dndscv)
  • Identification of genes under positive/negative selection (dndscv)
  • Identification of mutations with high immunogenicity (SOPRANO)

It saves the output of each tool to files and flags mutations in the mutation table accordingly.
Possible flags:

  • known_driver (from a user-defined list of drivers, e.g. IntOGen) (boolean)
  • under_positive_selection (from dndscv with a user-defined significance threshold) (boolean)
  • under_negative_selection (from dndscv with a user-defined significance threshold) (boolean)
  • immunogenic (boolean)
  • quantities (from the various tables) (integer/float/string)
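
To make the flag definitions concrete, here is a minimal Python sketch of how the boolean flags could be derived for one mutation. It is only an illustration: the `flag_mutation` helper, the `dnds` and `q` keys, the gene names, and the default threshold of 0.1 are all assumptions, not the subworkflow's actual code or the dndscv output format.

```python
from typing import Dict, Set


def flag_mutation(
    gene: str,
    gene_stats: Dict[str, Dict[str, float]],
    known_drivers: Set[str],
    q_threshold: float = 0.1,  # hypothetical default; the real threshold is user-defined
) -> Dict[str, bool]:
    """Return the boolean flags for a mutation, based on the gene it falls in."""
    flags = {
        "known_driver": gene in known_drivers,
        "under_positive_selection": False,
        "under_negative_selection": False,
    }
    stats = gene_stats.get(gene)
    # Only genes that pass the significance threshold are flagged.
    if stats is not None and stats["q"] < q_threshold:
        # A dN/dS ratio above 1 suggests positive selection, below 1 negative selection.
        flags["under_positive_selection"] = stats["dnds"] > 1.0
        flags["under_negative_selection"] = stats["dnds"] < 1.0
    return flags


# Toy per-gene statistics (made-up values, hypothetical gene names).
gene_stats = {
    "GENE_A": {"dnds": 4.2, "q": 0.01},
    "GENE_B": {"dnds": 0.3, "q": 0.02},
    "GENE_C": {"dnds": 2.0, "q": 0.50},
}
known_drivers = {"GENE_A"}

for gene in ("GENE_A", "GENE_B", "GENE_C", "GENE_D"):
    print(gene, flag_mutation(gene, gene_stats, known_drivers))
```

A gene missing from the statistics table (GENE_D here) simply gets all selection flags set to false, which keeps the mutation table complete.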
@tucano
Member

tucano commented Aug 5, 2024

STATUS 5 Aug 2024

Adding some info on the current status:

The DRIVER_ANNOTATION workflow contains 2 processes:

  1. BUILD_REFERENCE
    Input: tuple val(meta), path(cds), path(genome)
    Output: tuple val(meta), path("reference.rda"), emit: dnds_reference

  2. DNDSCV
    Input: tuple val(meta), path(snv_rds), path(driver_list), path(reference)
    Output: tuple val(meta), path("*_dnds.rds"), emit: dnds_rds

Both processes are built with inline Rscript and tested for success and for a matching snapshot.

TESTING

nf-test test tests/modules/local/build_reference/main.nf.test
nf-test test tests/modules/local/dndscv/main.nf.test

TODO

  1. DRIVER_ANNOTATION workflow and test
  2. Params and interface for driver_list, cds and genome. Suggestion needed
  3. Container: currently I am testing locally with --profile test, which should also work on your HPC. I am using cdslab.sif.

Reference commit: b3319e3

@tucano
Member

tucano commented Aug 6, 2024

STATUS 6 Aug 2024

  1. DRIVER_ANNOTATION subworkflow with basic testing (using prebuilt RefCDS)
  2. Container for dndscv: https://github.com/tucano/dndscv_docker

With the Docker container and a container directive that switches on the container engine, we can run tests on both HPC and macOS laptops:

container "${workflow.containerEngine == 'singularity' ? 'docker://tucano/dndscv:latest' : 'tucano/dndscv:latest'}"

My current interface for DNDSCV subworkflow:

  1. Run BUILD_REFERENCE in the workflow when cds and genome are passed
  2. Skip BUILD_REFERENCE using a prebuilt Custom RefCDS passed as rda when dndscv_refcds_rda is not null
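
The precedence between the two options can be sketched as plain logic. This is a hedged Python illustration, not the Nextflow code: `reference_plan` and its return strings are hypothetical, and raising an error when neither input is provided is an assumption about the desired behaviour.

```python
from typing import Optional


def reference_plan(
    dndscv_refcds_rda: Optional[str],
    cds: Optional[str],
    genome: Optional[str],
) -> str:
    """Decide how the RefCDS object is obtained for the DNDSCV step."""
    # A prebuilt RefCDS takes precedence and skips BUILD_REFERENCE.
    if dndscv_refcds_rda is not None:
        return "use_prebuilt"
    # Otherwise both inputs of BUILD_REFERENCE must be available.
    if cds is not None and genome is not None:
        return "build_reference"
    # Assumed behaviour: fail early rather than run dndscv without a reference.
    raise ValueError("Provide either dndscv_refcds_rda or both cds and genome")


print(reference_plan("custom_refcds.rda", None, None))
print(reference_plan(None, "cds.tsv", "genome.fa"))
```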

TODO

  1. Add globaldnds to the rds object
  2. Params and interface
  3. More testing
  4. Minidataset for integration testing (BUILD_REFERENCE and then DNDSCV)

@tucano
Member

tucano commented Aug 7, 2024

STATUS 7 Aug 2024

Added DRIVER_ANNOTATION with basic nf-tests
Added params for the significance thresholds used to mark mutations as potential drivers
Created a docker container for dndscv: https://hub.docker.com/r/tucano/dndscv

@tucano
Member

tucano commented Sep 4, 2024

dndscv updates: tested with the SCOUT_SPN01 files.

Using the hg19_hg38 covariates (covariates_hg19_hg38_epigenome_pcawg.rda).

The pileup_VCF file SCOUT_SPN01_SS_SPN01_Sample_1_pileup_VCF.rds is still failing with:

11 (18%) mutations have a wrong reference base. Please confirm that you are not running data from a different assembly or species.

This limit (>10%) is hardcoded in dndscv.
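
A pre-check along these lines could catch the problem before the process runs. The sketch below is Python and purely illustrative: it is not dndscv's own code, `wrong_ref_fraction` and the toy genome are hypothetical, and only the 10% limit is taken from the note above.

```python
from typing import Dict, List, Tuple

# (chromosome, 1-based position, reference base reported in the mutation table)
Mutation = Tuple[str, int, str]

MAX_WRONG_FRACTION = 0.10  # the >10% limit mentioned above


def wrong_ref_fraction(mutations: List[Mutation], genome: Dict[str, str]) -> float:
    """Fraction of mutations whose reported reference base disagrees with the genome."""
    if not mutations:
        return 0.0
    wrong = sum(
        1
        for chrom, pos, ref in mutations
        if genome[chrom][pos - 1].upper() != ref.upper()
    )
    return wrong / len(mutations)


# Toy data: the third mutation reports T where the genome has G.
toy_genome = {"chr1": "ACGTACGTAC"}
toy_mutations = [("chr1", 1, "A"), ("chr1", 2, "C"), ("chr1", 3, "T"), ("chr1", 4, "T")]

fraction = wrong_ref_fraction(toy_mutations, toy_genome)
print(f"{fraction:.0%} wrong reference bases; exceeds limit: {fraction > MAX_WRONG_FRACTION}")
```

As the error message itself suggests, a high fraction usually points to an assembly mismatch between the mutation table and the reference, so checking that first is cheaper than a failed dndscv run.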

Changes TODO in the dndscv nextflow process:

  1. param to set covariates reference file
  2. support for multiple samples in the RDS input file (the new files contain both normal and tumor samples)

Proposal on how to use dndscv to annotate drivers

We run dndscv on ALL samples in a single step (possibly split by timepoint/category). From this run we infer the DRIVER mutations and genes with good statistical power. We then use this COHORT dnds to estimate:

  • The global dnds of the cohort
  • The potential driver genes in the cohort

Example:

We run dndscv on all samples and find positive selection on gene MSH6, a potential driver with 6 non-synonymous and 2 synonymous mutations across the whole cohort. Then, for each mutation in each sample, we add the "potential driver annotation" column using the global dnds information and statistics.
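
The pool-then-annotate idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample names, the `effect` labels, the helper functions, and GENE_X are made up, and the set of driver genes is hardcoded here, whereas in the proposal it would come from the cohort-level dndscv run.

```python
from collections import defaultdict
from typing import Dict, List, Set

Mutation = Dict[str, str]  # keys: "gene", "effect"


def cohort_counts(samples: Dict[str, List[Mutation]]) -> Dict[str, Dict[str, int]]:
    """Pool all samples and count synonymous / non-synonymous mutations per gene."""
    counts = defaultdict(lambda: {"n_syn": 0, "n_nonsyn": 0})
    for mutations in samples.values():
        for mut in mutations:
            key = "n_syn" if mut["effect"] == "synonymous" else "n_nonsyn"
            counts[mut["gene"]][key] += 1
    return dict(counts)


def annotate(samples: Dict[str, List[Mutation]], driver_genes: Set[str]):
    """Add a cohort-derived potential_driver column to every mutation of every sample."""
    return {
        sample: [dict(mut, potential_driver=mut["gene"] in driver_genes) for mut in mutations]
        for sample, mutations in samples.items()
    }


# Toy cohort reproducing the MSH6 example: 6 non-synonymous, 2 synonymous in total.
samples = {
    "S1": [{"gene": "MSH6", "effect": "missense"}] * 3
    + [{"gene": "MSH6", "effect": "synonymous"}, {"gene": "GENE_X", "effect": "synonymous"}],
    "S2": [{"gene": "MSH6", "effect": "missense"}] * 3
    + [{"gene": "MSH6", "effect": "synonymous"}],
}

print(cohort_counts(samples)["MSH6"])
driver_genes = {"MSH6"}  # assumed to be the significant genes reported by dndscv
annotated = annotate(samples, driver_genes)
print(annotated["S1"][0], annotated["S1"][-1])
```

The per-gene counts are only bookkeeping; the selection test itself stays inside dndscv, and the annotation step just joins its cohort-level result back onto each sample.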

@tucano
Member

tucano commented Sep 12, 2024

OK, I will try to implement multi-sample calls in the dndscv module.

@tucano tucano added the enhancement New feature or request label Sep 12, 2024