Pipeline overview

Starting with FASTQ files as input, this pipeline has the following stages:

  1. Pre-alignment QC with fastp (adapter trimming and QC), with reports collated by MultiQC
  2. Alignment with STAR
  3. Post-alignment QC and analysis:
     • Inference of library strandedness with RSeQC
     • Transcript Integrity Number (TIN) calculation with RSeQC, to assess evenness of coverage
     • Generation of wig and normalised bigWig files for genome browser visualisation, using RSeQC bam2wig and the UCSC wigToBigWig binary
  4. Quantification with featureCounts (Subread) at the exon level, with exons grouped by gene_id into meta-features
  5. [Conditional] If there are at least 2 sample conditions and at least 3 samples, an R secondary analysis that performs:
     • Differential gene expression analysis across all pairwise comparisons of conditions
     • Overrepresentation analysis for pathway enrichment

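The condition for stage 5 can be sketched as two bash functions. This is a hypothetical helper, not code from the pipeline: it assumes one condition label is passed per sample, and the "_vs_" naming of comparisons is made up for illustration. With k distinct conditions, all pairwise comparisons amount to k(k-1)/2 contrasts.

```bash
# Hypothetical helpers (not part of the pipeline): each argument is the
# condition label of one sample.

should_run_downstream() {
  local n_samples=$#
  local n_conditions
  # Count distinct condition labels.
  n_conditions=$(printf '%s\n' "$@" | sort -u | wc -l)
  # Run only with >= 3 samples and >= 2 distinct conditions.
  (( n_samples >= 3 && n_conditions >= 2 ))
}

list_pairwise_comparisons() {
  local -a conds
  # Distinct conditions, sorted so the output order is deterministic.
  mapfile -t conds < <(printf '%s\n' "$@" | sort -u)
  local i j
  for (( i = 0; i < ${#conds[@]}; i++ )); do
    for (( j = i + 1; j < ${#conds[@]}; j++ )); do
      printf '%s_vs_%s\n' "${conds[i]}" "${conds[j]}"
    done
  done
}
```

For example, `should_run_downstream ctrl ctrl treat treat && list_pairwise_comparisons ctrl ctrl treat treat` prints the single comparison ctrl_vs_treat, while two samples of different conditions would skip the secondary analysis.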

Outputs

| Name               | Type                         | Extension | Description                             |
|--------------------|------------------------------|-----------|-----------------------------------------|
| fastpQcHtml        | Pre-alignment QC             | .html     | fastp HTML reports                      |
| multiqcHtml        | Pre-alignment QC             | .html     | MultiQC HTML report (single file)       |
| bamFiles           | Alignment results            | .bam      | STAR BAMs, sorted by coordinate         |
| starLogs           | Post-alignment QC            | log files | STAR log files                          |
| flagstat           | Post-alignment QC            | log files | samtools flagstat files                 |
| tin_summary        | Post-alignment QC            | .txt      | RSeQC TIN summary files                 |
| tin_xls            | Post-alignment QC            | .xls      | RSeQC TIN .xls files                    |
| wigs               | Alignment results            | .wig      | wig files                               |
| normalised bigwigs | Alignment results            | .bigwig   | Normalised bigWig files                 |
| countMatrix        | Quantification results       | .txt      | featureCounts count matrix              |
| countsParsed       | Quantification results       | .txt      | Parsed counts output with gene names    |
| countsSummary      | Quantification results       | .txt      | featureCounts summary file              |
| downstreamResDir   | R secondary analysis results | .zip      | R analysis result files                 |
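A quick sanity check on countsSummary is the fraction of reads assigned to features per sample. The sketch below is a hypothetical helper, assuming the standard featureCounts summary layout: tab-separated, a first row starting with "Status" followed by one column per BAM, then one row per status ("Assigned", "Unassigned_...").

```bash
# Hypothetical helper: per-sample fraction of reads with status "Assigned"
# in a featureCounts summary file.
assigned_fraction() {
  # $1: featureCounts summary file
  awk -F'\t' '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
    {
      for (i = 2; i <= NF; i++) {
        total[i] += $i                      # sum over all status rows
        if ($1 == "Assigned") assigned[i] = $i
      }
    }
    END {
      for (i = 2; i <= ncol; i++) {
        frac = (total[i] > 0) ? assigned[i] / total[i] : 0
        printf "%s\t%.3f\n", name[i], frac
      }
    }
  ' "$1"
}
```

Usage: `assigned_fraction <countsSummary file>` prints one "BAM name, fraction" line per sample; a markedly low value for one sample is worth checking against its STAR logs and strandedness result.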

Docker images

  • dockerBase =
  • dockerPrefix = species + genomeVersion
    • E.g. “human” + “grch38” gives “humangrch38”
  • dockerUri = dockerBase + dockerPrefix
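The naming rule above is plain string concatenation. In this sketch DOCKER_BASE is a made-up placeholder, because the real dockerBase value is not recorded here. The optional third argument is an assumption based on image names in the module table (e.g. humangrch38rseqc), which appear to append a tool name to the prefix.

```bash
# DOCKER_BASE is a hypothetical placeholder, not the real registry path.
DOCKER_BASE="registry.example.com/rnaseq/"

docker_uri() {
  local species=$1 genome_version=$2 tool=${3:-}
  local docker_prefix="${species}${genome_version}"   # dockerPrefix = species + genomeVersion
  # dockerUri = dockerBase + dockerPrefix; the optional tool suffix is an
  # assumption mirroring names such as humangrch38rseqc.
  printf '%s%s%s\n' "$DOCKER_BASE" "$docker_prefix" "$tool"
}
```

Usage: `docker_uri human grch38` gives registry.example.com/rnaseq/humangrch38 under the placeholder base.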

Modules and images

For other species, replace the species-specific Docker image names according to the naming convention above.

| Index | Module                    | Docker image name                  | Scripts/apps                                                                                                           |
|-------|---------------------------|------------------------------------|------------------------------------------------------------------------------------------------------------------------|
| 1     | countArrayUniqueItems.wdl | ubuntu                             | None                                                                                                                   |
| 2     | fastp.wdl                 | qc (apps_refless)                  | None                                                                                                                   |
| 3     | multiQc.wdl               | qc (apps_refless)                  | None                                                                                                                   |
| 4     | pairsToR1R2.wdl           | None                               | None                                                                                                                   |
| 5     | star.wdl                  | humangrch38rnaseq (fat Docker)     | (ref_files/species): STAR indices, annotation.gtf                                                                      |
| 6     | samtools.wdl              | samtools (apps_refless)            | None                                                                                                                   |
| 7     | rseqc.wdl                 | humangrch38rseqc (apps)            | (ref_files/species): annotation.bed, houseKeepingGenes.bed, chrNameLength.txt; (ref_files/apps): wigToBigWig; (apps): |
| 8     | featureCounts.wdl         | humangrch38featureCounts (apps)    | (ref_files/species): annotation.gtf; (apps)                                                                            |
| 9     | downstreamRNAseq.wdl      | downstreamRNAseq (apps_refless)    | (apps): main.R                                                                                                         |
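Switching species amounts to swapping the species+genome prefix of the species-specific images while leaving species-independent ones (ubuntu, qc, samtools, downstreamRNAseq) alone. A minimal bash sketch follows; the function name is hypothetical, and it assumes the matching images for the target species (e.g. mousegrcm39, as in the mouse test inputs) have already been built.

```bash
# Hypothetical helper: rewrite an image name for another species/genome.
swap_species_image() {
  local image=$1 old_prefix=$2 new_prefix=$3
  if [[ $image == "$old_prefix"* ]]; then
    # Replace only the leading species+genome prefix, keep the tool suffix.
    printf '%s\n' "${new_prefix}${image#"$old_prefix"}"
  else
    # Species-independent images are returned unchanged.
    printf '%s\n' "$image"
  fi
}
```

Usage: `for img in ubuntu humangrch38rnaseq humangrch38rseqc humangrch38featureCounts; do swap_species_image "$img" humangrch38 mousegrcm39; done` lists the image names a GRCm39 mouse run would expect.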

GitLab repositories

  • Main: RNAseq repository
  • Dockerfiles and configurations
  • Modules (check out the rnaseq branch)

Biodebian organisation

Submitting jobs to the Cromwell server on Biodebian

The options.json file is a dummy file required by the cromshell submit subcommand. Currently, the Cromwell server has the following IP:

An example submission using the mouse GRCm39 test inputs:

cromshell submit ./main.wdl ./tests/tests_biodebian/rnaseq_mouse_grcm39.json ./options.json ./
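The submission above can be wrapped in a small bash function that checks its inputs and creates the dummy options file if it is missing. This is a hypothetical sketch: the function name and the CROMSHELL override are illustrative, and writing an empty JSON object ({}) as the dummy options.json is an assumption. The cromshell arguments are exactly those of the command above.

```bash
# Hypothetical wrapper around the documented cromshell submit command.
submit_rnaseq() {
  local inputs_json=$1
  local cromshell_cmd=${CROMSHELL:-cromshell}   # e.g. CROMSHELL=echo for a dry run
  [ -f ./main.wdl ] || { echo "main.wdl not found in $PWD" >&2; return 1; }
  [ -f "$inputs_json" ] || { echo "inputs JSON not found: $inputs_json" >&2; return 1; }
  # options.json is only a placeholder; an empty JSON object is assumed to suffice.
  [ -f ./options.json ] || printf '{}\n' > ./options.json
  "$cromshell_cmd" submit ./main.wdl "$inputs_json" ./options.json ./
}
```

Usage, from the repository root: `submit_rnaseq ./tests/tests_biodebian/rnaseq_mouse_grcm39.json`, or prefix it with `CROMSHELL=echo` to print the command without submitting.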

