Pipeline for processing paired-end RNA-sequencing using cgatcore and HISAT2.
- Create a new repository from this one, using the `Use as template` button on GitHub.
  - That way, your new repository starts its own commit history, where you can record your own changes!
  - Only fork this repository if you wish to contribute updates to the template pipeline itself.
- Clone the new repository to the computer where you wish to run the pipeline.
  - The clone is the working directory for one run of the pipeline on one set of FASTQ files.
  - To run the pipeline on another set of FASTQ files, go back to step 1 and create another repository from the template.
- Create a Conda environment named `pipeline_rnaseq_hisat2` using the file `envs/pipeline.yml`.
  - You only need to do this once, no matter how many times you run the pipeline and how many copies of the pipeline you have cloned.
  - If in doubt, remove the existing environment and create it again from this file.
- Create symbolic links to your input FASTQ files in the subdirectory `data/`.
  - Do not copy the files themselves. If you do copy them, make sure you don't commit them to Git (e.g. use `.gitignore`).
- Edit the configuration of the pipeline as needed, in the file `config.yml`.
  - Commit your changes to the configuration for version control and traceability.
- Run the pipeline!
  - On a High-Performance Computing (HPC) cluster, run `python pipeline.py make full -v 5` to use the Distributed Resource Management Application API (DRMAA).
  - On a local machine, run `python pipeline.py make full -v 5 --local`.
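
The symbolic-link step above can be scripted when there are many input files. The following is a minimal sketch, not part of the pipeline itself: the helper name `link_fastq_files` and the `*.fastq.gz` filename pattern are assumptions made for illustration, so adjust them to match how your files are named.

```python
from pathlib import Path


def link_fastq_files(source_dir, data_dir="data", pattern="*.fastq.gz"):
    """Link every file in source_dir that matches pattern into data_dir.

    Hypothetical helper for illustration; the filename pattern is an assumption.
    Returns the list of symbolic links that were created.
    """
    source = Path(source_dir).resolve()
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)

    links = []
    for fastq in sorted(source.glob(pattern)):
        link = target / fastq.name
        if link.is_symlink() or link.exists():
            # Leave existing entries untouched rather than overwriting them.
            continue
        # Point at the absolute path so the link does not depend on the
        # directory from which it is later read.
        link.symlink_to(fastq)
        links.append(link)
    return links
```

Called from the root of the cloned repository as `link_fastq_files("/path/to/fastq")`, where the path is a placeholder for wherever your FASTQ files are stored, this fills `data/` with links only, so the FASTQ files themselves never enter the Git working tree.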