weiszd/SegAlign

This branch is 11 commits ahead of gsneha26/SegAlign:main.

A Scalable GPU System for Pairwise Whole Genome Alignments based on LASTZ's seed-filter-extend paradigm.

Overview

SegAlign has been tested on all AWS G3 and P3 GPU instances using the AMI Ubuntu Server 18.04 LTS (HVM), SSD Volume Type (ami-0fc20dd1da406780b, 64-bit x86).

git clone https://github.com/gsneha26/SegAlign.git
export PROJECT_DIR=$PWD/SegAlign

Dependencies

The following dependencies are required by SegAlign:

  • NVIDIA CUDA 10.2 toolkit
  • CMake 3.8
  • Intel TBB library
  • libboost-all-dev
  • parallel
  • zlib
  • LASTZ 1.04.15
  • faToTwoBit, twoBitToFa (from kentUtils)

The dependencies can be installed with the provided script, as shown below. This may take a while; the script installs only the dependencies that are not already present. It requires sudo because most packages are installed at the system level. Passing the -c option skips the CUDA installation; in that case the CUDA toolkit binaries must already be in $PATH for SegAlign.

cd $PROJECT_DIR
./scripts/installUbuntu.sh
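
After installation, a quick check that the command-line dependencies are reachable can save a failed run later. The following is a minimal sketch, not part of SegAlign: the helper name missing_tools is hypothetical, and the command names (nvcc for the CUDA toolkit, lastz for LASTZ) are assumed to be the usual binary names for the dependencies listed above. Library dependencies such as TBB, Boost, and zlib are not covered by this check.

```shell
# Hypothetical helper: print each named command that is not found in $PATH,
# one per line. Prints nothing if every command is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Command names assumed for the dependencies listed above.
missing=$(missing_tools nvcc cmake parallel lastz faToTwoBit twoBitToFa)
if [ -n "$missing" ]; then
    echo "Not found in PATH:" $missing
else
    echo "All checked tools found in PATH"
fi
```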

How to run SegAlign

  • Run SegAlign
run_segalign target query [options]
  • For a list of options
run_segalign --help

Running a test

cd $PROJECT_DIR
mkdir test
cd test
wget https://hgdownload.soe.ucsc.edu/goldenPath/ce11/bigZips/ce11.2bit
wget https://hgdownload-test.gi.ucsc.edu/goldenPath/cb4/bigZips/cb4.2bit 
twoBitToFa ce11.2bit ce11.fa
twoBitToFa cb4.2bit cb4.fa
run_segalign ce11.fa cb4.fa --output=ce11.cb4.maf
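
One simple way to sanity-check the test output is to count the alignment blocks in the MAF file. In the MAF format, each alignment block begins with a line whose first field is "a". The sketch below relies only on that; the helper name count_maf_blocks is hypothetical and is not provided by SegAlign.

```shell
# Hypothetical helper: count alignment blocks in a MAF file.
# Each block starts with a line whose first field is "a".
count_maf_blocks() {
    awk '$1 == "a" { n++ } END { print n + 0 }' "$1"
}

# Report the block count only if the test output from above exists.
if [ -f ce11.cb4.maf ]; then
    echo "Alignment blocks: $(count_maf_blocks ce11.cb4.maf)"
fi
```

A count of 0 would suggest that the run produced no alignments and is worth investigating.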

How to run SegAlign repeat masker

  • Run SegAlign repeat masker
run_segalign_repeat_masker sequence [options]
  • For a list of options
run_segalign_repeat_masker --help

Running a test

cd $PROJECT_DIR
mkdir test_rm
cd test_rm
wget https://hgdownload.soe.ucsc.edu/goldenPath/ce11/bigZips/ce11.2bit
twoBitToFa ce11.2bit ce11.fa
run_segalign_repeat_masker ce11.fa --output=ce11.seg

Running Docker Image

Running segalign

wget https://hgdownload.soe.ucsc.edu/goldenPath/ce11/bigZips/ce11.2bit
wget https://hgdownload-test.gi.ucsc.edu/goldenPath/cb4/bigZips/cb4.2bit 
sudo docker run -v $(pwd):/data -it gsneha/segalign:v0.1.2 \
                           twoBitToFa \
                           /data/ce11.2bit \
                           /data/ce11.fa
sudo docker run -v $(pwd):/data -it gsneha/segalign:v0.1.2 \
                           twoBitToFa \
                           /data/cb4.2bit \
                           /data/cb4.fa
sudo docker run --ipc=host --gpus all -v $(pwd):/data -it gsneha/segalign:v0.1.2 \
                           run_segalign \
                           /data/ce11.fa \
                           /data/cb4.fa \
                           --output=/data/ce11.cb4.maf

Running segalign_repeat_masker

wget https://hgdownload.soe.ucsc.edu/goldenPath/ce11/bigZips/ce11.2bit
sudo docker run -v $(pwd):/data -it gsneha/segalign:v0.1.2 \
                           twoBitToFa \
                           /data/ce11.2bit \
                           /data/ce11.fa
sudo docker run --ipc=host --gpus all -v $(pwd):/data -it gsneha/segalign:v0.1.2 \
                           run_segalign_repeat_masker \
                           /data/ce11.fa \
                           --output=/data/ce11.seg
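
The GPU invocations above repeat the same docker run flags, so a small wrapper can shorten them. This is a sketch under stated assumptions: the function name segalign_docker and the DRY_RUN and SEGALIGN_IMAGE variables are hypothetical conveniences, not part of SegAlign or the image. Unlike the twoBitToFa commands above, the wrapper passes --ipc=host and --gpus all to every command.

```shell
# Image tag taken from the commands above.
SEGALIGN_IMAGE="gsneha/segalign:v0.1.2"

# Hypothetical helper: run a command inside the SegAlign image with the
# current directory mounted at /data. Set DRY_RUN=1 to print the docker
# command instead of executing it.
segalign_docker() {
    set -- sudo docker run --ipc=host --gpus all -v "$(pwd)":/data -it "$SEGALIGN_IMAGE" "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Preview the full command without running docker.
DRY_RUN=1 segalign_docker run_segalign /data/ce11.fa /data/cb4.fa --output=/data/ce11.cb4.maf
```

Dropping DRY_RUN=1 runs the command for real, which requires docker and the NVIDIA container runtime on the host.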

Citing SegAlign

S. Goenka, Y. Turakhia, B. Paten and M. Horowitz, "SegAlign: A Scalable GPU-Based Whole Genome Aligner," in 2020 SC20: International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Atlanta, GA, US, 2020, pp. 540-552. doi: 10.1109/SC41405.2020.00043

About

A Scalable GPU-Based Whole Genome Aligner, published in SC20: https://doi.ieeecomputersociety.org/10.1109/SC41405.2020.00043
