Automated monitoring and logging of sequencing runs performed on the GridION sequencer.
The dominION agent supervises all channels of a GridION simultaneously, performing demultiplexing for barcoded libraries, data transfer to a remote server, and logging of experiment parameters, QC results and read statistics. For each experiment, dominION produces and updates comprehensive reports in printer-friendly HTML format, comprising tabular information and plots of G+C content, read length, read quality, and throughput for the complete run as well as for every barcode adapter group. In addition, it makes it possible to detect long-term trends of monitored features over the course of all documented sequencing runs and to identify regularities and interrelations between the results of different samples and across different GridION sequencers.
Info : If you have no access to GitHub from the GridION itself, perform steps 1 and 2 on a different Linux machine and transfer the cloned directory dominION to the home directory of your GridION (/home/grid/). Then continue with step 3 on the GridION itself. Please do not download the repository from your browser, as this will lead to missing files.
- Open a console and make sure that git is installed:
sudo apt-get -y install git
- Clone the dominION git repository with the option --recurse-submodules:
git clone --recurse-submodules https://github.com/MarkusHaak/dominION.git
- Run the setup script located in the directory dominION/script/setup :
Info : Please replace USER, HOST and DEST with the information specific to the server to which the sequencing data shall be transmitted. HOST is either the remote server's ip address or, more commonly, its domain hostname. USER needs to be replaced with a username on that remote server, and DEST is the full path of a directory on the remote server where the transmitted files and folders shall be saved.
/bin/bash dominION/script/setup -u USER -H HOST -d DEST/FOR/FILE/TRANSFER
This will install dominION in a new virtual environment and set up key authentication and defaults for sequence data transfer to a remote server. Unless the option -m (for minimal) is set, a cron job for the dominion agent script is installed, the Firefox startup page is set to the overview page of dominION, and .bash_aliases is modified to source the python virtual environment when opening a new console.
Info : Please confirm updates and authenticity requests by entering y or yes, respectively. You will be prompted to enter the administrator password of your local machine (the GridION) and the password for the specified user on the remote server to set up key authentication. Note that you will not be able to see any characters while entering the passwords.
- Shut down and restart the GridION
Shut down the GridION sequencer. After approximately 10 seconds, restart it by pressing the power button. The dominION agent script should now be running in the background. Unless the option -m was set during installation, the overview page of dominION should now be the startup page of Firefox.
Info : The steps in this section are not necessary if the Quick Setup was performed.
On a brand-new GridION, the software is not up to date. In any case, consider running apt update and apt upgrade as admin first:
sudo apt update
sudo apt upgrade
File transfer from the GridION to a remote server requires SSH key-based authentication. On the GridION, generate an SSH key pair for dominION.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa_dominion
Adapt the public key to restrict key authentication to the rsync command needed for file transfer. Please change USER, HOST and /DEST/ON/SERVER/ according to your needs. These parameters specify the destination of file transfers with rsync, as in USER@HOST:/DEST/ON/SERVER/.
user="USER"
host="HOST"
dest="/DEST/ON/SERVER/"
localip=$(ifconfig | sed -En 's/127.0.0.1//;s/.*inet (addr:)?(([0-9]*\.){3}[0-9]*).*/\2/p')
echo 'command="rsync --server -Rruve.iLsfx . '"$dest"'",from="'"$localip"'",restrict '"$(cat ~/.ssh/id_rsa_dominion.pub)" > ~/.ssh/id_rsa_dominion.pub
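To make the effect of the command above easier to see, the following sketch builds the same kind of restricted key line from dummy values. The helper name restrict_pubkey, the IP address and the key string are placeholders chosen for illustration only. In the resulting line, command= forces every login with this key to run the given rsync server command, from= limits the key to connections from the GridION's address, and restrict disables features such as port forwarding and terminal allocation.

```shell
# Hypothetical helper: prefix an OpenSSH public key with the options used above.
restrict_pubkey() {
    # $1 = destination directory on the server
    # $2 = IP address of the GridION
    # $3 = unmodified public key line
    printf 'command="rsync --server -Rruve.iLsfx . %s",from="%s",restrict %s\n' "$1" "$2" "$3"
}

# Dummy values for illustration (not a real key or address).
restrict_pubkey "/DEST/ON/SERVER/" "192.0.2.10" "ssh-rsa PLACEHOLDER_KEY grid@gridion"
```

The printed line has the form that ends up in the authorized_keys file of the remote user once the key has been transferred.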
Then transfer the public key to the server using ssh-copy-id. You will be prompted to type in the password for the specified user on the remote host.
ssh-copy-id -i ~/.ssh/id_rsa_dominion.pub "${user}@${host}"
On some GridIONs, the Python package installer pip and git are missing. You can install both with apt-get.
sudo apt-get -y install python3-pip git
As this software is intended to be run on the GridION sequencer, I highly recommend using virtualenv to set up a virtual python environment prior to the installation:
sudo apt-get -y install virtualenv
virtualenv -p python3 ~/.dominION
Don't forget to activate your virtual environment:
source ~/.dominION/bin/activate
This needs to be done every time you open a new console in which you want to execute dominION commands. I therefore recommend adding the source command to your .bash_aliases file. This way, the virtual environment is sourced automatically when opening a new console.
touch ~/.bash_aliases
echo "if [ -f ~/.dominION/bin/activate ]; then . ~/.dominION/bin/activate; fi" >> ~/.bash_aliases
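Because the echo command above appends the line every time it is executed, running it repeatedly leaves duplicate entries in .bash_aliases. A minimal sketch of an append that is safe to repeat is shown below. The helper name append_once is hypothetical, and the demonstration writes to a temporary file instead of the real .bash_aliases.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    # $1 = file, $2 = line to add
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

activate_line='if [ -f ~/.dominION/bin/activate ]; then . ~/.dominION/bin/activate; fi'

# Demonstration on a temporary file; use ~/.bash_aliases for the real thing.
demo_file=$(mktemp)
append_once "$demo_file" "$activate_line"
append_once "$demo_file" "$activate_line"   # second call leaves the file unchanged
grep -c . "$demo_file"                      # number of non-empty lines in the file
rm -f "$demo_file"
```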
dominION requires an adapted version of ont_fast5_api, which contains a script multi_to_multi_fast5 that splits Multi-Fast5 files into files containing reads belonging to the same adapter group. The same applies to Porechop, where I fixed a bug regarding the identification of adapter orientation. Both are included as submodules in the dominION repository on GitHub. To install these dependencies, clone the dominION repository with the option --recurse-submodules and install them separately:
git clone --recurse-submodules https://github.com/MarkusHaak/dominION.git
cd dominION/ont_fast5_api
python3 setup.py install
cd ../Porechop
python3 setup.py install
cd ..
In addition, the following external python modules are required, but they are automatically installed if you follow the instructions given under Installation.
- watchdog
- numpy
- pandas
- matplotlib
Please be aware that dominION requires python3.5 or greater and is not backwards compatible with python2.
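The version requirement can be checked before installing. The following is a minimal sketch, not part of dominION: the function name python_version_ok is hypothetical, and it only compares the major and minor version numbers against 3.5.

```shell
# Succeeds if a "major.minor[.patch]" version string is at least 3.5.
python_version_ok() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # text before the next dot, if any
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 5 ]; }
}

# Only query the interpreter if python3 is actually installed.
if command -v python3 >/dev/null 2>&1; then
    version=$(python3 -c 'import platform; print(platform.python_version())')
    if python_version_ok "$version"; then
        echo "python3 $version is recent enough"
    else
        echo "python3 $version is too old for dominION" >&2
    fi
fi
```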
Finally, install dominION itself from the repository cloned above. If you followed the steps above in the same console, dominION will be configured to use the user, host and destination as specified for setting up key authentication. Otherwise, you will be prompted to provide this information when executing python3 setup.py install.
INIFILE="dominion/resources/defaults.ini"
perl -pi -e "s|user.*|user = ${user}|" "$INIFILE"
perl -pi -e "s|host.*|host = ${host}|" "$INIFILE"
perl -pi -e "s|dest.*|dest = ${dest}|" "$INIFILE"
python3 setup.py install
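The perl substitutions above match the words user, host and dest anywhere in a line, so depending on the contents of defaults.ini they could also rewrite other lines containing these words. The sketch below shows a variant anchored to the start of the line. The helper name set_ini_value is hypothetical and the sample file is invented for the demonstration; the real defaults.ini may look different.

```shell
# Set "key = value" in an INI-style file, changing only lines that begin with the key.
set_ini_value() {
    # $1 = file, $2 = key, $3 = value (must not contain "|", "&" or backslashes)
    tmp=$(mktemp)
    sed "s|^${2}[[:space:]]*=.*|${2} = ${3}|" "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Invented sample file, used only to show the behaviour.
ini=$(mktemp)
printf '%s\n' '[DEFAULT]' 'user = nobody' 'host = example.org' 'dest = /tmp/' \
    'superuser_note = keep me' > "$ini"

set_ini_value "$ini" user "USER"
set_ini_value "$ini" host "HOST"
set_ini_value "$ini" dest "/DEST/ON/SERVER/"
cat "$ini"   # the superuser_note line is left untouched
rm -f "$ini"
```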
As dominION is intended to run in the background as a software agent, I recommend adding a new cron job to your crontab that runs dominion in a screen session.
newjob="@reboot screen -dm bash -c '. ${HOME}/.dominION/bin/activate ; dominion'"
(crontab -l ; echo "$newjob") | crontab -
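Note that the command above adds the job again each time it is executed, which would start several dominion agents at boot. The following sketch only adds the job if it is not yet listed. The helper name merge_cron_job is hypothetical, and the line that actually changes the crontab is left commented out.

```shell
# Print the current crontab text followed by the job, unless the job is already listed.
merge_cron_job() {
    # $1 = current crontab contents (may be empty), $2 = job line
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    fi
    if ! printf '%s\n' "$1" | grep -qxF -- "$2"; then
        printf '%s\n' "$2"
    fi
}

newjob="@reboot screen -dm bash -c '. ${HOME}/.dominION/bin/activate ; dominion'"

# Dry run with an empty crontab: prints only the new job.
merge_cron_job "" "$newjob"

# Real use (modifies the crontab of the current user):
# merge_cron_job "$(crontab -l 2>/dev/null)" "$newjob" | crontab -
```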
To prevent the need to activate the virtual environment each time a subscript of dominION is needed, modify the .bash_aliases file to source it whenever a new console is opened.
touch ~/.bash_aliases
echo "if [ -f ~/.dominION/bin/activate ]; then . ~/.dominION/bin/activate; fi" >> ~/.bash_aliases
Optionally, you can change the startup page of Firefox to the dominION overview html file /data/dominION/HOSTNAME_overview.html .
DominION is intended to be run as a background process. If the quick setup was performed or the steps under recommended configuration were followed, dominion will start automatically as a cron job in a screen session on system startup. The command line options can be modified by editing the crontab list with crontab -e.
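As an illustration, a cron job line with additional options could be assembled as sketched below. The option names are taken from the dominion help text, the values are arbitrary examples, and the variable name dominion_opts is hypothetical.

```shell
# Example options from the dominion help text; the values are arbitrary.
dominion_opts="--min_length 500 --min_quality 7 --update_interval 600"

# Same job line as installed for the agent, with the options appended to the command.
newjob="@reboot screen -dm bash -c '. ${HOME}/.dominION/bin/activate ; dominion ${dominion_opts}'"
printf '%s\n' "$newjob"
```

The printed line can be pasted into the editor opened by crontab -e in place of the existing dominion entry.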
As an alternative to installing a cron job, dominion can be started in a console after activating the python virtual environment:
. ~/.dominION/bin/activate
dominion
All available command line options are listed and described in the section Command Line Options. Once started, no manual interaction with the dominion agent is necessary.
Info : Please make sure that dominION is running before starting a sequencing experiment.
DominION automatically detects platform QC and sequencing experiments started in MinKNOW and updates the overview HTML page accordingly. The default path for all collected information and reports is /data/dominION/.
usage: dominion [-n] [-a] [-p] [-l MIN_LENGTH] [-r MIN_LENGTH_RNA]
[-q MIN_QUALITY] [-d RSYNC_DEST] [-i IDENTITY_FILE]
[--bc_kws [BC_KWS [BC_KWS ...]]] [-u UPDATE_INTERVAL] [-m]
[-o OUTPUT_DIR] [--data_basedir DATA_BASEDIR]
[--minknow_log_basedir MINKNOW_LOG_BASEDIR]
[--logfile LOGFILE] [--statsparser_args STATSPARSER_ARGS] [-h]
[--version] [-v] [--quiet]
A tool for monitoring and protocoling sequencing runs performed on the Oxford
Nanopore Technologies GridION sequencer and for automated post processing and
transmission of generated data. It collects information on QC and sequencing
experiments and displays summaries of mounted flow cells as well as
comprehensive reports about currently running and previously performed
experiments.
General arguments:
arguments for advanced control of the program's behavior
-n, --no_transfer no data transfer to the remote host (default: False)
-a, --all_fast5 also put fast5 files of reads removed by length and
quality filtering into barcode bins (default: False)
-p, --pass_only use data from fastq_pass only (default: False)
-l MIN_LENGTH, --min_length MIN_LENGTH
minimal length to pass filter (default: 1000)
-r MIN_LENGTH_RNA, --min_length_rna MIN_LENGTH_RNA
minimal length to pass filter for rna libraries
(default: 50)
-q MIN_QUALITY, --min_quality MIN_QUALITY
minimal quality to pass filter (default: 5)
-d RSYNC_DEST, --rsync_dest RSYNC_DEST
destination for data transfer with rsync, format
USER@HOST[:DEST]. Key authentication for the specified
destination must be set up (see option -i), otherwise
data transfer will fail. Default value is parsed from
setting file /home/grid/.dominION/lib/python3.5/site-p
ackages/dominion-0.4.2-py3.5.egg/dominion/resources/de
faults.ini (default: [email protected]
bielefeld.de:/vol/nanopore/GridION_Runs/)
-i IDENTITY_FILE, --identity_file IDENTITY_FILE
file from which the identity (private key) for public
key authentication is read. Default value is parsed
from setting file
/home/grid/.dominION/lib/python3.5/site-packages/domin
ion-0.4.2-py3.5.egg/dominion/resources/defaults.ini
(default: /home/grid/.ssh/id_dominion_85285851bbb872d7
3158c65e9478f3bef61eb917)
--bc_kws [BC_KWS [BC_KWS ...]]
if at least one of these key words is a substring of
the run name, porechop is used to demultiplex the
fastq data (default: ['RBK', 'NBD', 'RAB', 'LWB',
'PBK', 'RPB', 'arcod'])
-u UPDATE_INTERVAL, --update_interval UPDATE_INTERVAL
minimum time interval in seconds for updating the
content of a report page (default: 300)
-m, --ignore_file_modifications
Ignore file modifications and only consider file
creations regarding determination of the latest log
files (default: False)
I/O arguments:
Further input/output arguments. Only for special use cases
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Path to the base directory where experiment reports
shall be saved (default: /data/dominION/)
--data_basedir DATA_BASEDIR
Path to the directory where basecalled data is saved
(default: /data)
--minknow_log_basedir MINKNOW_LOG_BASEDIR
Path to the base directory of GridIONs log files
(default: /var/log/MinKNOW)
--logfile LOGFILE File in which logs will be safed (default:
OUTPUTDIR/logs/YYYY-MM-DD_hh:mm_HOSTNAME_LOGLVL.log
Statsparser arguments:
Arguments passed to statsparser for formatting html reports
--statsparser_args STATSPARSER_ARGS
Arguments that are passed to the statsparser script.
See a full list of available arguments with
--statsparser_args " -h" (default: [])
Help:
-h, --help Show this help message and exit
--version Show program's version string and exit
-v, --verbose Additional debug messages are printed to stdout
(default: False)
--quiet Only errors and warnings are printed to stdout
(default: False)
usage: statsparser [-r] [--html_refresh_rate HTML_REFRESH_RATE]
[--max_bins MAX_BINS] [--time_intervals TIME_INTERVALS]
[--kb_intervals KB_INTERVALS] [--gc_interval GC_INTERVAL]
[--matplotlib_style MATPLOTLIB_STYLE] [--dpi DPI]
[--width WIDTH] [--height HEIGHT] [-h] [--version] [-v]
[-q]
input
Parses a csv file containing statistics about a nanopore sequencing run and
creates an in-depth report file including informative plots.
Main options:
input Stats file containing read information or a directory
containing several such files. Requires CSV files with
" " as seperator, no header and the following columns
in given order: read_id, length, qscore, mean_gc,
Passed/tooShort, read_number, pore_index, timestamp,
barcode
-r, --recursive recursively search for directories containing stats
files and corresponding logdata files (default: False)
--html_refresh_rate HTML_REFRESH_RATE
refresh rate of the html page in seconds (default:
120)
Plotting options:
Arguments changing the appearance of plots
--max_bins MAX_BINS maximum number of bins for box plots (default: 24)
--time_intervals TIME_INTERVALS
time intervals in minutes available for binning
(default: [1, 2, 5, 10, 20, 30, 60, 90, 120, 240])
--kb_intervals KB_INTERVALS
kb intervals available for binning (default: [0.5,
1.0, 2.0, 5.0])
--gc_interval GC_INTERVAL
gc interval for binning reads based on mean G+C
content (default: 0.5)
--matplotlib_style MATPLOTLIB_STYLE
matplotlib style string that influences all colors and
plot appearances (default: default)
--dpi DPI
--width WIDTH width of figure in inches (default: 6.4)
--height HEIGHT height of figure in inches (default: 4.8)
Help:
-h, --help Show this help message and exit
--version Show program's version number and exit
-v, --verbose Additional status information is printed to stdout
(default: False)
-q, --quiet No prints to stdout (default: False)
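Since statsparser expects stats files without a header and with nine columns in a fixed order, a quick column count can reveal malformed files before a report is generated. The sketch below is not part of dominION. The helper name count_bad_rows is hypothetical, the separator is assumed to be a tab, and the two sample rows, including the timestamp format, are invented for the demonstration.

```shell
# Count rows whose number of fields differs from the nine expected columns:
# read_id, length, qscore, mean_gc, Passed/tooShort, read_number, pore_index,
# timestamp, barcode
count_bad_rows() {
    # $1 = stats file, $2 = field separator (assumed to be a tab)
    awk -F "$2" 'NF != 9 { bad++ } END { print bad + 0 }' "$1"
}

tab=$(printf '\t')

# Invented sample data: one complete row and one truncated row.
stats=$(mktemp)
printf 'read1\t1500\t9.8\t0.47\tPassed\t12\t101\t2019-01-01T12:00:00Z\tBC01\n' > "$stats"
printf 'read2\t300\t6.1\t0.52\ttooShort\t13\n' >> "$stats"

count_bad_rows "$stats" "$tab"   # number of rows that do not have nine fields
rm -f "$stats"
```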