Skip to content

Arrowhead

nchernia edited this page Feb 9, 2017 · 7 revisions

#Finding Contact Domains (Arrowhead)#

##Quick Description##

This is the usage that most users will likely use (more detailed usage below):

arrowhead <HiC file> <output_file>

Upon a successful run of Arrowhead, output_file will contain all the contact domains found along the diagonal in this format.

###Examples###

arrowhead local/folder/HIC006.hic local/folder/contact_domains_list

This command will run Arrowhead on HIC006 at resolution 5 kB or 10 kB (depending on the map's resolution) and save all contact domains to the contact_domains_list file.

arrowhead https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined_30.hic contact_domains_list

This command will run Arrowhead at resolution 5kB on the GM12878 HiC map (high resolution) and save all contact domains to the contact_domains_list file. Note: these are the settings used to generate the official GM12878 contact domain list.

Default parameters for arrowhead described below.

##Detailed Usage##

arrowhead [-c chromosome(s)] [-m matrix size] [-r resolution] 
		[-k normalization (NONE/VC/VC_SQRT/KR)] <HiC file> 
		<output_file> [feature_list] [control_list]

The required arguments are:

  • <HiC file>: Address of HiC file which should end with ".hic". This is the file you would load into Juicebox. URLs or local addresses may be used. Running Arrowhead on MAPQ>30 and MAPQ>0 files generally gives comprable results.
  • <output_file>: Final list of all contact domains found by Arrowhead. Can be visualized directly in Juicebox as a 2D annotation.

-- NOTE -- If you want to find scores for a feature and control list, both must be provided:

  • [feature_list]: Feature list of loops/domains for which block scores are to be calculated
  • [control_list]: Control list of loops/domains for which block scores are to be calculated

The optional arguments are:

  • -c <String(s)> Chromosome(s) on which Arrowhead will be run. The number/letter for the chromosome can be used with or without appending the "chr" string. Multiple chromosomes can be specified using commas (e.g. 1,chr2,X,chrY)
  • -m <int> Size of the sliding window along the diagonal in which contact domains will be found. Must be an even number as (m/2) is used as the increment for the sliding window. (Default 2000)
  • -r <int> resolution for which Arrowhead will be run. Generally, 5kB (5000) or 10kB (10000) resolution is used depending on the depth of sequencing in the HiC file(s).
  • -k <NONE/VC/VC_SQRT/KR> Normalizations (case sensitive) that can be selected. Generally, KR (Knight-Ruiz) balancing should be used when available.

###Defaults### Arrowhead uses the following parameters if optional flags are not provided.

Medium resolution maps:

-c (all chromosomes) 
-m 2000 
-r 10000 
-k KR

High resolution maps:

-c (all chromosomes) 
-m 2000 
-r 5000 
-k KR

###Domain List Content### The contact domain list created by Arrowhead will start with a header line, followed by a line for every contact domain. By default, the file should contain 12 fields per line in the following format:

chromosome1    x1    x2    chromosome2    y1    y2    color    
		corner_score    Uvar    Lvar    Usign    Lsign

Explanations of each field are as follows:

  • chromosome = the chromosome that the domain is located on
  • x1,x2/y1,y2 = the interval spanned by the domain (contact domains manifest as squares on the diagonal of a Hi-C matrix and as such: x1=y1, x2=y2)
  • color = the color that the feature will be rendered as if loaded in Juicebox
  • corner_score = the corner score, a score indicating the likelihood that a pixel is at the corner of a contact domain. Higher values indicate a greater likelihood of being at the corner of a domain
  • Uvar = the variance of the upper triangle
  • Lvar = the variance of the lower triangle
  • Usign = -1*(sum of the sign of the entries in the upper triangle)
  • Lsign = sum of the sign of the entries in the lower triangle

See Section IV.a.3 of the Extended Experimental Procedures of Rao, Huntley et al. Cell 2014 for more details.

Clone this wiki locally