Skip to content
/ filtFst Public

filtFst calculates Fst, a measure of population differentiation due to genetic structure

License

Notifications You must be signed in to change notification settings

mpcox/filtFst

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

filtFst

The C program filtFst calculates FST, a measure of population differentiation due to genetic structure, on datasets containing genetic data for two populations formatted in a similar style to the output of Richard Hudson's simulation software, ms.

filtFst was originally written by Jeff Wall, and later modified by August Woerner and Murray Cox.

filtFst implements the variant of FST described in:

Hudson, R.R., M. Slatkin and W.P. Maddison. 1992. Estimation of levels of gene flow from DNA sequence data. Genetics 132:583-589.

Specifically, equation 3:

FST⟩ = 1 − (Hw / Hb)

where Hw is the mean pairwise diversity within populations and Hb is the mean pairwise diversity between populations. The implementation here includes a correction factor for unequal sample sizes, as described in the supplementary material of:

Plagnol, V. and J.D. Wall. 2006. Possible ancestral structure in human populations. PLoS Genetics 2:e105.

filtFst can accommodate:

  • Simulated data in a slight modification of Richard Hudson's ms format; and
  • Real data in the same format (with missing data encoded as '?').

COMPILATION

To compile filtFst, install any standard C compiler. For gcc, type:

gcc filtFst.c -o filtFst -O3 -lm -Wall

USAGE

The basic usage of the program is:

filtFst n1 n2

where

n1 is the sample size of population 1 (i.e., the number of sequences)
n2 is the sample size of population 2 (i.e., the number of sequences)

The data file itself is read from stdin (see EXAMPLES section below).

INPUT FORMAT

The input format is similar to the output format of Richard Hudson's ms program, described in:

Hudson, R.R. 2002. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337-338.

A similar format is used by Gary Chen's genome-scale Markovian coalescent simulator, MaCS, described in:

Chen, G.K., P. Marjoram and J.D. Wall. 2009. Fast and flexible simulation of DNA sequence data. Genome Research 19:136-142.

The first line lists the number of segregating sites. The second line lists the positions of each single nucleotide polymorphism. (These are not used in the FST calculation, but one value per segregating site must be present in the data file). Each following line gives a single haplotype with two alleles represented by '0' and '1' and missing data represented by '?'. Only two character states (excluding the missing data character) are allowed per site and polymorphisms must be point mutations (i.e., the program does not accommodate indels). The first set of n1 sequences should come from population one, while the last set of n2 sequences should come from population two.

42
0.176460 0.176470 0.178610 0.179200 0.183260 0.184560 ...
000000000000000100000000000000000000000000
110000100010100101010100011100001010011111
??0000000000000100000000011000000110000000
??0110000000000110100000011000000110000000
000000000000000000000000011000000010000000
000110000000010100100001011000000110000000
??????????????????????????????????????????
??????????????????????????????????????????
000000000000000100000000001000000000000000
000000000010101100010000011000000110000010
.
.
.

The FST value is returned to stdout.

EXAMPLES

The example files example_data1.dat and example_data2.dat represent real genomic data for the human loci 1pMB4 and 4qMB105 in French and Mandenka, respectively.

To calculate FST between French and Mandenka at 1pMB4:

cat example_data1.dat | filtFst 32 34
$ 0.086995

To calculate FST between French and Mandenka at 4qMB105:

cat example_data2.dat | filtFst 32 34
$ 0.139620

About

filtFst calculates Fst, a measure of population differentiation due to genetic structure

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages