Description
I would like to suggest updating the Xerxes documentation so that it is very explicit about how F3 is implemented.
The problem is that F3(a,b,c) means different things in different software. In Xerxes it means F3(a,b;c), i.e. (c-a)(c-b), but in some other programs it means F3(a;b,c), i.e. (a-b)(a-c).
It is therefore important for the documentation to state clearly which convention Xerxes uses.
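To illustrate why the argument order matters, here is a minimal sketch of the two conventions. It is not Xerxes code: the function names and the allele frequencies are made up for illustration, and the statistic is assumed to be a plain average of the per-SNP product, without any bias correction or missing-data handling.

```python
from statistics import mean

def f3_target_last(a, b, c):
    """F3(a,b;c): average of (c-a)(c-b) over SNPs (the Xerxes convention)."""
    return mean((ci - ai) * (ci - bi) for ai, bi, ci in zip(a, b, c))

def f3_target_first(a, b, c):
    """F3(a;b,c): average of (a-b)(a-c) over SNPs (used by some other programs)."""
    return mean((ai - bi) * (ai - ci) for ai, bi, ci in zip(a, b, c))

# Hypothetical allele frequencies at four SNPs (invented numbers, not real data).
chimp = [0.0, 1.0, 0.0, 1.0]
altai = [0.0, 1.0, 1.0, 0.5]
spanish = [0.5, 0.5, 1.0, 0.0]

# Same three populations in the same order, two different statistics.
print(f3_target_last(chimp, altai, spanish))   # 0.25  (Spanish is the "other" population)
print(f3_target_first(chimp, altai, spanish))  # 0.375 (Chimp is the "other" population)
```

The two conventions agree only after reordering the arguments: `f3_target_last(a, b, c)` equals `f3_target_first(c, a, b)`.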
The Xerxes documentation does state this explicitly in the F3vanilla description:
"F3vanilla(a,b,c): F3-Statistics - Vanilla version, recommended if used as Outgroup-F3 statistics or with group c being pseudo-haploid: Are computed as F3(a,b,c) = (c-a)(c-b) across all SNPs.",
but it deserves a more prominent place.
The example provided for F3 is:
--stat "F3(<Chimp.REF>, <Altai_snpAD.DG>, Spanish)"
This is quite confusing, because under the two usages of F3 that I know of it would mean either:
a) Spanish is treated as an outgroup to Chimp and Altai (outgroup F3), or
b) we are testing whether Spanish can be modelled as an admixture between Chimp and Altai (admixture F3).
The example made me doubt which population Xerxes actually treats as "the other".
I suggest:
a) changing the confusing top example, --stat "F3(<Chimp.REF>, <Altai_snpAD.DG>, Spanish)", to --stat "F3(French, Spanish, <Chimp.REF>)", as it appears in a later example
b) considering adding a statement that in Xerxes F3(a,b,c) means F3(a,b;c). I am not sure where the right spot would be, perhaps in the "--stat ARG" help definitions
BTW, is this notation (from the Xerxes documentation) correct?
F3(a,b,c) = F3vanilla(a,b) - hC/sC
I am no mathematician, but it looks odd to me: it reads as if F3 is computed for two populations and then the "c" term is subtracted. Or is that actually the case?
It would also be nice if "h" and "s" were explained. And is there any particular reason why "C" is capitalized in this equation?