as-eGRM

as-eGRM is a genealogy-based method to estimate the expected genetic relationship matrix within one ancestry in admixed populations.

Please see our manuscript for more details.

Installation

Firstly execute the following commands to use conda to create an environment for running asegrm

conda create --name asegrm python==3.9.18 numpy==1.26.2 bcftools==1.15.1
conda activate asegrm

Then go into the asegrm folder and execute the following command to install asegrm and the dependencies

pip install .

Usage

To enable running parallel jobs under diverse computation environments, the running is split into two steps: compute and merge. Please first run the compute step for each chromosome/chunk with parallelization, then run the merge step to merge the output of the compute step to generate the final ancestry-specific GRM.

Running compute step

asegrm compute [-h] --trees TREES --leaf_ids LEAF_IDS --local_ancestry LOCAL_ANCESTRY --target_ancestry TARGET_ANCESTRY --genetic_map GENETIC_MAP --output_path OUTPUT_PATH [--gp GP] [--left LEFT] [--right RIGHT] [--rlim RLIM] [--alim ALIM] [--verbose]

Required arguments

--trees: Path to tskit tree sequence file of one chr/chunk
--leaf_ids: Path to the file of the IDs of the leaves on the trees. No header, one ID per line. Ensure the order of the IDs is the same as the leaves'. For some ARG-reconstruction software packages like Relate, SINGER, tsinfer-tsdate, the order of the leaves on the output trees is the same as the order of the samples in the input VCF file. For these packages, you can read the sample IDs from the VCF, keep the same order, and append .0 and .1 to each sample ID (because each leaf represents a haplotype) to get the IDs. Ensure the order of the IDs is the same across chrs/chunks. Ensure the IDs are included in the file indexed by the --local_ancestry.
--local_ancetry: Local ancestry calls of the same chr as the tree sequence. Currently support the .msp.tsv format from RFMix and the .vcf.gz format from flare. When the tree sequence is a chunk on a chromosome, it can be the same whole chromosome as long as it covers the tree sequence region.
--target_ancestry: Population name of the ancestry being targeted for investigation. All the leaves with other ancestries in the trees are masked when constructing ancestry-specific GRM. Ensure the name is included in the file indexed by the --local_ancestry.
--genetic_map: Path to the genetic map file, which is a (comma/space/tab separated) three-column file with the first column specifying the physical position in bp and the third column specifying the genetic position in cM. The first line will always be ignored as the header.
--output_path: Path to output directory

Optional arguments

--gp: The function up-weighting recent branches. Currently provide two options: gp1 and gp2. Compared to gp1, gp2 up-weights recent branches more heavily, potentially being able to reveal more recent/finer structures.
--rlim and --alim: Most recent time limit and most ancient time limit. Unit: generations. The two parameters are used to define a time window, only the part within the window of the trees is used when asegrm is constructing the GRM, potentially being able to reveal the structures that happened in the time window. See the section Time-specific eGRM reveals dynamic relatedness through history in Fan et al., 2022 for more info.
--left and --right: Leftmost genomic position and rightmost genomic position. Unit: bp. The two parameters are used to define a genomic interval, only the part within the interval of the trees is used when asegrm is constructing the GRM.

Output

Under the output directory, the files with the suffixs below are generated.

.trees.target_ancestry.asegrm.npy and .trees.target_ancestry.asegrm.mu.npy: Intermediate files used when running the merge step to generate the final ancestry-specific GRM.
.trees.target_ancestry.asegrm.log: Log file
.trees.lac.npy: Ancestry info for the leaves in the trees. This file just needs to be generated once when running asegrm multiple times, each with a different target ancestry.
.trees.lac.skipped_trees: The trees with leaves spanning more than one ancestry are skipped when generating the .trees.lac.npy file and have no contribution to constructing ancestry-specific GRM. This file records the proportion of these trees.

Running merge step

asegrm merge [-h] output_path

Required arguments

output_path: The path indexed by the --output_path when running the compute step.

Output

The following files are generated under the directory indexed by the --output_path.

merged.target_ancestry.asegrm.diploid.npy: The ancestry-specific GRM with diploid mode
merged.target_ancestry.asegrm.haploid.npy: The ancestry-specific GRM with haploid mode
merged.target_ancestry.asegrm.log: Log file

Example

Check ./example/example.py for an example

Support

Any issues please contact Ji Tang ([email protected])

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
example		example
manuscript		manuscript
src		src
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

as-eGRM

Installation

Usage

Running compute step

Required arguments

Optional arguments

Output

Running merge step

Required arguments

Output

Example

Support

About

Releases

Packages

Languages

License

jitang-github/asegrm

Folders and files

Latest commit

History

Repository files navigation

as-eGRM

Installation

Usage

Running compute step

Required arguments

Optional arguments

Output

Running merge step

Required arguments

Output

Example

Support

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages