Installation
- Clone the repository:
git clone https://github.com/CSOgroup/HaploC-tools.git
- Download the data files required by HaploC-tools (1000 Genomes reference panel haplotypes, genomic reference data) and put (or link) them under HaploC-tools. The data files can be downloaded from Zenodo.
# Within the HaploC-tools folder, run:
wget --content-disposition "https://zenodo.org/records/10446020/files/genomicData.tar.gz?download=1"
tar -xvzf genomicData.tar.gz
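If the data already lives elsewhere on disk, the "put (or link)" option can be handled with a symbolic link rather than a second copy. The following is a minimal sketch, not part of HaploC-tools: the helper name `link_data_dir` and the example paths are hypothetical, and the folder name `genomicData` is only assumed from the archive name.

```shell
# Link an existing data folder into a HaploC-tools checkout instead of copying it.
# Assumption: the caller passes an absolute source path so the link resolves
# regardless of the current working directory.
link_data_dir() {
    src=$1          # existing data folder (absolute path)
    dest_parent=$2  # HaploC-tools checkout that should contain the link

    if [ ! -d "$src" ]; then
        echo "data folder not found: $src" >&2
        return 1
    fi

    # -s: symbolic link, -f: replace an existing link, -n: do not follow it
    ln -sfn "$src" "$dest_parent/$(basename "$src")"
}

# Example with hypothetical paths (not run automatically):
#   link_data_dir /shared/refs/genomicData "$PWD/HaploC-tools"
```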
You can also get a demo data folder to test HaploC-tools with:
wget --content-disposition "https://zenodo.org/records/10446020/files/demo_data.tar.gz?download=1"
tar -xvzf demo_data.tar.gz
The demo data can be saved outside the HaploC-tools folder and used in the following steps to run an end-to-end analysis.
- Create the conda environments with the specified dependencies. A working conda installation is necessary for HaploC-tools.
conda env create -f HaploC-tools/environments/env_HapCUT2.yml # environment required for HapCUT2-based operations
conda env create -f HaploC-tools/environments/env_nHapCUT2.yml # environment required for all other operations
To speed up the procedure, consider using mamba, a drop-in replacement for conda. Note that in some cases it may be necessary to set --channel-priority flexible to ensure a correct environment installation, depending on the channel priority in your configuration.
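One way to use mamba when it is available and fall back to conda otherwise is sketched below. The helper name `pick_env_manager` is hypothetical and not part of HaploC-tools; it only looks for the two executables on PATH.

```shell
# Print the first environment manager found on PATH (mamba preferred, conda as fallback).
pick_env_manager() {
    for tool in mamba conda; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    echo "no conda-compatible tool found on PATH" >&2
    return 1
}

# Usage (not run automatically):
#   mgr=$(pick_env_manager) || exit 1
#   "$mgr" env create -f HaploC-tools/environments/env_HapCUT2.yml
#   "$mgr" env create -f HaploC-tools/environments/env_nHapCUT2.yml
#
# If the environment cannot be solved, relaxing the channel priority in the
# user configuration may help (this changes your conda settings):
#   conda config --set channel_priority flexible
```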
- Install required R packages under the nHapCUT2 environment:
- R.utils (>= 2.9.0)
- doParallel (>= 1.0.15)
- ape (>= 5.3)
- dendextend (>= 1.12.0)
- fitdistrplus (>= 1.0.14)
- igraph (>= 1.2.4.1)
- Matrix (>= 1.2.17)
- rARPACK (>= 0.11.0)
- factoextra (>= 1.0.5)
- data.table (>= 1.12.2)
- fields (>= 9.8.3)
- GenomicRanges (>= 1.36.0)
- ggplot2 (>= 3.3.5)
- strawr (>= 0.0.9)
- CALDER (custom version, within the folder)
All R packages can be installed and checked with the following steps:
First, activate the nHapCUT2 env.
conda activate nHapCUT2
Next, execute the following:
conda install -c conda-forge r-base r-essentials r-igraph
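Because the conda command above installs into whichever environment is currently active, a small guard can catch the case where the activation step was skipped. This is a sketch under the assumption that conda exports the active environment name in `CONDA_DEFAULT_ENV`; the helper name `require_env` is hypothetical.

```shell
# Succeed only if the currently active conda environment matches the expected name.
require_env() {
    expected=$1
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "active environment is '${CONDA_DEFAULT_ENV:-none}', expected '$expected'" >&2
        return 1
    fi
}

# Usage (not run automatically):
#   require_env nHapCUT2 && conda install -c conda-forge r-base r-essentials r-igraph
```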
Next, install the remaining packages and verify that all dependencies are met by running the following in R:
# Install a CRAN package only if it is not already available
install_if_needed <- function(package) {
  if (!requireNamespace(package, quietly = TRUE)) {
    install.packages(package, dependencies = TRUE)
  }
}

# BiocManager is needed to install Bioconductor packages
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

cran_packages <- c(
  "R.utils", "doParallel", "ape", "dendextend", "fitdistrplus",
  "Matrix", "rARPACK", "factoextra", "data.table", "fields",
  "ggplot2", "strawr"
)
bioc_packages <- c("GenomicRanges")

# Install CRAN
for (pkg in cran_packages) {
  install_if_needed(pkg)
}

# Install Bioconductor
for (pkg in bioc_packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    BiocManager::install(pkg, ask = FALSE)
  }
}

# Install CALDER from source
install.packages("./HaploC-tools/CALDER2/", repos = NULL, type = "source")

# Check installed packages (igraph is installed through conda, so it is only checked here)
pkgs <- c(cran_packages, bioc_packages, "igraph", "CALDER")
results <- sapply(pkgs, function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message(sprintf("Package '%s' is missing.", pkg))
    return(FALSE)
  } else {
    message(sprintf("Package '%s' is installed.", pkg))
    return(TRUE)
  }
})

if (all(results)) {
  message("All packages successfully installed and loaded.")
} else {
  message("Some packages are missing or failed to load.")
}
Please make sure you run this from the directory that contains the HaploC-tools folder: CALDER is installed from the relative path ./HaploC-tools/CALDER2/, so the installation fails from any other location.
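The working-directory requirement can be checked before starting R. The sketch below is illustrative only: `check_calder_path` and the script name `install_r_packages.R` are hypothetical names, the latter assuming the R code above was saved to a file.

```shell
# Report whether HaploC-tools/CALDER2 is reachable from the given directory
# (defaults to the current directory).
check_calder_path() {
    base=${1:-.}
    if [ -d "$base/HaploC-tools/CALDER2" ]; then
        echo "ok"
    else
        echo "HaploC-tools/CALDER2 not found under $base" >&2
        return 1
    fi
}

# Usage (not run automatically; the script file name is hypothetical):
#   check_calder_path && Rscript install_r_packages.R
```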
If all packages are installed and working properly, you should get the "All packages successfully installed and loaded." message, and you can proceed.
In some cases, installing certain R packages from the R console might fail because system library dependencies are missing.
In those cases, the installation can usually be completed with conda (e.g., conda install r-mass).
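CRAN packages on conda-forge generally follow the naming pattern seen in the r-mass example: an `r-` prefix plus the lowercase package name. The helper below, with the hypothetical name `cran_to_conda`, sketches that mapping. It assumes the package has actually been published on conda-forge, which is not guaranteed for every package in the list.

```shell
# Map a CRAN package name to its conventional conda-forge name (r-<lowercase name>).
cran_to_conda() {
    printf 'r-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Usage (not run automatically):
#   conda install -c conda-forge "$(cran_to_conda fitdistrplus)"
```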
HaploC-tools has been tested on Red Hat 9.4 and AlmaLinux 9.3. The installation procedure typically takes less than 30 minutes, depending on the environment manager used (conda, mamba, micromamba, etc.).