
Installation

Gian M. Franceschini edited this page Jun 20, 2025 · 18 revisions

Follow these steps to set up the environment for running HaploC-tools.

  1. Clone the repository: git clone https://github.com/CSOgroup/HaploC-tools.git
  2. Download the data files required by HaploC-tools (1000 Genomes reference panel haplotypes, genomic reference data) and put (or link) them under HaploC-tools. The data files can be downloaded from Zenodo.
# Within the HaploC-tools folder, run:

wget --content-disposition "https://zenodo.org/records/10446020/files/genomicData.tar.gz?download=1"
tar -xvzf genomicData.tar.gz

You can also get a demo data folder to test HaploC-tools with:

wget --content-disposition "https://zenodo.org/records/10446020/files/demo_data.tar.gz?download=1"
tar -xvzf demo_data.tar.gz

The demo data can be saved outside the HaploC-tools folder and used in the following steps to run an end-to-end analysis.
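An interrupted download can leave a truncated archive that only fails partway through extraction. As a minimal sketch, the helper below (the name archive_ok is illustrative, not part of HaploC-tools) lists the archive contents without extracting them, which fails if the file is incomplete or is not a gzip-compressed tarball.

```shell
# Return success only if the gzip-compressed tarball can be listed end to end.
# tar -tzf reads the whole archive without extracting anything, so it fails
# on a truncated or corrupt download.
archive_ok() {
    tar -tzf "$1" > /dev/null 2>&1
}

# Usage (inside the HaploC-tools folder, after the wget step):
# archive_ok genomicData.tar.gz && tar -xvzf genomicData.tar.gz
```

If the check fails, re-running the wget command is usually the simplest fix.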

  3. Create the conda environments with the specified dependencies. A working conda installation is required for HaploC-tools.
conda env create -f HaploC-tools/environments/env_HapCUT2.yml # environment required for HapCUT2-based operations
conda env create -f HaploC-tools/environments/env_nHapCUT2.yml # environment required for all other operations

To speed up the procedure, consider using mamba, a drop-in replacement for conda. In some cases it may be necessary to set --channel-priority flexible to ensure the environments install correctly, depending on the channel priority in your configuration.
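As a rough sketch of how the choice of environment manager could be scripted, the helper below (pick_env_manager is an illustrative name) prefers mamba when it is on the PATH and falls back to conda otherwise. The commented `conda config` line is one way to relax channel priority persistently; whether you need it depends on your existing configuration.

```shell
# Print "mamba" if it is available on PATH, otherwise fall back to "conda".
pick_env_manager() {
    if command -v mamba > /dev/null 2>&1; then
        echo mamba
    else
        echo conda
    fi
}

# Usage:
# manager=$(pick_env_manager)
# "$manager" env create -f HaploC-tools/environments/env_HapCUT2.yml
# "$manager" env create -f HaploC-tools/environments/env_nHapCUT2.yml
#
# Optional, only if environment creation fails under strict channel priority:
# conda config --set channel_priority flexible
```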

  4. Install the required R packages in the nHapCUT2 environment:
  • R.utils (>= 2.9.0)
  • doParallel (>= 1.0.15)
  • ape (>= 5.3)
  • dendextend (>= 1.12.0)
  • fitdistrplus (>= 1.0.14)
  • igraph (>= 1.2.4.1)
  • Matrix (>= 1.2.17)
  • rARPACK (>= 0.11.0)
  • factoextra (>= 1.0.5)
  • data.table (>= 1.12.2)
  • fields (>= 9.8.3)
  • GenomicRanges (>= 1.36.0)
  • ggplot2 (>= 3.3.5)
  • strawr (>= 0.0.9)
  • CALDER (custom version, included in the repository under CALDER2)

All R packages can be installed and checked with the following steps:

First, activate the nHapCUT2 environment.

conda activate nHapCUT2

Next, execute the following:

conda install -c conda-forge r-base r-essentials r-igraph

Next, install the remaining packages and verify that all dependencies are met by running the following in R:

install_if_needed <- function(package) {
    if (!requireNamespace(package, quietly = TRUE)) {
        install.packages(package, dependencies = TRUE)
    }
}

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

cran_packages <- c(
    "R.utils", "doParallel", "ape", "dendextend", "fitdistrplus",
    "Matrix", "rARPACK", "factoextra", "data.table", "fields", 
    "ggplot2", "strawr"
)

bioc_packages <- c("GenomicRanges")

# Install CRAN packages that are not already available
for (pkg in cran_packages) {
    install_if_needed(pkg)
}

# Install Bioconductor packages that are not already available
for (pkg in bioc_packages) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        BiocManager::install(pkg, ask = FALSE)
    }
}

# Install CALDER from source
install.packages("./HaploC-tools/CALDER2/", repos = NULL, type = "source")

# Check installed packages (igraph is included because it was installed through conda above)
pkgs <- c(cran_packages, bioc_packages, "igraph", "CALDER")

results <- sapply(pkgs, function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
        message(sprintf("Package '%s' is missing.", pkg))
        return(FALSE)
    } else {
        message(sprintf("Package '%s' is installed.", pkg))
        return(TRUE)
    }
})

if (all(results)) {
    message("All packages successfully installed and loaded.")
} else {
    message("Some packages are missing or failed to load.")
}

Run the script from the directory that contains the HaploC-tools folder, because CALDER is installed from a path relative to it. If all packages are installed and working properly, you should see the message "All packages successfully installed and loaded." and can proceed.
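Because the CALDER installation relies on the relative path ./HaploC-tools/CALDER2/, a quick check of the working directory can prevent a confusing failure. The sketch below assumes the R script above has been saved to a file; the file name install_r_packages.R and the helper name calder_path_ok are both hypothetical.

```shell
# Succeed only if the CALDER2 source folder is reachable from the given
# directory (defaults to the current one), matching the relative path
# passed to install.packages() in the R script.
calder_path_ok() {
    [ -d "${1:-.}/HaploC-tools/CALDER2" ]
}

# Usage (install_r_packages.R is a hypothetical file holding the R script):
# conda activate nHapCUT2
# calder_path_ok && Rscript install_r_packages.R
```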

In some cases, installing certain R packages from the R console might fail because system library dependencies are missing. In those cases, the installation can usually be completed with conda (e.g., conda install r-mass).
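On conda-forge, R packages are conventionally named "r-" followed by the lowercased package name, as in the r-mass example. The helper below (conda_r_name is an illustrative name) is a small sketch of that mapping. It assumes the package is actually published on conda-forge, which is not guaranteed for every package, and Bioconductor packages such as GenomicRanges use a different prefix on the bioconda channel.

```shell
# Map an R package name to its conventional conda-forge name:
# the prefix "r-" followed by the package name in lowercase.
conda_r_name() {
    printf 'r-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Usage (inside the activated nHapCUT2 environment):
# conda install -c conda-forge "$(conda_r_name fitdistrplus)"
```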

HaploC-tools has been tested on Red Hat 9.4 and AlmaLinux 9.3. The installation procedure typically takes less than 30 minutes, depending on the environment manager used (conda, mamba, micromamba, etc.).

Next steps
