A framework for analyzing biological data via graph construction, clustering, and embedding generation. The resulting embeddings power downstream tasks like disease prediction, subject representation, and biomarker discovery.

UCD-BDLab/BioNeuralNet

BioNeuralNet: A Graph Neural Network based Multi-Omics Network Data Analysis Tool

Welcome to BioNeuralNet 1.0.9

BioNeuralNet is a Python framework for integrating and analyzing multi-omics data using Graph Neural Networks (GNNs). It provides tools for network construction, embedding generation, clustering, and disease prediction, all within a modular, scalable, and reproducible pipeline.

Documentation

BioNeuralNet Documentation & Examples

1. Installation

BioNeuralNet supports Python 3.10, 3.11 and 3.12.

1.1. Install BioNeuralNet

pip install bioneuralnet

1.2. Install PyTorch and PyTorch Geometric

BioNeuralNet relies on PyTorch and PyTorch Geometric for GNN computations. Install them separately:

  • PyTorch (CPU):

    pip install torch torchvision torchaudio
  • PyTorch Geometric:

    pip install torch_geometric
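After installing, a quick way to confirm that the three packages from this section are visible to your interpreter is to look them up without importing them. The sketch below uses only the standard library; the helper name `check_installation` is hypothetical, and the module names are the import names of the packages installed above (`bioneuralnet`, `torch`, `torch_geometric`).

```python
import importlib.util

# Import names of the packages installed in this section.
REQUIRED_MODULES = ("bioneuralnet", "torch", "torch_geometric")


def check_installation(modules=REQUIRED_MODULES):
    """Return a {module_name: is_installed} mapping without importing anything.

    importlib.util.find_spec returns None when a top-level module cannot be found.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_installation().items():
        print(f"{name:>16}: {'OK' if found else 'missing'}")
```

Run it with the same interpreter you will use for BioNeuralNet; a `missing` entry usually means the package was installed into a different environment.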

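For GPU acceleration, please refer to the official PyTorch and PyTorch Geometric installation guides, which list the CUDA-specific install commands.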

2. BioNeuralNet Core Features

For an end-to-end example of BioNeuralNet, see the Quick Start and TCGA-BRCA Dataset guides.

Network Embedding

  • Given a multi-omics network as input, BioNeuralNet can generate embeddings using Graph Neural Networks (GNNs).
  • Generate embeddings using methods such as GCN, GAT, GraphSAGE, and GIN.
  • Outputs can be obtained as native tensors or converted to pandas DataFrames for easy analysis and visualization.
  • Embeddings unlock numerous downstream applications, including disease prediction, enhanced subject representation, clustering, and more.

Graph Clustering

  • Identify functional modules or communities using correlated clustering methods (e.g., CorrelatedPageRank, CorrelatedLouvain, HybridLouvain) that integrate phenotype correlation to extract biologically relevant modules [1].
  • Clustering methods can be applied to any network representation, allowing flexible analysis across different domains.
  • All clustering components return either raw partition dictionaries or induced subnetwork adjacency matrices (as DataFrames) for visualization.
  • Use cases include feature selection, biomarker discovery, and network-based analysis.
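As a concrete picture of what a GNN layer does to the network, the sketch below implements one step of the standard GCN propagation rule, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where D is the degree matrix of A + I, in plain Python on a toy graph. It illustrates the rule only: it is not BioNeuralNet's implementation (which builds on PyTorch Geometric), and the function name, toy graph, and weights are hypothetical.

```python
import math


def gcn_layer(adj, features, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W).

    adj:      n x n symmetric adjacency matrix (list of lists)
    features: n x f node feature matrix H
    weight:   f x d weight matrix W
    Returns the n x d matrix of updated node embeddings.
    """
    n, f, d = len(adj), len(weight), len(weight[0])
    # Add self-loops so each node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # D^-1/2, from node degrees in the self-looped graph.
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    # Symmetric normalization: entry (i, j) is divided by sqrt(d_i * d_j).
    norm = [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]
    # Neighborhood aggregation: M = norm @ H  (n x f).
    agg = [[sum(norm[i][k] * features[k][c] for k in range(n)) for c in range(f)]
           for i in range(n)]
    # Linear projection followed by ReLU: max(0, M @ W)  (n x d).
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(f))) for o in range(d)]
            for i in range(n)]


# Toy path graph 0 - 1 - 2, with a signal only on node 0 and an identity weight.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
out = gcn_layer(adj, [[1.0], [0.0], [0.0]], [[1.0]])
print(out)  # roughly [[0.5], [0.408], [0.0]]
```

Node 2 is two hops from node 0, so after one layer it has not yet received node 0's signal; stacking layers widens each node's receptive field.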

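The clustering bullets mention two output forms, a raw partition dictionary and induced subnetwork adjacency matrices. The sketch below shows how the second can be derived from the first, assuming the partition maps each node name to a cluster ID (the same convention as python-louvain's `community.best_partition`) and the network is stored as a nested dict of edge weights rather than a DataFrame. The function name and toy data are hypothetical.

```python
from collections import defaultdict


def induced_subnetworks(adjacency, partition):
    """Split a weighted network into one induced subnetwork per cluster.

    adjacency: {node: {neighbor: weight}} symmetric edge-weight mapping
    partition: {node: cluster_id}, e.g. as returned by a Louvain-style method
    Returns {cluster_id: {node: {neighbor: weight}}}, keeping only edges
    whose two endpoints fall in the same cluster.
    """
    members = defaultdict(set)
    for node, cluster in partition.items():
        members[cluster].add(node)

    subnetworks = {}
    for cluster, nodes in members.items():
        subnetworks[cluster] = {
            node: {nbr: w for nbr, w in adjacency.get(node, {}).items() if nbr in nodes}
            for node in sorted(nodes)
        }
    return subnetworks


# Two genes and two metabolites; the cross-cluster edge g1-m1 is dropped.
adjacency = {
    "g1": {"g2": 0.9, "m1": 0.2},
    "g2": {"g1": 0.9},
    "m1": {"g1": 0.2, "m2": 0.7},
    "m2": {"m1": 0.7},
}
partition = {"g1": 0, "g2": 0, "m1": 1, "m2": 1}
print(induced_subnetworks(adjacency, partition))
```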
Subject Representation

  • Integrate node embeddings back into omics data to enrich subject-level profiles by weighting features with the learned embedding.
  • This embedding-enriched data can be used for downstream tasks such as disease prediction or biomarker discovery.
  • The result can be returned as a DataFrame or a PyTorch tensor, fitting naturally into downstream analyses.
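The weighting idea above can be sketched as follows. Each feature (node) has a learned embedding vector; the sketch reduces that vector to one scalar and multiplies the feature's value by it for every subject. How BioNeuralNet itself reduces embeddings is not described here, so the reduction used below (the L2 norm, rescaled so the weights average 1) is an assumption for illustration, as are the function name and the dict-based data layout.

```python
import math


def embedding_weighted_profiles(omics, embeddings):
    """Scale each omics feature by a scalar derived from its node embedding.

    omics:      {subject_id: {feature: value}} subject-by-feature data
    embeddings: {feature: [float, ...]} one learned embedding per feature (node)
    Assumption for illustration: each embedding is reduced to its L2 norm, and
    the norms are rescaled to average 1 so the data keep their overall scale.
    """
    norms = {f: math.sqrt(sum(x * x for x in vec)) for f, vec in embeddings.items()}
    mean_norm = sum(norms.values()) / len(norms)
    if mean_norm == 0:
        raise ValueError("all embeddings are zero vectors")
    weights = {f: n / mean_norm for f, n in norms.items()}
    # Features without an embedding (e.g. clinical columns) pass through unchanged.
    return {
        subject: {f: v * weights.get(f, 1.0) for f, v in row.items()}
        for subject, row in omics.items()
    }


omics = {"s1": {"p1": 3.0, "p2": 6.0}, "s2": {"p1": 1.0, "p2": 0.0, "clin": 7.0}}
embeddings = {"p1": [3.0, 4.0], "p2": [0.0, 1.0]}  # norms 5 and 1, weights 5/3 and 1/3
print(embedding_weighted_profiles(omics, embeddings))
```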

Disease Prediction for Multi-Omics Network (DPMON) [2]

  • End-to-end classification pipeline for disease prediction using Graph Neural Network embeddings.
  • DPMON supports hyperparameter tuning; when enabled, it searches for the best configuration for the given data.
  • This approach, along with native pandas integration across modules, ensures that BioNeuralNet can be easily incorporated into your analysis workflows.

Metrics

  • Visualize embeddings, feature variance, clustering comparison, and network structure in 2D.
  • Evaluate embedding quality and clustering relevance using correlation with phenotype.
  • Performance benchmarking tools for classification tasks using various models.
  • Useful for assessing feature importance, validating network structure, and comparing cluster outputs.

Utilities

  • Build graphs using k-NN similarity, Pearson/Spearman correlation, RBF kernels, mutual information, or soft-thresholding.
  • Filter and preprocess omics or clinical data by variance, correlation, random forest importance, or ANOVA F-test.
  • Tools for network pruning, feature selection, and data cleaning.
  • Quickly summarize datasets with variance, zero-fraction, expression level, or correlation overviews.
  • Includes conversion tools for RData and integrated logging.

External Tools

  • Graph Construction:
    • BioNeuralNet provides additional tools in the bioneuralnet.external_tools module.
    • Includes support for SmCCNet (Sparse Multiple Canonical Correlation Network), an R-based tool for constructing phenotype-informed correlation networks [3].
    • These tools are optional but enhance BioNeuralNet’s graph construction capabilities and are recommended for more integrative or exploratory workflows.

3. Example: SmCCNet + DPMON for Disease Prediction

import pandas as pd
from bioneuralnet.external_tools import SmCCNet
from bioneuralnet.downstream_task import DPMON
from bioneuralnet.datasets import DatasetLoader

# Step 1: Load your data or use one of the provided datasets
example = DatasetLoader("example1")
omics_proteins = example.data["X1"]
omics_metabolites = example.data["X2"]
phenotype_data = example.data["Y"]
clinical_data = example.data["clinical_data"]

# Step 2: Network Construction
smccnet = SmCCNet(
    phenotype_df=phenotype_data,
    omics_dfs=[omics_proteins, omics_metabolites],
    data_types=["protein", "metabolite"],
    kfold=5,
    summarization="PCA",
)
global_network, clusters = smccnet.run()
print("Adjacency matrix generated.")

# Step 3: Disease Prediction (DPMON)
dpmon = DPMON(
    adjacency_matrix=global_network,
    omics_list=[omics_proteins, omics_metabolites],
    phenotype_data=phenotype_data,
    clinical_data=clinical_data,
    model="GCN",
)
predictions = dpmon.run()
print("Disease phenotype predictions:\n", predictions)

4. Documentation and Tutorials

  • Full documentation: BioNeuralNet Documentation

  • Jupyter Notebook Examples:

  • Tutorials include:

    • Multi-omics graph construction.
    • GNN embeddings for disease prediction.
    • Subject representation with integrated embeddings.
    • Clustering using Hybrid Louvain and Correlated PageRank.
  • API details are available in the API Reference.

5. Frequently Asked Questions (FAQ)

  • Does BioNeuralNet support GPU acceleration? Yes, install PyTorch with CUDA support.

  • Can I use my own omics network? Yes, you can provide a custom network as an adjacency matrix instead of using SmCCNet.

  • What clustering methods are supported? BioNeuralNet supports Correlated Louvain, Hybrid Louvain, and Correlated PageRank.
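If you bring your own network, a few sanity checks before handing it to DPMON can save a confusing failure later. The sketch below assumes, as is usual for an undirected feature network, that the matrix should be square, symmetric, and indexed by the same feature names as the omics columns; these are assumptions rather than documented DPMON requirements, and the function name and nested-dict layout are hypothetical. With a pandas DataFrame, the same checks correspond to comparing `index` with `columns` and comparing the frame with its transpose.

```python
def validate_adjacency(adjacency, feature_names, tol=1e-9):
    """Return a list of problems found in a custom adjacency matrix.

    adjacency:     {row_feature: {col_feature: weight}} nested mapping
    feature_names: the omics feature names the network should cover
    An empty list means every check passed.
    """
    problems = []
    nodes = set(adjacency)
    features = set(feature_names)
    if nodes != features:
        problems.append(
            f"node names differ from features: missing={sorted(features - nodes)}, "
            f"extra={sorted(nodes - features)}"
        )
    # Square: every row must have exactly one entry per node.
    ragged = sorted(r for r, cols in adjacency.items() if set(cols) != nodes)
    if ragged:
        problems.append(f"rows without one entry per node: {ragged}")
        return problems  # symmetry is undefined for a non-square matrix
    # Symmetric: w(a, b) == w(b, a) for an undirected network.
    names = sorted(nodes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(adjacency[a][b] - adjacency[b][a]) > tol:
                problems.append(f"asymmetric weight between {a!r} and {b!r}")
    return problems


good = {"p1": {"p1": 0.0, "p2": 0.5}, "p2": {"p1": 0.5, "p2": 0.0}}
print(validate_adjacency(good, ["p1", "p2"]))  # []
```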

For more FAQs, please visit our FAQ page.

6. Acknowledgments

BioNeuralNet integrates multiple open-source libraries. We acknowledge key dependencies:

  • PyTorch - GNN computations and deep learning models.
  • PyTorch Geometric - Graph-based learning for multi-omics.
  • NetworkX - Graph data structures and algorithms.
  • Scikit-learn - Feature selection and evaluation utilities.
  • pandas & numpy - Core data processing tools.
  • ray[tune] - Hyperparameter tuning for GNN models.
  • matplotlib - Data visualization.
  • cptac - Dataset handling for clinical proteomics.
  • python-louvain - Community detection algorithms.
  • statsmodels - Statistical models and hypothesis testing (e.g., ANOVA, regression).

We also acknowledge R-based tools for external network construction:

  • SmCCNet - Sparse multiple canonical correlation network.

7. Testing and Continuous Integration

  • Run Tests Locally:

    pytest --cov=bioneuralnet --cov-report=html
    open htmlcov/index.html   # macOS; on Linux, use xdg-open instead
  • Continuous Integration: GitHub Actions runs automated tests on every commit.

8. Contributing

We welcome contributions! To get started:

git clone https://github.com/UCD-BDLab/BioNeuralNet.git
cd BioNeuralNet
pip install -r requirements-dev.txt
pre-commit install
pytest

How to Contribute

  • Fork the repository, create a new branch, and implement your changes.
  • Add tests and documentation for any new features.
  • Submit a pull request with a clear description of your changes.

9. License

BioNeuralNet is distributed under the MIT License.

10. Contact

11. References

[1] Abdel-Hafiz, M., Najafi, M., et al. "Significant Subgraph Detection in Multi-omics Networks for Disease Pathway Identification." Frontiers in Big Data, 5 (2022). DOI: 10.3389/fdata.2022.894632

[2] Hussein, S., Ramos, V., et al. "Learning from Multi-Omics Networks to Enhance Disease Prediction: An Optimized Network Embedding and Fusion Approach." In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Lisbon, Portugal, 2024, pp. 4371-4378. DOI: 10.1109/BIBM62325.2024.10822233

[3] Liu, W., Vu, T., Konigsberg, I. R., Pratte, K. A., Zhuang, Y., & Kechris, K. J. "Network-Based Integration of Multi-Omics Data for Biomarker Discovery and Phenotype Prediction." Bioinformatics, 39(5), btat204 (2023). DOI: 10.1093/bioinformatics/btat204
