BioNeuralNet is a Python framework for integrating and analyzing multi-omics data using Graph Neural Networks (GNNs). It provides tools for network construction, embedding generation, clustering, and disease prediction, all within a modular, scalable, and reproducible pipeline.
BioNeuralNet Documentation & Examples
- 1. Installation
- 2. BioNeuralNet Core Features
- 3. Quick Example: SmCCNet + DPMON for Disease Prediction
- 4. Documentation and Tutorials
- 5. Frequently Asked Questions (FAQ)
- 6. Acknowledgments
- 7. Testing and Continuous Integration
- 8. Contributing
- 9. License
- 10. Contact
- 11. References
BioNeuralNet supports Python 3.10, 3.11, and 3.12.
pip install bioneuralnet
BioNeuralNet relies on PyTorch for GNN computations. Install PyTorch separately:
- PyTorch (CPU):
  pip install torch torchvision torchaudio
- PyTorch Geometric:
  pip install torch_geometric
For GPU acceleration, please refer to the official PyTorch and PyTorch Geometric installation guides.
For an end-to-end example of BioNeuralNet, see the Quick Start and TCGA-BRCA Dataset guides.
- Given a multi-omics network as input, BioNeuralNet can generate embeddings using Graph Neural Networks (GNNs).
- Generate embeddings using methods such as GCN, GAT, GraphSAGE, and GIN.
- Outputs can be obtained as native tensors or converted to pandas DataFrames for easy analysis and visualization.
- Embeddings unlock numerous downstream applications, including disease prediction, enhanced subject representation, clustering, and more.
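To give a rough sense of what a GCN-style layer computes on a multi-omics network, the following dependency-free sketch runs one propagation step, ReLU(D^-1/2 (A + I) D^-1/2 H W), on a toy graph. It is not BioNeuralNet's implementation: the `gcn_layer` and `matmul` helpers and all toy matrices are hypothetical and exist only for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, features, weights):
    """One GCN-style step: ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Symmetric normalization by node degree.
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    hidden = matmul(matmul(norm, features), weights)
    # ReLU non-linearity.
    return [[max(0.0, v) for v in row] for row in hidden]

# Toy 3-node network (hypothetical omics features) with 2 input features per node.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
weights = [[1.0, -1.0],
           [0.5, 0.5]]

embeddings = gcn_layer(adj, features, weights)
for node, vec in enumerate(embeddings):
    print(node, [round(v, 3) for v in vec])
```

In practice the weight matrix is learned and several layers are stacked; GAT, GraphSAGE, and GIN differ mainly in how neighbor information is aggregated.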
- Identify functional modules or communities using correlated clustering methods (e.g., CorrelatedPageRank, CorrelatedLouvain, HybridLouvain) that integrate phenotype correlation to extract biologically relevant modules [1].
- Clustering methods can be applied to any network representation, allowing flexible analysis across different domains.
- All clustering components return either raw partition dictionaries or induced subnetwork adjacency matrices (as DataFrames) for visualization.
- Use cases include feature selection, biomarker discovery, and network-based analysis.
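As an illustration of what "phenotype correlation" can mean for a module, the sketch below takes a raw partition dictionary, groups features into modules, and scores each module by the absolute Pearson correlation between its per-subject mean and the phenotype. This is one plausible scoring choice and an assumption on our part; the methods in [1] define their own objectives, and `module_phenotype_score`, `pearson`, and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def module_phenotype_score(omics, module, phenotype):
    """Absolute correlation between a module's mean signal and the phenotype.

    omics: dict mapping feature name -> list of per-subject values
    module: list of feature names in the candidate module
    phenotype: list of per-subject phenotype values
    """
    n_subjects = len(phenotype)
    # Summarize the module by averaging its features within each subject.
    summary = [sum(omics[f][s] for f in module) / len(module) for s in range(n_subjects)]
    return abs(pearson(summary, phenotype))

# Toy data: 3 features measured on 5 subjects (values are made up).
omics = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.0, 4.1, 5.9, 8.2, 9.8],
    "gene_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
phenotype = [0.9, 2.1, 3.0, 4.2, 4.8]

# A raw partition dictionary: feature -> cluster id.
partition = {"gene_a": 0, "gene_b": 0, "gene_c": 1}
modules = {}
for feature, cluster_id in partition.items():
    modules.setdefault(cluster_id, []).append(feature)

scores = {cid: module_phenotype_score(omics, feats, phenotype) for cid, feats in modules.items()}
best = max(scores, key=scores.get)
print(scores, "-> most phenotype-associated module:", best)
```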
- Integrate node embeddings back into omics data to enrich subject-level profiles by weighting features with the learned embedding.
- This embedding-enriched data can be used for downstream tasks such as disease prediction or biomarker discovery.
- The result can be returned as a DataFrame or a PyTorch tensor, fitting naturally into downstream analyses.
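The weighting idea can be sketched without any dependencies. The example assumes each feature's embedding is first reduced to a single scalar (here simply the mean of its components, which is an assumption; other reductions are possible) and that each omics column is then multiplied by that scalar. The `enrich_omics` function and the toy values are hypothetical, not BioNeuralNet's API.

```python
def enrich_omics(omics_rows, feature_names, embeddings):
    """Scale each omics feature by a scalar derived from its node embedding.

    omics_rows: list of per-subject rows, one value per feature
    feature_names: column names, in the same order as the row values
    embeddings: dict mapping feature name -> embedding vector (list of floats)
    """
    # Reduce each embedding vector to one weight (mean of its components).
    weights = {name: sum(vec) / len(vec) for name, vec in embeddings.items()}
    # Multiply every subject's value by the weight of its feature.
    return [
        [value * weights[name] for name, value in zip(feature_names, row)]
        for row in omics_rows
    ]

feature_names = ["prot_1", "metab_1"]
omics_rows = [[10.0, 1.0],   # subject 1
              [20.0, 2.0]]   # subject 2
embeddings = {"prot_1": [0.2, 0.4], "metab_1": [1.0, 3.0]}

enriched = enrich_omics(omics_rows, feature_names, embeddings)
print(enriched)
```

Features whose embeddings carry more weight contribute more strongly to each subject's enriched profile, while the subject-by-feature shape of the data is preserved.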
Disease Prediction for Multi-Omics Network (DPMON) [2]
- End-to-end classification pipeline for disease prediction using Graph Neural Network embeddings.
- DPMON supports hyperparameter tuning; when enabled, it searches for the best configuration for the given data.
- This approach, along with native pandas integration across modules, ensures that BioNeuralNet can be easily incorporated into your analysis workflows.
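Conceptually, hyperparameter tuning means evaluating candidate configurations and keeping the best one. The sketch below shows an exhaustive grid search over a small, hypothetical search space. It does not reflect DPMON's actual tuner or parameter names, and `validation_score` is a toy stand-in for training a model and measuring validation accuracy.

```python
from itertools import product

# Hypothetical search space; the names and values are illustrative only.
search_space = {
    "hidden_dim": [16, 32, 64],
    "learning_rate": [1e-3, 1e-2],
    "num_layers": [2, 3],
}

def validation_score(config):
    """Toy stand-in for 'train a model, return validation accuracy'."""
    penalty = abs(config["hidden_dim"] - 32) / 64 + abs(config["num_layers"] - 2) * 0.1
    bonus = 0.05 if config["learning_rate"] == 1e-3 else 0.0
    return 0.8 - penalty + bonus

def grid_search(space, score_fn):
    """Evaluate every combination in the space and return the best one."""
    keys = list(space)
    best_config, best_score = None, float("-inf")
    for values in product(*(space[k] for k in keys)):
        config = dict(zip(keys, values))
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = grid_search(search_space, validation_score)
print(best_config, round(best_score, 3))
```

Dedicated tuning libraries typically replace the exhaustive loop with smarter sampling and early stopping, but the select-the-best-scoring-configuration logic is the same.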
- Visualize embeddings, feature variance, clustering comparison, and network structure in 2D.
- Evaluate embedding quality and clustering relevance using correlation with phenotype.
- Performance benchmarking tools for classification tasks using various models.
- Useful for assessing feature importance, validating network structure, and comparing cluster outputs.
- Build graphs using k-NN similarity, Pearson/Spearman correlation, RBF kernels, mutual information, or soft-thresholding.
- Filter and preprocess omics or clinical data by variance, correlation, random forest importance, or ANOVA F-test.
- Tools for network pruning, feature selection, and data cleaning.
- Quickly summarize datasets with variance, zero-fraction, expression level, or correlation overviews.
- Includes conversion tools for RData and integrated logging.
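Two of the steps above, variance filtering and correlation-based graph construction, can be combined in a short dependency-free sketch. It drops near-constant features and then connects feature pairs whose absolute Pearson correlation meets a hard threshold. The `correlation_graph` function, its parameters, and the toy data are hypothetical and do not mirror BioNeuralNet's utilities.

```python
import math
from statistics import pvariance

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_graph(features, threshold=0.7, min_variance=1e-8):
    """Build a weighted adjacency (dict of dicts) from thresholded |Pearson r|."""
    # Variance filter: near-constant features carry no correlation signal.
    kept = {name: vals for name, vals in features.items() if pvariance(vals) > min_variance}
    names = sorted(kept)
    adj = {a: {b: 0.0 for b in names} for a in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = abs(pearson(kept[a], kept[b]))
            if r >= threshold:
                adj[a][b] = adj[b][a] = r  # undirected edge, weight = |r|
    return adj

# Toy data: 4 features measured on 4 subjects (values are made up).
features = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.0, 4.0, 6.0, 8.5],
    "f3": [4.0, 1.0, 3.0, 2.0],
    "flat": [1.0, 1.0, 1.0, 1.0],
}

adj = correlation_graph(features)
for node, row in adj.items():
    print(node, {k: round(v, 3) for k, v in row.items()})
```

Soft-thresholding would replace the hard cutoff with a power transform of |r|, and k-NN construction would keep only each node's strongest neighbors.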
- Graph Construction:
  - BioNeuralNet provides additional tools in the bioneuralnet.external_tools module.
  - Includes support for SmCCNet (Sparse Multiple Canonical Correlation Network), an R-based tool for constructing phenotype-informed correlation networks [3].
  - These tools are optional but enhance BioNeuralNet’s graph construction capabilities and are recommended for more integrative or exploratory workflows.
import pandas as pd
from bioneuralnet.external_tools import SmCCNet
from bioneuralnet.downstream_task import DPMON
from bioneuralnet.datasets import DatasetLoader

# Step 1: Load your data or use one of the provided datasets
Example = DatasetLoader("example1")
omics_proteins = Example.data["X1"]
omics_metabolites = Example.data["X2"]
phenotype_data = Example.data["Y"]
clinical_data = Example.data["clinical_data"]

# Step 2: Network Construction
smccnet = SmCCNet(
    phenotype_df=phenotype_data,
    omics_dfs=[omics_proteins, omics_metabolites],
    data_types=["protein", "metabolite"],
    kfold=5,
    summarization="PCA",
)
global_network, clusters = smccnet.run()
print("Adjacency matrix generated.")

# Step 3: Disease Prediction (DPMON)
dpmon = DPMON(
    adjacency_matrix=global_network,
    omics_list=[omics_proteins, omics_metabolites],
    phenotype_data=phenotype_data,
    clinical_data=clinical_data,
    model="GCN",
)
predictions = dpmon.run()
print("Disease phenotype predictions:\n", predictions)
- Full documentation: BioNeuralNet Documentation
- Jupyter Notebook Examples:
- Tutorials include:
  - Multi-omics graph construction.
  - GNN embeddings for disease prediction.
  - Subject representation with integrated embeddings.
  - Clustering using Hybrid Louvain and Correlated PageRank.
- API details are available in the API Reference.
- Does BioNeuralNet support GPU acceleration? Yes, install PyTorch with CUDA support.
- Can I use my own omics network? Yes, you can provide a custom network as an adjacency matrix instead of using SmCCNet.
- What clustering methods are supported? BioNeuralNet supports Correlated Louvain, Hybrid Louvain, and Correlated PageRank.
For more FAQs, please visit our FAQ page.
BioNeuralNet integrates multiple open-source libraries. We acknowledge key dependencies:
- PyTorch - GNN computations and deep learning models.
- PyTorch Geometric - Graph-based learning for multi-omics.
- NetworkX - Graph data structures and algorithms.
- Scikit-learn - Feature selection and evaluation utilities.
- pandas & numpy - Core data processing tools.
- ray[tune] - Hyperparameter tuning for GNN models.
- matplotlib - Data visualization.
- cptac - Dataset handling for clinical proteomics.
- python-louvain - Community detection algorithms.
- statsmodels - Statistical models and hypothesis testing (e.g., ANOVA, regression).
We also acknowledge R-based tools for external network construction:
- SmCCNet - Sparse multiple canonical correlation network.
- Run Tests Locally:
  pytest --cov=bioneuralnet --cov-report=html
  open htmlcov/index.html
- Continuous Integration: GitHub Actions runs automated tests on every commit.
We welcome contributions! To get started:
git clone https://github.com/UCD-BDLab/BioNeuralNet.git
cd BioNeuralNet
pip install -r requirements-dev.txt
pre-commit install
pytest
- Fork the repository, create a new branch, and implement your changes.
- Add tests and documentation for any new features.
- Submit a pull request with a clear description of your changes.
BioNeuralNet is distributed under the MIT License.
- Issues and Feature Requests: Open an Issue
- Email: [email protected]
[1] Abdel-Hafiz, M., Najafi, M., et al. "Significant Subgraph Detection in Multi-omics Networks for Disease Pathway Identification." Frontiers in Big Data, 5 (2022). DOI: 10.3389/fdata.2022.894632
[2] Hussein, S., Ramos, V., et al. "Learning from Multi-Omics Networks to Enhance Disease Prediction: An Optimized Network Embedding and Fusion Approach." In 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Lisbon, Portugal, 2024, pp. 4371-4378. DOI: 10.1109/BIBM62325.2024.10822233
[3] Liu, W., Vu, T., Konigsberg, I. R., Pratte, K. A., Zhuang, Y., & Kechris, K. J. "Network-Based Integration of Multi-Omics Data for Biomarker Discovery and Phenotype Prediction." Bioinformatics, 39(5), btat204 (2023). DOI: 10.1093/bioinformatics/btat204