GitHub - imagebind/hackathon_bioactivity_drug_discovery

🌟 Usefulness and Purpose of ML Implementation for Bioactivity Prediction of Target Proteins

🔹 Drug Discovery Acceleration – ML helps in predicting the bioactivity of compounds against a target protein, significantly reducing time and cost in drug discovery.

🔸 Feature Selection – Variance Thresholding eliminates low-variance features, ensuring the dataset retains only informative features.

🟢 Handling Large Data – ML models efficiently process large-scale bioactivity datasets, identifying patterns humans might miss.

🔵 Regression Analysis – Random Forest Regression predicts bioactivity values (pIC50), providing insights into compound effectiveness.

🟠 Statistical Validation – The R-squared value on the test set measures how much of the variance in pIC50 the model explains, giving a quantitative check on the reliability of its predictions.

🔻 Automated Workflow – Streamlit-based UI streamlines dataset selection, preprocessing, and model execution interactively.

🟣 Model Generalization – A train-test split evaluates the model on compounds it was not trained on, so overfitting shows up as a gap between training and test performance.

🔵 Compound Filtering – Removing non-informative molecular descriptors can improve predictive accuracy.

🟡 Data Visualization – Regression plots (Seaborn) visually depict experimental vs. predicted pIC50 values for better interpretation.

🟠 Biological Relevance – Helps in identifying high-affinity drug candidates for specific target proteins in diseases like Alzheimer's.

🔹 Customizable Analysis – Supports different diseases and descriptor files, making the implementation flexible.

🟢 Non-Experimental Predictions – Reduces the need for extensive lab testing by providing preliminary computational insights.

🔻 Pattern Recognition – Captures nonlinear relationships between molecular descriptors and bioactivity using Random Forest.

🔸 Scalability – ML methods can be adapted to analyze multiple target proteins for different diseases.

🟣 Preprocessing Automation – Removes entries with missing values and drops redundant features, producing a cleaner dataset.

🟠 Facilitates Hypothesis Testing – Helps researchers validate drug-target interaction hypotheses before experimental validation.

🟡 Cross-Disciplinary Utility – Useful for computational chemistry, pharmacology, and bioinformatics researchers.

🔵 Interactive Decision-Making – Streamlit UI allows real-time selection and visualization of results.

🔹 Reproducibility – The ML pipeline ensures standardized processing, making results consistent and comparable.

🟢 Future Applications – Can be extended to deep learning models, which may yield more accurate bioactivity predictions.

🌟 Implementation Steps:

1️⃣ The ChEMBL ID, SMILES, and IC50 values are fetched automatically with chembl_webresource_client, and molecular fingerprints are retrieved from the PubChem API. The data is stored in a structured format (CSV or DataFrame). The dataset should contain:

🔹 Molecular Identifiers (e.g., ChEMBL ID)

🔹 Fingerprint Descriptors (bit-vector/binary representation)

🔹 IC50 Values (bioactivity measurement in nM)
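The fetching step in 1️⃣ could look roughly like the sketch below. It is an illustration under assumptions, not the repository's code: the function names `fetch_ic50_records` and `to_rows` and the output column names are hypothetical, and only the ChEMBL part is shown (the PubChem fingerprint lookup is left out). `fetch_ic50_records` needs network access and the chembl_webresource_client package; `to_rows` is plain Python.

```python
from typing import Dict, Iterable, List


def fetch_ic50_records(target_chembl_id: str) -> List[dict]:
    """Query ChEMBL for IC50 activities of one target (needs network access)."""
    # Imported lazily so that to_rows() below works without the client installed.
    from chembl_webresource_client.new_client import new_client

    activities = new_client.activity.filter(
        target_chembl_id=target_chembl_id, standard_type="IC50"
    )
    return list(activities)


def to_rows(records: Iterable[dict]) -> List[Dict[str, object]]:
    """Keep only records that have a SMILES string and an IC50 reported in nM."""
    rows = []
    for rec in records:
        value = rec.get("standard_value")
        if not rec.get("canonical_smiles") or value in (None, ""):
            continue  # skip entries with missing structure or missing IC50
        if rec.get("standard_units") != "nM":
            continue  # keep a single unit so values stay comparable
        rows.append(
            {
                "chembl_id": rec["molecule_chembl_id"],
                "smiles": rec["canonical_smiles"],
                "ic50_nM": float(value),
            }
        )
    return rows
```

The resulting list of dictionaries can be passed directly to `pandas.DataFrame(rows)` or written out as CSV.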

💡 IC50 values are converted to pIC50 (the negative base-10 logarithm of the IC50 expressed in mol/L) for better statistical modeling.
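As a minimal sketch of this conversion: since pIC50 = -log10(IC50 in mol/L) and 1 nM = 1e-9 mol/L, an IC50 given in nM maps to pIC50 = 9 - log10(IC50 in nM). The function name `ic50_nm_to_pic50` is an assumption made for illustration.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 value in nanomolar to pIC50."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # -log10(ic50_nm * 1e-9) simplifies to 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)
```

On this scale a more potent compound (lower IC50) gets a higher pIC50; for example, 1000 nM (1 µM) corresponds to a pIC50 of 6.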

2️⃣ Data Preprocessing

🔹 Load the Data

🔹 Handle Missing Data

🔹 Remove Low-Variance Features
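The two cleaning steps in 2️⃣ can be illustrated with a plain-Python sketch on a small fingerprint matrix (rows are compounds, columns are fingerprint bits). The helper names and the use of `None` for missing values are assumptions; for array data, scikit-learn's `VarianceThreshold` applies the same kind of filter.

```python
from statistics import pvariance


def drop_incomplete_rows(rows):
    """Remove rows that contain any missing (None) value."""
    return [row for row in rows if all(v is not None for v in row)]


def remove_low_variance(rows, threshold=0.0):
    """Keep only the columns whose population variance exceeds the threshold."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    filtered = [[row[i] for i in keep] for row in rows]
    return filtered, keep


# Toy example: column 0 is constant, and one row has a missing bit.
fingerprints = [[1, 0, 1], [1, 1, None], [1, 1, 0], [1, 0, 0]]
clean = drop_incomplete_rows(fingerprints)
filtered, kept_columns = remove_low_variance(clean)
print(kept_columns)  # [1, 2]
```

Returning the kept column indices alongside the filtered matrix makes it possible to apply the same selection to new compounds at prediction time.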

3️⃣ Train-Test Splitting

4️⃣ Train the ML Model

5️⃣ Model Evaluation

6️⃣ Predictions & Visualization

7️⃣ Interactive Streamlit App

🌟 Future Enhancements:

🔹 Use Deep Learning (Neural Networks) for Bioactivity Prediction.

🔹 Implement Explainable AI (SHAP values) to interpret feature importance.

🔹 Automate descriptor extraction from PubChem API.
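Implementation steps 3️⃣ to 5️⃣ (splitting, training, evaluation) can be sketched with scikit-learn as follows. This is an illustration only: the data is synthetic, and the 80/20 split, the number of trees, and the random seeds are assumptions rather than the repository's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 200 compounds, 50 binary fingerprint bits.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 50))
# Hypothetical pIC50 values driven by the first three bits plus a little noise.
y = 5.0 + X[:, 0] + 0.5 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(0, 0.1, size=200)

# Step 3: hold out 20% of the compounds for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 4: fit a Random Forest regressor on the training set only.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Step 5: score predictions on the held-out set with R-squared.
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
print(f"R^2 on held-out test set: {r2:.3f}")
```

For step 6️⃣, the pairs `(y_test, y_pred)` are what a Seaborn regression plot such as `sns.regplot(x=y_test, y=y_pred)` would display as experimental versus predicted pIC50.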

🌟 Folders and Files:

.ipynb_checkpoints
Alzheimer's disease
Asthma
Chronic obstructive pulmonary disease
Fatty Liver Disease
Glaucoma
Hyperlipidemia
Hypertension
Obesity
Osteoarthritis
OvarianCancer
Pancreatic Cancer
Parkinson’s Disease
Polycystic Ovary Syndrome
Rheumatoid Arthritis
Type 2 Diabetes
README.md
Untitled.ipynb
extract_fingerprint.ipynb
protein_bioactivity_prediction.ipynb
requirements.txt
streamlit_app.py
