DEGA — Differential Expression & Gene Analysis

DEGA is a reproducible pipeline and interactive Jupyter notebook for performing differential gene expression (DGE) analysis on gene expression datasets. The repository includes a fully documented notebook (DEGA.ipynb) in the colab folder that runs the analysis, plus an output folder containing the publication-ready tables, plots, and summaries generated by the notebook.

Latest updates

2025-08-05 — Analysis run and outputs exported. This release includes:
- publication_ready_results.csv and comprehensive_deg_results.csv.
- Figures: volcano_plot.png, ma_plot.png, top_genes_heatmap.png, top_genes_boxplots.png, exploratory_analysis.png, quality_assessment.png, and more.
- statistical_summary.txt summarizing the key metrics from the run.

Project summary

DEGA is intended for researchers who want a clear, reproducible workflow to go from raw or pre-processed expression tables to:

- Quality assessment & exploratory data analysis (PCA, clustering, QC plots)
- Differential expression testing (fold-change, adjusted p-values)
- Visualization (volcano, MA, heatmaps, boxplots)
- Export of publication-ready tables

The repository contains notebooks that perform the entire analysis and write their output files to the output folder.


Installation

The notebook requires a standard Python stack; its first cell installs and imports the dependencies used in the analysis. Creating an isolated environment is recommended:

```bash
# create environment (conda recommended)
conda create -n dega python=3.10 -y
conda activate dega

# install core packages (rpy2 additionally requires a working R installation)
pip install --upgrade pip
pip install jupyterlab GEOparse pandas numpy scipy matplotlib seaborn scikit-learn rpy2

# optional extras used by the notebook for exporting/figures
pip install openpyxl xlsxwriter plotly kaleido adjustText
```

The notebook's first cell includes pip install statements so it can be run in a fresh Colab/Binder session as well.
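The exact contents of the notebook's setup cell are not reproduced here. As a hedged sketch, a setup cell can first check which of the packages above are already importable, so a local run only installs what is missing. The module list and the `missing_modules` helper are illustrative assumptions, not code taken from DEGA.ipynb.

```python
import importlib.util

# Import names (not pip names) of the core packages listed above.
# Note: scikit-learn is imported as "sklearn"; GEOparse keeps its capitalization.
REQUIRED_MODULES = [
    "pandas", "numpy", "scipy", "matplotlib", "seaborn",
    "sklearn", "GEOparse", "rpy2",
]


def missing_modules(modules):
    """Return the module names that cannot be found by this interpreter (hypothetical helper)."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Checking with `find_spec` avoids actually importing heavy packages such as rpy2, which would fail loudly if R is not installed.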

One-line (Colab)

Open the notebook in Google Colab (or run locally) — the setup cell will install dependencies automatically.

Notebook structure & recommended run order

DEGA.ipynb is divided into the following high-level sections (run in this order):

- Install and import libraries: ensures all Python/R dependencies are available.
- Load data & sample: load the expression matrices and the sample metadata file (or download them from GEO if configured).
- Preprocessing & filtering: low-expression filtering and optional normalization steps.
- Exploratory data analysis: PCA, sample QC, sample clustering, QC plots.
- Differential expression testing: statistical tests, p-value adjustment, fold-change calculation.
- Post-processing & filtering: select significant genes by p-value and log2 fold-change thresholds.
- Visualization: volcano plot, MA-plot, heatmaps, boxplots for top genes.
- Export results: write comprehensive_deg_results.csv, publication_ready_results.csv, figures, and statistical_summary.txt.
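This README does not state which statistical test or p-value correction the notebook uses. As a hedged sketch, assuming non-negative linear-scale expression values and the Benjamini-Hochberg (FDR) correction, which is a common choice for the "p-value adjustment" step, the fold-change and adjustment calculations can look like this. The per-gene test itself (for example, a t-test) is left out, and `log2_fold_change`, `benjamini_hochberg`, and the pseudocount of 1 are hypothetical. If the matrix is already on a log2 scale, the log2 fold change is simply the difference of the group means instead.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Hypothetical helper: log2((mean(treated) + pc) / (mean(control) + pc)).

    Assumes non-negative, linear-scale expression values; the pseudocount
    avoids division by zero and log(0) for genes that are not expressed.
    """
    mean_t = sum(treated) / len(treated)
    mean_c = sum(control) / len(control)
    return math.log2((mean_t + pseudocount) / (mean_c + pseudocount))


def benjamini_hochberg(pvalues):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvalues)
    # Walk the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        # Enforce monotonicity: a gene can never get a larger adjusted
        # p-value than a gene with a larger raw p-value.
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(log2_fold_change([8.0, 10.0], [2.0, 3.0]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

In practice, `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` computes the same correction; the hand-written version only makes the ranking and monotonicity steps explicit.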

Notes:

- The notebook defines threshold variables (e.g. p_threshold, log2fc_threshold) near the DGE section; adjust them before running the post-processing and visualization cells.
- The notebook prints progress and places outputs in the local working directory (see `deg_analysis_output.zip` for an example layout).
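The threshold variables decide which genes are reported as up- or downregulated. The sketch below assumes that a gene is significant when its adjusted p-value is below `p_threshold` and its absolute log2 fold change reaches `log2fc_threshold`; the comparison operators, the defaults of 0.05 and 1.0, the `classify_gene` helper, and the gene names and values are all hypothetical, not taken from the notebook.

```python
def classify_gene(log2fc, padj, p_threshold=0.05, log2fc_threshold=1.0):
    """Hypothetical helper: label a gene 'up', 'down', or 'not_significant'.

    Both conditions must hold: adjusted p-value below p_threshold AND
    absolute log2 fold change at least log2fc_threshold.
    """
    if padj < p_threshold and abs(log2fc) >= log2fc_threshold:
        if log2fc > 0:
            return "up"
        if log2fc < 0:
            return "down"
    return "not_significant"


# Toy (log2FC, adjusted p-value) pairs for illustration only.
results = {
    "GENE_A": (2.3, 0.001),   # large increase, significant
    "GENE_B": (-1.4, 0.02),   # decrease, significant
    "GENE_C": (3.0, 0.30),    # large change, but p-value too high
    "GENE_D": (0.2, 0.0001),  # small p-value, but change too small
}
labels = {gene: classify_gene(lfc, padj) for gene, (lfc, padj) in results.items()}
print(labels)
```

Lowering `log2fc_threshold` or raising `p_threshold` moves genes like GENE_C and GENE_D into the significant set, which is why the thresholds should be settled before the plots are generated.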

Outputs included (example files from `deg_analysis_output.zip`)

- comprehensive_deg_results.csv: full results table containing expression means, standard deviations, log2 fold-change, raw and adjusted p-values, and flags for significance/regulation.
- publication_ready_results.csv: curated table ready for inclusion in papers/supplementary material.
- supplementary_all_genes_analysis.csv: additional summary metrics for all genes.
- expression_filtered.csv: filtered expression matrix used for downstream analysis.
- statistical_summary.txt: short text summary (date of analysis, number of genes analyzed, counts of up/downregulated genes, etc.).
- Figures: volcano_plot.png, ma_plot.png, top_genes_heatmap.png, top_genes_boxplots.png, exploratory_analysis.png, quality_assessment.png, expression_clusters.png, treatment_effect_preview.png, comprehensive_treatment_validation.png.
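expression_filtered.csv holds the matrix left after the low-expression filter in the preprocessing step. The README does not say which criterion the notebook applies; a common rule, sketched here as an assumption, keeps a gene only if it reaches a minimum expression value in a minimum number of samples. `filter_low_expression`, `min_value`, `min_samples`, and the toy matrix are hypothetical.

```python
def filter_low_expression(matrix, min_value=1.0, min_samples=2):
    """Hypothetical helper: keep genes with expression >= min_value in at least min_samples samples.

    `matrix` maps a gene ID to its list of per-sample expression values.
    """
    return {
        gene: values
        for gene, values in matrix.items()
        if sum(v >= min_value for v in values) >= min_samples
    }


# Toy matrix for illustration only.
expression = {
    "GENE_A": [5.2, 6.1, 0.0, 4.8],  # expressed in 3 of 4 samples: kept
    "GENE_B": [0.0, 0.3, 0.0, 1.2],  # expressed in 1 of 4 samples: dropped
    "GENE_C": [2.0, 2.5, 1.9, 3.1],  # expressed in all samples: kept
}
filtered = filter_low_expression(expression)
print(sorted(filtered))  # ['GENE_A', 'GENE_C']
```

With a pandas DataFrame of genes by samples, the same rule is the row mask `(df >= min_value).sum(axis=1) >= min_samples`, applied with `df.loc[mask]`.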

Representative results (from the latest run)

The latest statistical_summary.txt (analysis date: 2025-08-05) reports:

- Total genes analyzed: 1000
- Significant genes: 1 (Percent significant: 0.10%)
- Upregulated genes: 54
- Downregulated genes: 77
- Mean fold change (significant): 2.04

See deg_analysis_output/statistical_summary.txt for the full summary and top gene lists.

The notebook also demonstrates how to regenerate figures and adjust significance thresholds.

File structure

```text
repo-root/
├── colab/                 # Main analysis notebook (DEGA.ipynb)
├── notebooks/             # Each notebook cell as a separate file
├── output/                # Expected results (exported CSVs and figures)
├── requirements.txt       # Requirements for using the repository
└── README.md              # This file
```

Reproducibility & environment

- The notebook attempts to install the required Python packages at runtime (see the first cell).
- For full reproducibility, record the output of pip freeze or export the conda environment before running the analysis.
- If results are to be used in publications, set random seeds and record the software versions used (the notebook writes the analysis date to statistical_summary.txt).
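As a hedged sketch of the seed and version recording suggested above, the snippet below seeds Python's RNG (and NumPy's global RNG when NumPy is installed) and collects the analysis date, seed, Python version, and package versions into one JSON record. The helper names, the package list, and the seed value are assumptions; `pip freeze` or `conda env export` remain the complete record of the environment.

```python
import json
import platform
import random
from datetime import date
from importlib.metadata import PackageNotFoundError, version

# Distribution (pip) names to record; adjust to match the environment.
PACKAGES = ["pandas", "numpy", "scipy", "matplotlib", "seaborn", "scikit-learn", "GEOparse"]


def set_seeds(seed):
    """Hypothetical helper: seed Python's RNG and, if available, NumPy's global RNG."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; only Python's RNG is seeded


def environment_report(seed, packages=PACKAGES):
    """Hypothetical helper: collect date, seed, Python version, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "analysis_date": date.today().isoformat(),
        "seed": seed,
        "python": platform.python_version(),
        "packages": versions,
    }


SEED = 42
set_seeds(SEED)
print(json.dumps(environment_report(SEED), indent=2))
```

Saving this JSON next to statistical_summary.txt keeps the seed and versions together with the results they produced.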

Contributing

Contributions and issues are welcome. Please open an issue describing the request or submit a pull request with tests and updated notebook outputs where appropriate. Suggested improvements:

- Add a command-line wrapper to run the pipeline headlessly.
- Add unit tests for core pre-processing functions.
- Add support for common normalization and testing methods (DESeq2 via rpy2, limma-voom, edgeR).
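For the command-line wrapper suggestion, a minimal starting point could be an `argparse` front end that exposes the same thresholds the notebook defines. Everything here is hypothetical: the positional arguments, option names, and defaults are assumptions, and the pipeline calls are left as a placeholder until the notebook's steps are factored into importable functions.

```python
import argparse


def build_parser():
    """Argument parser for a hypothetical headless DEGA runner."""
    parser = argparse.ArgumentParser(
        description="Run the DEGA differential expression analysis without Jupyter."
    )
    parser.add_argument("expression", help="Path to the expression matrix (CSV).")
    parser.add_argument("metadata", help="Path to the sample metadata file (CSV).")
    parser.add_argument("--output-dir", default="output",
                        help="Directory for exported tables and figures.")
    parser.add_argument("--p-threshold", type=float, default=0.05,
                        help="Adjusted p-value cutoff for significance.")
    parser.add_argument("--log2fc-threshold", type=float, default=1.0,
                        help="Minimum absolute log2 fold change.")
    return parser


def main(argv=None):
    """Parse arguments; argv=None reads them from the command line."""
    args = build_parser().parse_args(argv)
    # Placeholder: call the preprocessing, testing, and export steps here.
    print(f"Analyzing {args.expression} with metadata {args.metadata}")
    print(f"p_threshold={args.p_threshold}, log2fc_threshold={args.log2fc_threshold}")
    print(f"Writing outputs to {args.output_dir}")
    return args
```

Saved as a script, it would call `main()` under an `if __name__ == "__main__":` guard; for example, `main(["expr.csv", "meta.csv", "--p-threshold", "0.01"])` parses the same arguments programmatically, which also makes the wrapper easy to unit test.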
