Know what you're estimating. Know when to trust it. Know how it was produced.
Docs • Tutorials • Examples • Gallery • Cite
CausalRL is a research-grade Python library for off-policy evaluation (OPE) that makes causal assumptions explicit. It goes beyond point estimates, combining estimand-first design, diagnostics-first reporting, and reproducible benchmarks, so you can tell not just what a policy is worth but whether you should trust the estimate.

v0.2.0 (research preview, alpha) · Import: `import crl`
- Every estimator is tied to a formal estimand with explicit identification assumptions
- Overlap, ESS, weight tails, and shift checks run automatically with every evaluation
- IS, DR, WDR, MAGIC, MRDR, MIS, FQE, DualDICE, GenDICE, DRL, and more
- Bounded-confounding curves for robustness to hidden confounders
- Load D4RL and RL Unplugged datasets with built-in adapters
- HTML reports with tables, figures, and full metadata bundles
- Synthetic bandit/MDP suites with known true values for validation
- Type-checked, tested, with deterministic seeding throughout
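The overlap and ESS checks above boil down to standard importance-weight statistics. As a rough sketch of what such a diagnostic computes (plain NumPy, independent of the library's actual API):

```python
import numpy as np

def importance_weight_diagnostics(weights):
    """Summarize importance weights: effective sample size and tail mass.

    ESS = (sum w)^2 / sum(w^2); an ESS far below n signals poor
    overlap between the behavior and target policies.
    """
    w = np.asarray(weights, dtype=float)
    n = w.size
    ess = float(w.sum() ** 2 / (w ** 2).sum())
    # Share of total weight carried by the largest 1% of samples
    k = max(1, n // 100)
    top_share = float(np.sort(w)[-k:].sum() / w.sum())
    return {"n": n, "ess": ess, "top_1pct_share": top_share}

# Uniform weights: ESS equals n and no tail concentration
print(importance_weight_diagnostics(np.ones(1000)))
```

When a handful of samples carry most of the weight, both numbers degrade sharply, which is exactly the situation the library's automatic warnings are meant to flag.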
```bash
# Install from PyPI
pip install causalrl

# With all extras
pip install "causalrl[all]"

# Clone and install from source
git clone https://github.com/gsaco/causalrl
cd causalrl
pip install -e .
```

```python
from crl.benchmarks.bandit_synth import SyntheticBandit, SyntheticBanditConfig
from crl.ope import evaluate_ope

# Create a synthetic benchmark with known ground truth
benchmark = SyntheticBandit(SyntheticBanditConfig(seed=0))
dataset = benchmark.sample(num_samples=1000, seed=1)

# Run end-to-end evaluation
report = evaluate_ope(dataset=dataset, policy=benchmark.target_policy)

# View results
print(report.summary_table())

# Generate audit-ready HTML report
report.save_html("report.html")
```

Output:

```text
   Estimator   Value     Std    ESS  OverlapWarning
0         IS  0.8234  0.0821  412.3           False
1        WIS  0.8156  0.0634  412.3           False
2         DR  0.8189  0.0512  412.3           False
3        WDR  0.8167  0.0498  412.3           False

Ground Truth: 0.8200
```
```bash
# Quick bandit OPE demo
python -m examples.quickstart.bandit_ope

# MDP evaluation
python -m examples.quickstart.mdp_ope

# Run full benchmark suite
python -m experiments.run_benchmarks --suite all --out results/
```

| Pillar | Why It Matters | What You Get |
|---|---|---|
| Estimands | Know what quantity you're estimating, not just which estimator | Explicit estimands with identification assumptions via `AssumptionSet` |
| Diagnostics | Know when an estimate is fragile before acting on it | Overlap checks, ESS, weight tails, shift diagnostics, sensitivity curves |
| Evidence | Know how results were produced for auditing and reproducibility | Versioned configs, deterministic seeds, structured report bundles |
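To illustrate what "estimand-first" means in practice, here is a hypothetical sketch of declaring a target quantity together with its identification assumptions. The `Estimand` and `AssumptionSet` classes below are illustrative stand-ins, not necessarily the library's exact API:

```python
from dataclasses import dataclass

# Hypothetical sketch; the real crl interface may differ.
@dataclass(frozen=True)
class AssumptionSet:
    """Identification assumptions attached to an estimand."""
    names: tuple

    def describe(self):
        return [f"- {n}" for n in self.names]

@dataclass(frozen=True)
class Estimand:
    """A target quantity, declared before any estimator runs."""
    name: str
    assumptions: AssumptionSet

policy_value = Estimand(
    name="E[R | do(A ~ pi_target)]",
    assumptions=AssumptionSet(
        names=("overlap (positivity)", "no unmeasured confounding", "consistency")
    ),
)
print(policy_value.name)
for line in policy_value.assumptions.describe():
    print(line)
```

The point of the design is that the assumptions travel with the estimand into every report, so a reader auditing a number can see what had to hold for it to be valid.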
Full estimator list:
| Category | Estimators | Notes |
|---|---|---|
| Importance Sampling | IS, WIS, SN-IS | Propensity-based weighting |
| Doubly Robust | DR, WDR | Combines regression with IS |
| Model-Assisted | MAGIC, MRDR | Variance reduction via modeling |
| Marginalized | MIS | State-marginal importance sampling |
| Value Function | FQE | Fitted Q-Evaluation |
| DICE Family | DualDICE, GenDICE | Distribution correction estimation |
| Double RL | DRL | Double reinforcement learning |
| High-Confidence | HCOPE bounds | Concentration-based bounds |
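To make the first two table rows concrete, here is a minimal textbook sketch of IS, self-normalized (weighted) IS, and DR for a contextual bandit, written in plain NumPy rather than through the library:

```python
import numpy as np

def ope_bandit(rewards, behavior_probs, target_probs,
               q_target=None, q_logged=None):
    """Textbook bandit OPE estimates from logged data.

    rewards[i]          observed reward for the logged action a_i
    behavior_probs[i]   pi_b(a_i | x_i), the logging policy's propensity
    target_probs[i]     pi_e(a_i | x_i), the target policy's propensity
    q_target[i]         model estimate of E_{a ~ pi_e}[q(x_i, a)]
    q_logged[i]         model estimate of q(x_i, a_i)
    """
    r = np.asarray(rewards, dtype=float)
    w = np.asarray(target_probs, dtype=float) / np.asarray(behavior_probs, dtype=float)
    est = {
        "IS": float(np.mean(w * r)),              # unbiased, but heavy-tailed variance
        "WIS": float(np.sum(w * r) / np.sum(w)),  # self-normalized: biased, lower variance
    }
    if q_target is not None and q_logged is not None:
        # Doubly robust: model baseline plus importance-weighted residual
        qt = np.asarray(q_target, dtype=float)
        ql = np.asarray(q_logged, dtype=float)
        est["DR"] = float(np.mean(qt + w * (r - ql)))
    return est
```

DR stays consistent if either the propensities or the regression model is correct, which is why the library pairs these estimators with the diagnostics above rather than reporting any one number alone.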
```text
┌──────────────┐     ┌──────────────┐     ┌───────────────┐     ┌──────────────┐
│   Dataset    │ ──▶ │   Estimand   │ ──▶ │  Estimators   │ ──▶ │    Report    │
│              │     │   + Assump.  │     │ + Diagnostics │     │  (HTML/JSON) │
└──────────────┘     └──────────────┘     └───────────────┘     └──────────────┘
        │                                         │
        ▼                                         ▼
┌──────────────┐                         ┌───────────────┐
│  Benchmarks  │                         │  Sensitivity  │
│ (Synth/D4RL) │                         │   Analysis    │
└──────────────┘                         └───────────────┘
```
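The pipeline in the diagram can be sketched as a simple composition of stages. This is an illustrative skeleton with stand-in components, not the library's actual interface:

```python
from dataclasses import dataclass

@dataclass
class Report:
    estimates: dict
    diagnostics: dict

def run_pipeline(dataset, estimand, estimators, diagnostics):
    """Dataset -> Estimand -> Estimators + Diagnostics -> Report."""
    estimates = {name: fn(dataset, estimand) for name, fn in estimators.items()}
    checks = {name: fn(dataset) for name, fn in diagnostics.items()}
    return Report(estimates=estimates, diagnostics=checks)

# Toy usage with stand-in components
data = [0.5, 1.0, 1.5]
report = run_pipeline(
    dataset=data,
    estimand="policy value",
    estimators={"mean": lambda d, e: sum(d) / len(d)},
    diagnostics={"n": lambda d: len(d)},
)
```

Keeping estimators and diagnostics as parallel stages over the same dataset is what lets every estimate ship with its own health checks in the final report.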
Recommended learning path:
- Installation Guide
- Quickstart Tutorial
- Diagnostics Guide
- Sensitivity Analysis
- Benchmarking Workflow
Reference:
We welcome contributions! Check out:
If you use CausalRL in academic work, please cite:
```bibtex
@software{causalrl,
  author = {Saco, Gabriel},
  title  = {CausalRL: Estimand-first Causal Reinforcement Learning},
  year   = {2024},
  url    = {https://github.com/gsaco/causalrl}
}
```

Or use the "Cite this repository" button on GitHub.
MIT © Gabriel Saco

Built with ❤️ for the causal inference and reinforcement learning communities




