CausalRL: Estimand-first causal RL and off-policy evaluation

Know what you're estimating. Know when to trust it. Know how it was produced.

📚 Docs • 🎓 Tutorials • 💡 Examples • 🖼️ Gallery • 📄 Cite


CausalRL is a research-grade Python library for off-policy evaluation (OPE) that makes causal assumptions explicit. It goes beyond point estimates, combining estimand-first design, diagnostics-first reporting, and reproducible benchmarks, so you can tell not just what a policy is worth but whether you should trust the estimate.

📦 v0.2.0 (research preview, alpha) · Import: import crl


✨ Why CausalRL?

🎯 Estimand-First
Every estimator is tied to a formal estimand with explicit identification assumptions.

🔍 Diagnostics by Default
Overlap, ESS, weight-tail, and shift checks run automatically with every evaluation.

📊 20+ Estimators
IS, DR, WDR, MAGIC, MRDR, MIS, FQE, DualDICE, GenDICE, DRL, and more.

📈 Sensitivity Analysis
Bounded-confounding curves quantify robustness to hidden confounders.

📦 D4RL Compatible
Load D4RL and RL Unplugged datasets with built-in adapters.

📝 Audit-Ready Reports
HTML reports with tables, figures, and full metadata bundles.

🧪 Ground-Truth Benchmarks
Synthetic bandit/MDP suites with known true values for validation.

⚡ Engineered for Reliability
Type-checked, tested, and deterministically seeded throughout.


🚀 Quickstart

Installation

# Install from PyPI
pip install causalrl

# With all extras
pip install "causalrl[all]"

# Clone and install from source
git clone https://github.com/gsaco/causalrl
cd causalrl
pip install -e .

Your First OPE Evaluation

from crl.benchmarks.bandit_synth import SyntheticBandit, SyntheticBanditConfig
from crl.ope import evaluate_ope

# Create a synthetic benchmark with known ground truth
benchmark = SyntheticBandit(SyntheticBanditConfig(seed=0))
dataset = benchmark.sample(num_samples=1000, seed=1)

# Run end-to-end evaluation
report = evaluate_ope(dataset=dataset, policy=benchmark.target_policy)

# View results
print(report.summary_table())

# Generate audit-ready HTML report
report.save_html("report.html")

Output:

              Estimator    Value     Std      ESS  OverlapWarning
0                    IS   0.8234  0.0821   412.3           False
1                   WIS   0.8156  0.0634   412.3           False
2                    DR   0.8189  0.0512   412.3           False
3                   WDR   0.8167  0.0498   412.3           False
Ground Truth: 0.8200

CLI

# Quick bandit OPE demo
python -m examples.quickstart.bandit_ope

# MDP evaluation
python -m examples.quickstart.mdp_ope

# Run full benchmark suite
python -m experiments.run_benchmarks --suite all --out results/

📊 Sample Outputs

Estimator Comparison: point estimates with uncertainty quantification
Overlap Diagnostics: importance-weight ratio distribution
Sensitivity Analysis: bounds under hidden confounding
Temporal ESS: effective sample size across the horizon


🧠 The Three Pillars

| Pillar      | Why It Matters                                                  | What You Get                                                             |
|-------------|-----------------------------------------------------------------|--------------------------------------------------------------------------|
| Estimands   | Know what quantity you're estimating, not just which estimator  | Explicit estimands with identification assumptions via AssumptionSet     |
| Diagnostics | Know when an estimate is fragile before acting on it            | Overlap checks, ESS, weight tails, shift diagnostics, sensitivity curves |
| Evidence    | Know how results were produced for auditing and reproducibility | Versioned configs, deterministic seeds, structured report bundles        |
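The ESS figure that appears in the Diagnostics row is worth making concrete. The sketch below implements the standard Kish effective-sample-size formula in plain NumPy; it is an independent illustration of the diagnostic, not crl's internal implementation (the `effective_sample_size` name here is just for this example):

```python
import numpy as np

def effective_sample_size(weights) -> float:
    """Kish effective sample size: (sum w)^2 / sum(w^2).

    A value near n means the target and behavior policies overlap well;
    a small value means a few heavily weighted samples dominate the estimate.
    """
    w = np.asarray(weights, dtype=float)
    return float(w.sum() ** 2 / (w ** 2).sum())

rng = np.random.default_rng(0)

# Well-overlapped weights: ESS stays close to n.
good = rng.uniform(0.8, 1.2, size=1000)

# Heavy-tailed weights: five samples carry almost all the mass.
bad = np.ones(1000)
bad[:5] = 200.0

print(effective_sample_size(good))  # close to 1000
print(effective_sample_size(bad))   # collapses to roughly 20
```

When ESS collapses like this, a point estimate can look precise while resting on a handful of trajectories, which is exactly the failure mode the automatic diagnostics are meant to surface.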

📦 Estimator Suite

| Category            | Estimators        | Notes                              |
|---------------------|-------------------|------------------------------------|
| Importance Sampling | IS, WIS, SN-IS    | Propensity-based weighting         |
| Doubly Robust       | DR, WDR           | Combines regression with IS        |
| Model-Assisted      | MAGIC, MRDR       | Variance reduction via modeling    |
| Marginalized        | MIS               | State-marginal importance sampling |
| Value Function      | FQE               | Fitted Q-Evaluation                |
| DICE Family         | DualDICE, GenDICE | Distribution-correction estimation |
| Double RL           | DRL               | Double reinforcement learning      |
| High-Confidence     | HCOPE bounds      | Concentration-based bounds         |
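As a reference point for the first row, here is a minimal NumPy sketch of IS and WIS on a synthetic logged bandit with known propensities. It illustrates the propensity-based weighting itself, not crl's implementation, which layers diagnostics and variance handling on top:

```python
import numpy as np

rng = np.random.default_rng(0)

# Logged bandit data: a uniform behavior policy over 2 arms,
# with known propensities. Arm 1 pays ~1.0, arm 0 pays ~0.0.
n = 5000
actions = rng.integers(0, 2, size=n)
behavior_prop = np.full(n, 0.5)
rewards = np.where(actions == 1,
                   rng.normal(1.0, 0.5, n),
                   rng.normal(0.0, 0.5, n))

# Target policy: always pull arm 1, so its true value is 1.0.
target_prop = (actions == 1).astype(float)

w = target_prop / behavior_prop            # importance weights
is_est = np.mean(w * rewards)              # ordinary importance sampling
wis_est = np.sum(w * rewards) / np.sum(w)  # weighted (self-normalized) IS

print(f"IS:  {is_est:.3f}")   # both estimates land near the true value 1.0
print(f"WIS: {wis_est:.3f}")
```

WIS trades a small bias for lower variance by normalizing the weights, which is why its standard error is typically smaller in the quickstart output above.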

πŸ—οΈ Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Dataset   │ ──▶ │  Estimand   │ ──▶ │ Estimators  │ ──▶ │   Report    │
│             │     │ + Assump.   │     │ +Diagnostics│     │ (HTML/JSON) │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
      │                                        │
      ▼                                        ▼
┌─────────────┐                         ┌─────────────┐
│  Benchmarks │                         │ Sensitivity │
│ (Synth/D4RL)│                         │  Analysis   │
└─────────────┘                         └─────────────┘
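The sensitivity-analysis stage can be sketched in the same spirit. Under a marginal-sensitivity-style assumption that the true importance weight differs from the nominal one by at most a multiplicative factor Λ, each IS term can be perturbed adversarially to yield an interval rather than a point. This is a deliberately simplified illustration over a plain IS estimator; the function name is hypothetical and crl's bounded-confounding curves may be computed differently:

```python
import numpy as np

def is_bounds_under_confounding(weights, rewards, lam: float):
    """Crude IS bounds when the true weight may differ from the nominal
    one by a multiplicative factor in [1/lam, lam] (hidden confounding)."""
    w = np.asarray(weights, dtype=float)
    r = np.asarray(rewards, dtype=float)
    # Scale each term w*r adversarially: shrink positive terms and inflate
    # negative ones for the lower bound, and vice versa for the upper bound.
    lo = np.mean(w * r * np.where(r >= 0, 1.0 / lam, lam))
    hi = np.mean(w * r * np.where(r >= 0, lam, 1.0 / lam))
    return lo, hi

rng = np.random.default_rng(0)
w = rng.uniform(0.5, 2.0, size=2000)
r = rng.normal(1.0, 0.2, size=2000)

# Sweeping lam traces out a bounded-confounding curve: at lam=1 the
# interval collapses to the point estimate, and it widens as the
# allowed confounding grows.
for lam in (1.0, 1.5, 2.0):
    lo, hi = is_bounds_under_confounding(w, r, lam)
    print(f"Lambda={lam}: [{lo:.3f}, {hi:.3f}]")
```

If a policy decision only holds at Λ close to 1, the conclusion is fragile to even mild unobserved confounding; that is the judgment these curves are meant to support.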

📚 Learn the Library

Recommended learning path:

  1. 📖 Installation Guide
  2. 🚀 Quickstart Tutorial
  3. 🔍 Diagnostics Guide
  4. 📈 Sensitivity Analysis
  5. 🧪 Benchmarking Workflow


🤝 Contributing

We welcome contributions!


📄 Citation

If you use CausalRL in academic work, please cite:

@software{causalrl,
  author = {Saco, Gabriel},
  title = {CausalRL: Estimand-first Causal Reinforcement Learning},
  year = {2024},
  url = {https://github.com/gsaco/causalrl}
}

Or use the "Cite this repository" button on GitHub.


📜 License

MIT © Gabriel Saco


Built with ❤️ for the causal inference and reinforcement learning communities
