docs: draft for JOSS
TomeHirata committed Aug 9, 2024
1 parent 36dd0f7 commit 800de19
Showing 2 changed files with 120 additions and 0 deletions.
49 changes: 49 additions & 0 deletions paper.bib
@@ -0,0 +1,49 @@
@misc{byambadalai2024estimatingdistributionaltreatmenteffects,
title={Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction},
author={Undral Byambadalai and Tatsushi Oka and Shota Yasui},
year={2024},
eprint={2407.16037},
archivePrefix={arXiv},
primaryClass={econ.EM},
url={https://arxiv.org/abs/2407.16037},
}

@book{fisher1935design,
title={The Design of Experiments},
author={Fisher, Ronald A.},
year={1935},
publisher={Oliver and Boyd}
}

@ARTICLE{2020NumPy-Array,
author = {Harris, Charles R. and Millman, K. Jarrod and
van der Walt, Stéfan J and Gommers, Ralf and
Virtanen, Pauli and Cournapeau, David and
Wieser, Eric and Taylor, Julian and Berg, Sebastian and
Smith, Nathaniel J. and Kern, Robert and Picus, Matti and
Hoyer, Stephan and van Kerkwijk, Marten H. and
Brett, Matthew and Haldane, Allan and
Fernández del Río, Jaime and Wiebe, Mark and
Peterson, Pearu and Gérard-Marchant, Pierre and
Sheppard, Kevin and Reddy, Tyler and Weckesser, Warren and
Abbasi, Hameer and Gohlke, Christoph and
Oliphant, Travis E.},
title = {Array programming with {NumPy}},
journal = {Nature},
year = {2020},
volume = {585},
pages = {357--362},
doi = {10.1038/s41586-020-2649-2}
}

@article{scikit-learn,
title={Scikit-learn: Machine Learning in {P}ython},
author={Pedregosa, F. and Varoquaux, G. and Gramfort, A. and Michel, V.
and Thirion, B. and Grisel, O. and Blondel, M. and Prettenhofer, P.
and Weiss, R. and Dubourg, V. and Vanderplas, J. and Passos, A. and
Cournapeau, D. and Brucher, M. and Perrot, M. and Duchesnay, E.},
journal={Journal of Machine Learning Research},
volume={12},
pages={2825--2830},
year={2011}
}
71 changes: 71 additions & 0 deletions paper.md
@@ -0,0 +1,71 @@
---
title: 'dte_adj: A Python package for Distributional Treatment Effects'
tags:
  - Python
  - Distributional Treatment Effects
  - Variance Reduction
authors:
  - name: Tomu Hirata
    orcid: 0009-0006-3140-291X
    equal-contrib: true
    affiliation: "1, 3"
  - name: Undral Byambadalai
    corresponding: true
    affiliation: 1
  - name: Tatsushi Oka
    corresponding: true
    affiliation: "1, 2"
  - name: Shota Yasui
    corresponding: true
    affiliation: 1
affiliations:
  - name: Cyber Agent, Inc, Japan
    index: 1
  - name: Keio University, Japan
    index: 2
  - name: Indeed Technologies Japan, Japan
    index: 3
date: 9 August 2024
bibliography: paper.bib

# Optional fields if submitting to an AAS journal too, see this blog post:
# https://blog.joss.theoj.org/2018/12/a-new-collaboration-with-aas-publishing
aas-doi: 10.3847/xxxxx
aas-journal: International Conference on Machine Learning
---

# Summary

`dte_adj` is a Python package for computing empirical cumulative distribution functions (CDFs) and distributional treatment effects (DTEs) from data collected in randomized controlled trials. The package also implements a novel method, introduced by @byambadalai2024estimatingdistributionaltreatmenteffects, that reduces the variance of DTE estimates using pre-treatment covariates.

# Statement of need

Since the groundbreaking work of @fisher1935design, randomized experiments have been essential for understanding the impact of interventions and for shaping policy decisions. A widely used metric in this context is the Average Treatment Effect (ATE). However, examining distributional treatment effects often gives a more nuanced picture than the average effect alone, because the average can hide how an intervention shifts different parts of the outcome distribution.
Python is widely used in the research community because of its flexibility and ease of use. However, there is no popular Python library for computing distributional treatment effects from randomized-experiment data. While SciPy provides a function for computing the empirical cumulative distribution function, it lacks convenient functions for calculating DTEs or for estimating the variance of the estimated distribution.
`dte_adj` was developed to fill this gap by offering functionality for 1) computing CDFs from data, 2) calculating DTEs and their confidence bands from the CDFs, and 3) visualizing DTEs. The library takes and returns `numpy` arrays [@2020NumPy-Array], which are widely used for matrix computation in Python. Its main classes also follow the interface of the popular library `scikit-learn` [@scikit-learn], which makes them easy to adopt for users with machine learning development experience.

# Functionalities

The high-level functionalities of `dte_adj` are as follows:

1. Computing CDFs and their variances from numeric arrays
2. Calculating distributional parameters and their confidence bands
3. Visualizing distributional parameters and their confidence bands

It currently offers two classes for computing a CDF and its variance.

- `SimpleDistributionEstimator`: computes the standard empirical CDF.
- `AdjustedDistributionEstimator`: computes a CDF with smaller variance by adjusting for pre-treatment covariates, using the method introduced by @byambadalai2024estimatingdistributionaltreatmenteffects.
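
To make the difference between the two kinds of estimator concrete, here is a minimal sketch in plain Python. It is not the package's implementation: the function names `empirical_cdf` and `poststratified_cdf` are hypothetical, and post-stratification on a single discrete covariate is swapped in as a simple stand-in for the machine-learning adjustment of the cited paper. Plain lists are used instead of `numpy` arrays only to keep the sketch dependency-free.

```python
from bisect import bisect_right
from collections import defaultdict


def empirical_cdf(outcomes, locations):
    """Share of outcomes <= y, for each y in locations."""
    ordered = sorted(outcomes)
    return [bisect_right(ordered, y) / len(ordered) for y in locations]


def poststratified_cdf(outcomes, strata, arms, arm, locations):
    """CDF of the outcome under `arm`, adjusted with a discrete covariate.

    Assumption (illustration only): the within-stratum empirical CDFs of the
    chosen arm are averaged with weights equal to the stratum shares in the
    full sample (all arms), which are known more precisely than the shares
    within a single arm.
    """
    n = len(outcomes)
    stratum_counts = defaultdict(int)  # units per stratum, all arms
    arm_outcomes = defaultdict(list)   # outcomes observed under `arm`
    for y, s, w in zip(outcomes, strata, arms):
        stratum_counts[s] += 1
        if w == arm:
            arm_outcomes[s].append(y)

    cdf = [0.0] * len(locations)
    for s, count in stratum_counts.items():
        if not arm_outcomes[s]:
            raise ValueError(f"stratum {s!r} has no units in arm {arm!r}")
        within = empirical_cdf(arm_outcomes[s], locations)
        for j, value in enumerate(within):
            cdf[j] += (count / n) * value
    return cdf


outcomes = [1, 2, 3, 4, 5, 6, 7, 8]
strata = ["a", "a", "a", "a", "b", "b", "b", "b"]
arms = [1, 0, 1, 0, 1, 1, 1, 0]
locations = [2, 4, 6]

treated = [y for y, w in zip(outcomes, arms) if w == 1]
simple = empirical_cdf(treated, locations)
adjusted = poststratified_cdf(outcomes, strata, arms, 1, locations)
```

In this toy data the treated arm over-represents stratum `"b"`, so the adjusted CDF differs from the simple one because each stratum is reweighted to its share of the full sample.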

Both classes implement the following methods for calculating distributional parameters.

- `predict_dte`: computes the Distributional Treatment Effect (DTE) $DTE_{w, w'}(y) := F_{Y(w)}(y) - F_{Y(w')}(y)$, where $y$ is an outcome level, $w$ and $w'$ are treatment types, and $F_{Y(w)}(y)$ is the cumulative distribution function of the potential outcome $Y(w)$ under treatment $w$, evaluated at $y$.
- `predict_pte`: computes the Probability Treatment Effect (PTE) $PTE_{w, w'}(y, h) := \left( F_{Y(w)}(y+h) - F_{Y(w)}(y) \right) - \left( F_{Y(w')}(y+h) - F_{Y(w')}(y) \right)$, where $h > 0$ is the width of each evaluation interval $(y, y+h]$.
- `predict_qte`: computes the Quantile Treatment Effect (QTE) $QTE_{w, w'}(\tau) := F_{Y(w)}^{-1}(\tau) - F_{Y(w')}^{-1}(\tau)$, where $\tau \in (0, 1)$ is the quantile level.
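
The three definitions above can be evaluated directly from two samples by plugging in empirical CDFs. The following is a minimal sketch in plain Python, not the package's code: the helper names `ecdf`, `quantile`, `dte`, `pte`, and `qte` are hypothetical, and the inverse CDF is assumed to be the generalized inverse (the smallest observed value whose empirical CDF reaches $\tau$).

```python
from bisect import bisect_right
from math import ceil


def ecdf(sample, y):
    """Empirical CDF of `sample` evaluated at y."""
    ordered = sorted(sample)
    return bisect_right(ordered, y) / len(ordered)


def quantile(sample, tau):
    """Smallest observed value whose empirical CDF is at least tau."""
    ordered = sorted(sample)
    index = max(ceil(tau * len(ordered)) - 1, 0)
    return ordered[index]


def dte(treated, control, y):
    """F_{Y(w)}(y) - F_{Y(w')}(y)."""
    return ecdf(treated, y) - ecdf(control, y)


def pte(treated, control, y, h):
    """Difference in the probability mass of the interval (y, y + h]."""
    mass_treated = ecdf(treated, y + h) - ecdf(treated, y)
    mass_control = ecdf(control, y + h) - ecdf(control, y)
    return mass_treated - mass_control


def qte(treated, control, tau):
    """Difference of the tau-quantiles of the two arms."""
    return quantile(treated, tau) - quantile(control, tau)


treated = [2, 4, 6, 8]  # outcomes under treatment w
control = [1, 3, 5, 7]  # outcomes under treatment w'

dte_at_5 = dte(treated, control, 5)
pte_2_to_5 = pte(treated, control, 2, 3)
qte_median = qte(treated, control, 0.5)
```

A negative DTE at a given $y$ means that fewer treated outcomes fall at or below $y$, that is, the treatment shifts mass toward larger outcomes at that level.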

Lastly, the `dte_adj.plot` module can be used to visualize the distributional parameters. Examples of the visualizations are shown in the figures below.

![DTE](docs/source/_static/dte_moment.png)
![PTE](docs/source/_static/pte_simple.png)
![QTE](docs/source/_static/qte.png)

# Acknowledgements

# References
