Add Class for Repeated Cross-Sectional Data #330

Status: Open, 83 commits to merge into base: main

Commits (83)
ac858cd
add a cross-sectional dgp
SvenKlaassen Jun 2, 2025
10e532e
add simple test cases for cross sectional dgp
SvenKlaassen Jun 2, 2025
c96605d
reset index for in panel data
SvenKlaassen Jun 3, 2025
61dbf11
add basic did_cs_binary version with simple tests
SvenKlaassen Jun 3, 2025
ceebc6e
add internal attribute _score_dim to DoubleML class
SvenKlaassen Jun 3, 2025
ade3b9a
check prediction size based on internal n_obs
SvenKlaassen Jun 3, 2025
f113e61
update score dimensions init in the cs object
SvenKlaassen Jun 3, 2025
d65edf8
Refactor Data Generators #306
JanTeichertKluge Jun 4, 2025
56d832c
update tests acc. to Refactor Data Generators #306
JanTeichertKluge Jun 4, 2025
02adb24
update docstrings acc. to Refactor Data Generators #306
JanTeichertKluge Jun 4, 2025
39d4e7e
update docstrings acc. to Refactor Data Generators #306
JanTeichertKluge Jun 4, 2025
83cfe9c
update irm submod tests acc. to Refactor Data Generators #306
JanTeichertKluge Jun 4, 2025
3ff0edb
update irm submod tests acc. to Refactor Data Generators #306
JanTeichertKluge Jun 4, 2025
caa530e
update irm submod tests acc. to Refactor Data Generators #306
JanTeichertKluge Jun 4, 2025
4cb9148
update docstrings acc. to Refactor Data Generators #306
JanTeichertKluge Jun 4, 2025
312f601
update docstrings acc. to Refactor Data Generators #306
JanTeichertKluge Jun 4, 2025
0d07790
update docstrings acc. to Refactor Data Generators #306
JanTeichertKluge Jun 4, 2025
8b4f4bc
update documentations acc. to Refactor Data Generators #306
JanTeichertKluge Jun 4, 2025
5c44395
update tests acc. to Refactor Data Generators #306
JanTeichertKluge Jun 4, 2025
6fa737c
Merge pull request #331 from DoubleML/306-refactor-data-generators
JanTeichertKluge Jun 4, 2025
cada753
Merge pull request #332 from DoubleML/JanTeichertKluge/issue272
JanTeichertKluge Jun 4, 2025
a9f4284
upd
JanTeichertKluge Jun 4, 2025
a2566cb
upd
JanTeichertKluge Jun 4, 2025
9ef4e53
update lambda and p calculation in did_cs
SvenKlaassen Jun 5, 2025
e90441b
add _score_dim property to doubleml class
SvenKlaassen Jun 5, 2025
eb19efe
upd 305
JanTeichertKluge Jun 5, 2025
97abdd8
update data backends
JanTeichertKluge Jun 5, 2025
9f6f5d4
add _n_obs_sample_splitting property to doubleml class
SvenKlaassen Jun 5, 2025
b96a839
some progress on refactoring the data backends.
JanTeichertKluge Jun 5, 2025
eb951c4
update check_resampling input
SvenKlaassen Jun 5, 2025
a6c6507
update did binary classes with n_obs_subset and n_obs_sample_splitting
SvenKlaassen Jun 5, 2025
d54b272
update tune without folds to n_obs of doubleml obj
SvenKlaassen Jun 6, 2025
693e109
change n_obs for panel data
SvenKlaassen Jun 6, 2025
16624d5
fix docstr
JanTeichertKluge Jun 6, 2025
7d6ef35
fix order test
SvenKlaassen Jun 6, 2025
18c3844
add sensitivity estimation to did_cs_binary
SvenKlaassen Jun 6, 2025
5d2232b
fix id positions and scaling for sensitivity
SvenKlaassen Jun 6, 2025
7f01b6b
add placebo test for did_cs_binary
SvenKlaassen Jun 6, 2025
3fafccc
extend ext prediction tests for did_cs_binary
SvenKlaassen Jun 6, 2025
9e37851
add control group test for did_cs_binary
SvenKlaassen Jun 6, 2025
810eade
add tune to did_cs_binary
SvenKlaassen Jun 6, 2025
6b6116c
update did_cs_binary sdout test
SvenKlaassen Jun 6, 2025
de324cf
add exceptions and tests
SvenKlaassen Jun 6, 2025
8d0c52c
simplify did_cs_binary nuisance estimation
SvenKlaassen Jun 6, 2025
af45f7f
add __str__ method to did_cs_binary
SvenKlaassen Jun 11, 2025
698f161
add test on panel data to did_cs binary
SvenKlaassen Jun 11, 2025
0a46b59
add panel type to did multi
SvenKlaassen Jun 11, 2025
45dfcf5
update single gt tests for did_cs
SvenKlaassen Jun 11, 2025
29b0ee7
update exception tests for did cs
SvenKlaassen Jun 11, 2025
895a762
update external prediction tests for did cs
SvenKlaassen Jun 11, 2025
b6ace7d
update placebo tests for did cs multi
SvenKlaassen Jun 11, 2025
176a99d
update plot and return type tests for did multi
SvenKlaassen Jun 11, 2025
9d59e5b
add additional did multi aggregation test
SvenKlaassen Jun 11, 2025
f27bf20
some progress on refactoring the data backends.
JanTeichertKluge Jun 11, 2025
9e3e6d6
update did cs multi test for cs data
SvenKlaassen Jun 12, 2025
5c4d1e2
update did binary to work with unbalanced panels
SvenKlaassen Jun 12, 2025
8437d79
formatting issue
JanTeichertKluge Jun 12, 2025
e58f550
updt. unit tests
JanTeichertKluge Jun 12, 2025
a2deba9
fix cluster DGP to use correct data backend
JanTeichertKluge Jun 12, 2025
cb11684
update unit tests
JanTeichertKluge Jun 12, 2025
3fe83ff
align subset naming in did binary and cs version
SvenKlaassen Jun 12, 2025
1eec50c
fix panel data backend / unit tests
JanTeichertKluge Jun 12, 2025
d71dff6
fix did data backend / unit tests
JanTeichertKluge Jun 12, 2025
74ef476
add depr. warning with version
JanTeichertKluge Jun 12, 2025
e7a9f5c
update return type tests for did cs binary
SvenKlaassen Jun 12, 2025
bba5160
adjust unit tests for ssm
JanTeichertKluge Jun 12, 2025
96ebd03
adjust unit tests for did
JanTeichertKluge Jun 12, 2025
a1686d5
adjust unit tests general
JanTeichertKluge Jun 12, 2025
756092c
adjust unit tests general
JanTeichertKluge Jun 12, 2025
6bac76e
enhance did_multi plotting with anticipation periods and update color…
SvenKlaassen Jun 13, 2025
77b1a6b
update data summary to include unique IDs count in DoubleMLPanelData
SvenKlaassen Jun 13, 2025
e52122f
add flexible summary with multiple formats
SvenKlaassen Jun 16, 2025
bf7e16a
fix format
SvenKlaassen Jun 16, 2025
62a6838
Merge pull request #336 from DoubleML/s-update-summary
SvenKlaassen Jun 16, 2025
6beebd8
fix unit tests
JanTeichertKluge Jun 17, 2025
fb421f7
adjust workflow in parent class `DoubleML`
JanTeichertKluge Jun 17, 2025
b11c0cb
update refactoring acc. to unit test results
JanTeichertKluge Jun 17, 2025
b9bdf7c
add check for correct data backend
JanTeichertKluge Jun 17, 2025
4f70523
renaming after refactoring
JanTeichertKluge Jun 17, 2025
19eab81
adjust dummy data (is_cluster_data flag)
JanTeichertKluge Jun 17, 2025
c3fbbb8
adjust unit tests
JanTeichertKluge Jun 17, 2025
144ee60
adjust t_col setter for DIDData Backend
JanTeichertKluge Jun 17, 2025
2e0fa4a
Merge branch '305-feature-request-integrate-clusters-into-the-doublem…
JanTeichertKluge Jun 17, 2025
6 changes: 2 additions & 4 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -23,12 +23,10 @@ body:
     attributes:
       label: Minimum reproducible code snippet
       description: |
-        Please provide a short reproducible code snippet. Example:
-
-        ```python
+        Please provide a short reproducible code snippet. Example: ```python
         import numpy as np
         import doubleml as dml
-        from doubleml.datasets import make_plr_CCDDHNR2018
+        from doubleml.plm.datasets import make_plr_CCDDHNR2018
         from sklearn.ensemble import RandomForestRegressor
         from sklearn.base import clone
         np.random.seed(3141)
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -15,7 +15,7 @@ To submit a **bug report**, you can use our
 ```python
 import numpy as np
 import doubleml as dml
-from doubleml.datasets import make_plr_CCDDHNR2018
+from doubleml.plm.datasets import make_plr_CCDDHNR2018
 from sklearn.ensemble import RandomForestRegressor
 from sklearn.base import clone
 np.random.seed(3141)
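
The import path change above (`doubleml.datasets` to `doubleml.plm.datasets`) can affect scripts that need to run against more than one installed version. The following is only a sketch of one way to handle that: `import_first_available` is a hypothetical helper, not part of DoubleML, and whether the old path remains importable after this PR is an assumption that the diff does not confirm.

```python
import importlib


def import_first_available(module_names, attr_name):
    """Return `attr_name` from the first importable module that provides it.

    Hypothetical helper for illustration; it is not part of DoubleML.
    """
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # this module path does not exist in the installation
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
    raise ImportError(f"{attr_name!r} not found in any of: {', '.join(module_names)}")


# Intended use (commented out because it requires doubleml to be installed):
# make_plr_CCDDHNR2018 = import_first_available(
#     ["doubleml.plm.datasets", "doubleml.datasets"], "make_plr_CCDDHNR2018"
# )
```

Listing the new path first means the old one is only tried as a fallback, so any deprecation warning attached to it is not triggered on newer versions.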
6 changes: 5 additions & 1 deletion doubleml/__init__.py
@@ -1,6 +1,6 @@
 import importlib.metadata
 
-from .data import DoubleMLClusterData, DoubleMLData
+from .data import DoubleMLClusterData, DoubleMLData, DoubleMLDIDData, DoubleMLPanelData, DoubleMLRDDData, DoubleMLSSMData
 from .did.did import DoubleMLDID
 from .did.did_cs import DoubleMLDIDCS
 from .double_ml_framework import DoubleMLFramework, concat
@@ -29,6 +29,10 @@
     "DoubleMLIIVM",
     "DoubleMLData",
     "DoubleMLClusterData",
+    "DoubleMLDIDData",
+    "DoubleMLPanelData",
+    "DoubleMLRDDData",
+    "DoubleMLSSMData",
     "DoubleMLDID",
     "DoubleMLDIDCS",
     "DoubleMLPQ",
80 changes: 74 additions & 6 deletions doubleml/data/__init__.py
@@ -2,12 +2,80 @@
 The :mod:`doubleml.data` module implements data classes for double machine learning.
 """
 
+import warnings
+
 from .base_data import DoubleMLData
-from .cluster_data import DoubleMLClusterData
+from .did_data import DoubleMLDIDData
 from .panel_data import DoubleMLPanelData
+from .rdd_data import DoubleMLRDDData
+from .ssm_data import DoubleMLSSMData
+
+
+class DoubleMLClusterData(DoubleMLData):
+    """
+    Backwards compatibility wrapper for DoubleMLData with is_cluster_data=True.
+    This class is deprecated and will be removed in a future version.
+    Use DoubleMLData with is_cluster_data=True instead.
+    """
+
+    def __init__(
+        self,
+        data,
+        y_col,
+        d_cols,
+        cluster_cols,
+        x_cols=None,
+        z_cols=None,
+        t_col=None,
+        s_col=None,
+        use_other_treat_as_covariate=True,
+        force_all_x_finite=True,
+    ):
+        warnings.warn(
+            "DoubleMLClusterData is deprecated and will be removed with version 0.12.0. "
+            "Use DoubleMLData with is_cluster_data=True instead.",
+            FutureWarning,
+            stacklevel=2,
+        )
+        super().__init__(
+            data=data,
+            y_col=y_col,
+            d_cols=d_cols,
+            x_cols=x_cols,
+            z_cols=z_cols,
+            cluster_cols=cluster_cols,
+            use_other_treat_as_covariate=use_other_treat_as_covariate,
+            force_all_x_finite=force_all_x_finite,
+            force_all_d_finite=True,
+            is_cluster_data=True,
+        )
+
+    @classmethod
+    def from_arrays(
+        cls, x, y, d, cluster_vars, z=None, t=None, s=None, use_other_treat_as_covariate=True, force_all_x_finite=True
+    ):
+        """
+        Initialize :class:`DoubleMLClusterData` from :class:`numpy.ndarray`'s.
+        This method is deprecated and will be removed with version 0.12.0,
+        use DoubleMLData.from_arrays with is_cluster_data=True instead.
+        """
+        warnings.warn(
+            "DoubleMLClusterData is deprecated and will be removed with version 0.12.0. "
+            "Use DoubleMLData.from_arrays with is_cluster_data=True instead.",
+            FutureWarning,
+            stacklevel=2,
+        )
+        return DoubleMLData.from_arrays(
+            x=x,
+            y=y,
+            d=d,
+            z=z,
+            cluster_vars=cluster_vars,
+            use_other_treat_as_covariate=use_other_treat_as_covariate,
+            force_all_x_finite=force_all_x_finite,
+            force_all_d_finite=True,
+            is_cluster_data=True,
+        )
+
 
-__all__ = [
-    "DoubleMLData",
-    "DoubleMLClusterData",
-    "DoubleMLPanelData",
-]
+__all__ = ["DoubleMLData", "DoubleMLClusterData", "DoubleMLDIDData", "DoubleMLPanelData", "DoubleMLRDDData", "DoubleMLSSMData"]
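
The wrapper in `doubleml/data/__init__.py` follows a common deprecation-shim pattern: warn with `FutureWarning`, then forward to the unified class with the cluster flag set. The sketch below reproduces that pattern with stand-in classes so it runs without DoubleML installed. `BaseData`, `LegacyClusterData` and `collect_warnings` are hypothetical names for illustration only, not the library's API, and the reduced signatures are an assumption.

```python
import warnings


class BaseData:
    """Hypothetical stand-in for the unified data class."""

    def __init__(self, data, cluster_cols=None, is_cluster_data=False):
        self.data = data
        self.cluster_cols = cluster_cols
        self.is_cluster_data = is_cluster_data


class LegacyClusterData(BaseData):
    """Hypothetical deprecated wrapper that forwards to BaseData."""

    def __init__(self, data, cluster_cols):
        warnings.warn(
            "LegacyClusterData is deprecated. Use BaseData with is_cluster_data=True instead.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller, not to this wrapper
        )
        super().__init__(data=data, cluster_cols=cluster_cols, is_cluster_data=True)


def collect_warnings(func, *args, **kwargs):
    """Call func and return its result together with all warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is suppressed
        result = func(*args, **kwargs)
    return result, caught


# The legacy path warns once and still yields a cluster-enabled object.
legacy, caught = collect_warnings(LegacyClusterData, data=[1, 2, 3], cluster_cols=["c"])

# The recommended path sets the flag directly and stays silent.
direct, caught_direct = collect_warnings(
    BaseData, data=[1, 2, 3], cluster_cols=["c"], is_cluster_data=True
)
```

A test for the real wrapper could take the same shape, checking that exactly one `FutureWarning` is recorded and that the resulting object matches one built through the non-deprecated constructor.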