Description
Describe the bug
Bug reported by @ShreyDixit:
Assume that an object of class DoubleMLData has been initialized successfully and that a property such as y_col is then altered in a way that violates a basic assumption (e.g., the same variable cannot be both the outcome variable y_col and a treatment variable in d_cols). A ValueError is raised, which is correct, but the object is mutated anyway: the y_col property is changed and the object ends up in the invalid state. The root cause is in the setter for the y_col property:
doubleml-for-py/doubleml/double_ml_data.py
Lines 353 to 365 in 0690cc6
The value should not be set until all checks have passed. In its current form, however, the _check_disjoint_sets() check requires that the properties have already been set. The same issue applies to the setters of other properties such as d_cols, x_cols, etc. Note that the issue only becomes relevant once an object of class DoubleMLData has been initialized successfully and the user then alters one of the properties in a way that violates _check_disjoint_sets().
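One possible direction is to run the check on the candidate value before assigning it. The following is a minimal, self-contained sketch of that pattern and not the actual DoubleML code: the class TinyData and its helper _check_disjoint are hypothetical stand-ins, and the real _check_disjoint_sets() would have to be adapted to accept candidate values for this to carry over.

```python
class TinyData:
    """Hypothetical stand-in for a data container; not DoubleML's implementation."""

    def __init__(self, y_col, d_cols):
        # Validate first, then store.
        self._check_disjoint(y_col, d_cols)
        self._y_col = y_col
        self._d_cols = list(d_cols)

    @staticmethod
    def _check_disjoint(y_col, d_cols):
        # The check works on the values passed in, not on stored attributes,
        # so it can run before anything is assigned.
        if y_col in d_cols:
            raise ValueError(
                f"{y_col} cannot be set as outcome variable ``y_col`` "
                "and treatment variable in ``d_cols``."
            )

    @property
    def y_col(self):
        return self._y_col

    @y_col.setter
    def y_col(self, value):
        if not isinstance(value, str):
            raise TypeError("The outcome variable y_col must be of str type.")
        # Check the candidate against the current treatment columns ...
        self._check_disjoint(value, self._d_cols)
        # ... and only mutate the object once every check has passed.
        self._y_col = value


data = TinyData(y_col="y", d_cols=["d"])
try:
    data.y_col = "d"  # violates the disjointness assumption
except ValueError as err:
    print(err)
print(data.y_col)  # prints y, the object is unchanged
```

An alternative with the same observable behavior would be to keep the existing check, remember the old value before assigning, and restore it if the check raises. Validating the candidate first avoids the object being in an invalid state even temporarily.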
Minimum reproducible code snippet
Code block 1
from doubleml.datasets import make_plr_CCDDHNR2018
dml_data = make_plr_CCDDHNR2018()
print(dml_data.y_col)
dml_data.y_col = 'd'
Code block 2
print(dml_data.y_col)
Expected Result
First code block: dml_data.y_col == 'y', and the following exception is raised:
ValueError: d cannot be set as outcome variable ``y_col`` and treatment variable in ``d_cols``.
Second code block: dml_data.y_col == 'y' should still hold.
Actual Result
First code block: dml_data.y_col == 'y', and the following exception is raised:
ValueError: d cannot be set as outcome variable ``y_col`` and treatment variable in ``d_cols``.
Second code block: dml_data.y_col == 'd'
Versions
Python 3.9.7
DoubleML 0.4.1
Scikit-Learn 1.0.1