Skip to content

Issue in setter of y_col properties for objects of class DoubleMLData: Objects mutate despite the fact that a ValueError was raised #144

Open
@MalteKurz

Description

@MalteKurz

Describe the bug

Bug reported by @ShreyDixit:

Assume that one successfully initializes an object of class DoubleMLData. Then alters a property like y_col in a way that violates some basic assumptions (e.g., the same variable cannot be at the same time the outcome variable y_col and the treatment variable d_cols). This results in a ValueError being raised. However, nevertheless the object mutates and violates the basic assumption.

--> So while the ValueError is appropriately raised, the object nevertheless mutates and the y_col property is changed. The root cause is in the setter for the y_col property

@y_col.setter
def y_col(self, value):
reset_value = hasattr(self, '_y_col')
if not isinstance(value, str):
raise TypeError('The outcome variable y_col must be of str type. '
f'{str(value)} of type {str(type(value))} was passed.')
if value not in self.all_variables:
raise ValueError('Invalid outcome variable y_col. '
f'{value} is no data column.')
self._y_col = value
if reset_value:
self._check_disjoint_sets()
self._set_y_z()

Basically the value shouldn't be set before all checks have been successfully applied. However, in its current form the _check_disjoint_sets() check requires that the properties have been set already. The same issue also applies to the other setters for properties like d_cols, x_cols, etc. Note however, that this issue only becomes relevant if an object of class DoubleMLData has been initialized successfully and if then the user alters one of the properties in a way that violates _check_disjoint_sets().

Minimum reproducible code snippet

Code block 1

from doubleml.datasets import make_plr_CCDDHNR2018
dml_data = make_plr_CCDDHNR2018()
print(dml_data.y_col)
dml_data.y_col = 'd'

Code block 2

print(dml_data.y_col)

Expected Result

First code block: dml_data.y_col == 'y' and raise exception

ValueError: d cannot be set as outcome variable ``y_col`` and treatment variable in ``d_cols``.

Second code block: dml_data.y_col == 'y' should still hold.

Actual Result

First code block: dml_data.y_col == 'y' and raise exception

ValueError: d cannot be set as outcome variable ``y_col`` and treatment variable in ``d_cols``.

Second code block: dml_data.y_col == 'd'

Versions

Python 3.9.7
DoubleML 0.4.1
Scikit-Learn 1.0.1

Metadata

Metadata

Assignees

Type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions