Hello,
Thank you very much for tackling this issue of confounders, which comes up very often in clinical ML problems.
I have some questions about the project/paper:
I am wondering why only the test set needs to be deconfounded. Why not also build a deconfounded train set in addition to the deconfounded test set (with no data leakage, of course)?
I tried to generalize your methodology to k confounders.
I still used most of your codebase, and I used a pseudo-generalization of mutual information to multiple variables (the multi-information / total correlation).
The probability for sample $i$ to be sampled, $m_i$, which was based on the pointwise mutual information between the target $y$ and the single confound $z$,

$$m_i \propto \log \frac{\hat{p}(y_i, z_i)}{\hat{p}(y_i)\,\hat{p}(z_i)},$$

is now:

$$m_i \propto \log \frac{\hat{p}(y_i, z_{i,1}, \dots, z_{i,k})}{\hat{p}(y_i)\,\prod_{j=1}^{k} \hat{p}(z_{i,j})}.$$

These densities can still be estimated with kernel density estimation.
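For concreteness, here is a minimal sketch of how I estimate these weights with Gaussian KDEs (the function name `sampling_weights`, the fixed bandwidth, and the normalization are my own illustrative choices, not taken from your codebase):

```python
# Minimal sketch of the k-confounder weight estimation, assuming
# y is the target, shape (n,), and Z holds the k confounders, shape (n, k).
import numpy as np
from sklearn.neighbors import KernelDensity


def sampling_weights(y, Z, bandwidth=0.5):
    """Pointwise multi-information weights, normalized to sum to 1."""
    y = np.asarray(y).reshape(-1, 1)
    joint = np.hstack([y, Z])  # shape (n, k + 1)

    def log_density(X):
        # Gaussian KDE log-density evaluated at the training points themselves.
        return KernelDensity(bandwidth=bandwidth).fit(X).score_samples(X)

    log_num = log_density(joint)               # log p(y, z_1, ..., z_k)
    log_den = log_density(y)                   # log p(y)
    for j in range(Z.shape[1]):
        log_den += log_density(Z[:, j:j + 1])  # + log p(z_j)

    m = log_num - log_den                      # pointwise multi-information
    w = np.exp(m - m.max())                    # stabilize before normalizing
    return w / w.sum()
```

A single fixed bandwidth is obviously crude; in practice a per-dimension bandwidth (e.g. Scott's rule) would probably behave better as k grows.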
I ran some quick toy experiments; the approach seems to work approximately on simple additive toy examples when the number of samples is sufficient:
For instance, with 1,000 samples and 10 confounding factors I got: [results figure]

For instance, with 100 samples and 3 confounding factors I got: [results figure]
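To make the additive setting concrete, here is a hypothetical toy setup using the `sampling_weights` sketch above (the data-generating process and the choice to drop half of the samples are my own illustration, not the exact setup behind the results above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 10

Z = rng.normal(size=(n, k))                # k confounding factors
y = Z.sum(axis=1) + rng.normal(size=n)     # target, confounded additively

w = sampling_weights(y, Z)                 # weights from the sketch above
drop = rng.choice(n, size=n // 2, replace=False, p=w)  # remove high-MI points
keep = np.setdiff1d(np.arange(n), drop)

# Crude sanity check: mean |corr(y, z_j)| should shrink on the kept subset.
before = np.mean([abs(np.corrcoef(y, Z[:, j])[0, 1]) for j in range(k)])
after = np.mean([abs(np.corrcoef(y[keep], Z[keep][:, j])[0, 1]) for j in range(k)])
print(f"mean |corr(y, z_j)|: {before:.3f} -> {after:.3f}")
```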
It would also be interesting to study the N required to guarantee, at a given confidence level, the deconfounding capability for k factors, depending on the type of link between the variables.
Do you think this is a correct approach and generalization?
Thank you
Best regards