Hello,
Thank you very much for tackling this issue of confounders, which comes up very often in clinical ML problems.
I have some questions about the project/paper:
I am wondering why only the test set needs to be deconfounded. Why not also build a deconfounded train set in addition to the deconfounded test set (with no data leakage, of course)?
I tried to generalize your methodology to k confounders.
I still used most of your codebase, and I used a pseudo-generalization of mutual information to multiple variables (the multi-information / total correlation).
The probability for sample $i$ to be sampled, $m_i$, which was based on the pointwise mutual information between the target $y$ and the single confound $z$,

$$m_i \propto \log \frac{\hat{p}(y_i, z_i)}{\hat{p}(y_i)\,\hat{p}(z_i)},$$

is now:

$$m_i \propto \log \frac{\hat{p}(y_i, z_{i,1}, \dots, z_{i,k})}{\hat{p}(y_i)\,\prod_{j=1}^{k} \hat{p}(z_{i,j})}.$$

These densities can still be estimated with kernel density estimation.
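For concreteness, here is a minimal sketch of how I estimate these weights with Gaussian KDEs (the function name `sampling_weights`, the fixed bandwidth, and the normalization are my own illustrative choices, not taken from your codebase):

```python
# Minimal sketch of the k-confounder weight estimation, assuming
# y is the target, shape (n,), and Z holds the k confounders, shape (n, k).
import numpy as np
from sklearn.neighbors import KernelDensity


def sampling_weights(y, Z, bandwidth=0.5):
    """Pointwise multi-information weights, normalized to sum to 1."""
    y = np.asarray(y).reshape(-1, 1)
    joint = np.hstack([y, Z])  # shape (n, k + 1)

    def log_density(X):
        # Gaussian KDE log-density evaluated at the training points themselves.
        return KernelDensity(bandwidth=bandwidth).fit(X).score_samples(X)

    log_num = log_density(joint)               # log p(y, z_1, ..., z_k)
    log_den = log_density(y)                   # log p(y)
    for j in range(Z.shape[1]):
        log_den += log_density(Z[:, j:j + 1])  # + log p(z_j)

    m = log_num - log_den                      # pointwise multi-information
    w = np.exp(m - m.max())                    # stabilize before normalizing
    return w / w.sum()
```

A single fixed bandwidth is obviously crude; in practice a per-dimension bandwidth (e.g. Scott's rule) would probably behave better as k grows.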
I ran some quick toy experiments; the approach seems to work approximately on simple additive toy examples when the number of samples is sufficient:
For instance, with 1,000 samples and 10 confounding factors I got: [results figure]

For instance, with 100 samples and 3 confounding factors I got: [results figure]
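To make the additive setting concrete, here is a hypothetical toy setup using the `sampling_weights` sketch above (the data-generating process and the choice to drop half of the samples are my own illustration, not the exact setup behind the results above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 10

Z = rng.normal(size=(n, k))                # k confounding factors
y = Z.sum(axis=1) + rng.normal(size=n)     # target, confounded additively

w = sampling_weights(y, Z)                 # weights from the sketch above
drop = rng.choice(n, size=n // 2, replace=False, p=w)  # remove high-MI points
keep = np.setdiff1d(np.arange(n), drop)

# Crude sanity check: mean |corr(y, z_j)| should shrink on the kept subset.
before = np.mean([abs(np.corrcoef(y, Z[:, j])[0, 1]) for j in range(k)])
after = np.mean([abs(np.corrcoef(y[keep], Z[keep][:, j])[0, 1]) for j in range(k)])
print(f"mean |corr(y, z_j)|: {before:.3f} -> {after:.3f}")
```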
It would also be interesting to study the N required to guarantee, at a given confidence level, the deconfounding capability for k factors, depending on the type of link between the variables.
Do you think this is a correct approach and generalization?
Thank you
Best regards