Generalization to more than 1 confounding factor #12

Open
Rachine opened this issue Jul 9, 2020 · 1 comment
Rachine commented Jul 9, 2020

Hello,
Thank you very much for tackling the problem of confounders, which comes up again and again in clinical ML.

I have some questions about the project/paper:

  1. Why does only the test set need to be deconfounded? Why not also build a deconfounded train set alongside the deconfounded test set (with no data leakage between them, of course)?
  2. I tried to generalize your methodology to k confounders:
    image
    I kept most of your codebase and used a pseudo-generalization of mutual information to multiple variables.
    The probability of being sampled, m_i, which was
    image

is now:

image

The quantity
image can still be estimated with kernel density estimation.
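
As a generic illustration of kernel density estimation in more than one dimension, here is a minimal sketch. It does not reproduce the exact quantity from the image above. The helper name `gaussian_kde_density`, the product Gaussian kernel, the Scott's-rule bandwidth, and the additive toy data are all assumptions made for this example, not taken from the project's codebase.

```python
import numpy as np

def gaussian_kde_density(samples, points):
    """Evaluate a Gaussian product-kernel density estimate.

    samples: (n, d) array of observations.
    points:  (m, d) array of query points.
    The per-dimension bandwidth uses Scott's rule (an assumption).
    """
    samples = np.asarray(samples, dtype=float)
    points = np.asarray(points, dtype=float)
    n, d = samples.shape
    # Scott's rule: h_j = sigma_j * n^(-1 / (d + 4))
    h = samples.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))
    # Standardized differences between every query point and every sample: (m, n, d)
    diff = (points[:, None, :] - samples[None, :, :]) / h
    # Product of 1-D Gaussian kernels = exp of the summed squared differences
    kernel = np.exp(-0.5 * np.sum(diff ** 2, axis=2))
    norm = n * np.prod(h) * (2.0 * np.pi) ** (d / 2.0)
    return kernel.sum(axis=1) / norm

# Hypothetical additive toy data: y depends on k confounders z_0..z_{k-1}.
rng = np.random.default_rng(0)
k = 3
z = rng.normal(size=(500, k))
y = z.sum(axis=1) + rng.normal(scale=0.5, size=500)

# Joint density estimate of (y, z_0, ..., z_{k-1}) at the observed points.
joint = np.column_stack([y, z])
density_at_samples = gaussian_kde_density(joint, joint)
```

The estimate gets harder as k grows, because the joint space has k + 1 dimensions and the same number of samples covers it more sparsely.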

I ran some quick toy experiments. The approach seems to work approximately on simple additive toy examples, provided the number of samples is sufficient.
For instance, with 1000 samples and 10 confounding factors I got:
image
For instance, with 100 samples and 3 confounding factors I got:

image

It would also be interesting to study the sample size N required to guarantee, at a given confidence level, that k factors are deconfounded, depending on the type of link between the variables.

Do you think this is a correct approach and generalization?

Thank you

Best regards

Rachine commented Jul 10, 2020

Oops, after some more thought, maybe I should look at the goodness of fit with all the variables jointly, and not only at the individual correlations, to test
image

image

image
I added the R^2 from an ordinary least squares fit with statsmodels, using the formula 'y ~ z0 + z1 + z2'.
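
To illustrate why the joint fit matters, here is a minimal sketch comparing individual correlations with the R^2 of a joint least-squares fit on additive toy data. It uses plain NumPy rather than statsmodels; the helper `r_squared` and the toy data are assumptions for this example, and the regression with an intercept is meant to correspond to the formula 'y ~ z0 + z1 + z2'.

```python
import numpy as np

def r_squared(y, Z):
    """R^2 of an ordinary least squares fit of y on the columns of Z plus an intercept."""
    X = np.column_stack([np.ones(len(y)), Z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical additive toy data with 3 confounders and 100 samples.
rng = np.random.default_rng(0)
n, k = 100, 3
Z = rng.normal(size=(n, k))
y = Z.sum(axis=1) + rng.normal(scale=0.5, size=n)

# Pearson correlation of y with each confounder taken alone.
individual_r = [np.corrcoef(y, Z[:, j])[0, 1] for j in range(k)]
# Goodness of fit when all confounders are used together.
joint_r2 = r_squared(y, Z)

print("individual r^2:", [round(r ** 2, 3) for r in individual_r])
print("joint R^2:", round(joint_r2, 3))
```

The joint R^2 is never smaller than the largest individual squared correlation, so a subsample can look deconfounded factor by factor while the confounders still explain y jointly.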
