Add variable selection priors #568
base: main
Conversation
Signed-off-by: Nathaniel <[email protected]>
:param vs_prior_type : str or None, default=None
    Type of variable selection prior: 'spike_and_slab', 'horseshoe', or None.
    If None, uses standard normal priors.
:param vs_hyperparams : dict, optional
This is sphinx format and not numpy
We should add that into AGENTS.md if it's not already there.
This is sphinx format and not numpy
You got me. The doc strings were AI generated. Will fix.
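The two parameters under discussion amount to a simple dispatch on the prior type string. A minimal sketch of what that dispatch might look like (the function name and return values here are hypothetical illustrations, not CausalPy's actual implementation):

```python
def make_coefficient_prior(vs_prior_type=None, vs_hyperparams=None):
    """Hypothetical dispatch mirroring the documented vs_prior_type options."""
    hyperparams = vs_hyperparams or {}
    if vs_prior_type is None:
        # Default: standard normal priors on the coefficients.
        return ("normal", {"mu": 0.0, "sigma": 1.0})
    if vs_prior_type in ("spike_and_slab", "horseshoe"):
        # Hand the chosen variable selection prior its hyperparameters.
        return (vs_prior_type, hyperparams)
    raise ValueError(f"Unknown vs_prior_type: {vs_prior_type!r}")
```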
Provides continuous shrinkage with heavy tails, allowing strong signals
to escape shrinkage while weak signals are dampened:
β_j = τ · λ̃_j · β_j^raw
We should be able to add maths in here for nice rendering in the API docs
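For intuition, the quoted non-centered parameterisation can be simulated directly. A quick NumPy sketch of plain horseshoe prior draws (the λ̃ in the quoted line suggests the regularised variant, which additionally caps the local scales; the 0.1 global scale below is an arbitrary illustration value, not the PR's default):

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 5_000

# Non-centered parameterisation: beta_j = tau * lambda_j * beta_j_raw,
# with half-Cauchy local scales lambda_j and a half-Cauchy global scale tau.
tau = 0.1 * np.abs(rng.standard_cauchy(n_draws))   # global shrinkage scale
lam = np.abs(rng.standard_cauchy(n_draws))         # local, per-coefficient scales
beta_raw = rng.standard_normal(n_draws)
beta = tau * lam * beta_raw

# Horseshoe signature: most draws pile up near zero, but the heavy
# Cauchy tails let occasional draws escape shrinkage entirely.
frac_near_zero = np.mean(np.abs(beta) < 0.05)
```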
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Codecov Report ❌ Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## main #568 +/- ##
==========================================
+ Coverage 93.27% 93.31% +0.04%
==========================================
Files 37 39 +2
Lines 5632 5911 +279
Branches 367 386 +19
==========================================
+ Hits 5253 5516 +263
- Misses 248 256 +8
- Partials 131 139 +8
Signed-off-by: Nathaniel <[email protected]>
Marking this one as ready for review. There is still some work to be done on the notebook illustrating the functionality, but I think there is enough here that it's worth flagging the architecture choices for discussion. I've made the variable selection priors available as a module. Currently it's just integrated with the IV class, but in principle it can be dropped into all regression-based modules with coefficients. The pattern simply requires an if-else block to be used in e.g. the propensity score model, linear regression model, etc. What do you guys think?
Signed-off-by: Nathaniel <[email protected]>
Great @NathanielF ! I think the notebooks need a bit more storyline and explanation 🙏

Cool, thanks @juanitorduz . I can take another pass at it this weekend.
PR Summary: Introduces reusable variable selection priors and integrates them into instrumental variables (IV) regression, with an option for binary treatments.
Written by Cursor Bugbot for commit 18da6c4. This will update automatically on new commits.
Signed-off-by: Nathaniel <[email protected]>
the assumption of a simple IV experiment.
The coefficients should be interpreted appropriately."""
We will use the multivariate normal likelihood
for continuous treatment."""
Bug: Validation warning ignores binary_treatment flag setting
The input_validation method checks if the treatment variable has more than 2 unique values and warns that "We will use the multivariate normal likelihood for continuous treatment." However, this warning doesn't account for the new binary_treatment parameter. If a user sets binary_treatment=True while having continuous treatment data, the warning incorrectly suggests MVN will be used, when actually the Bernoulli likelihood will be applied (which would fail on non-binary data). The validation needs to cross-check the actual data against the self.binary_treatment flag.
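The cross-check Bugbot is asking for can be sketched as a standalone helper: error out when `binary_treatment=True` meets non-binary data, and only warn about the MVN likelihood when it will actually be used. This is a hypothetical sketch, not CausalPy's actual `input_validation`:

```python
import warnings

import numpy as np

def validate_treatment(t, binary_treatment):
    """Hypothetical cross-check of treatment data against the binary_treatment flag."""
    unique_vals = np.unique(t)
    data_is_binary = set(unique_vals.tolist()) <= {0, 1}
    if binary_treatment and not data_is_binary:
        # The Bernoulli likelihood would fail on non-binary data: fail fast.
        raise ValueError(
            "binary_treatment=True requires a 0/1 treatment variable; "
            f"got {unique_vals.size} unique values."
        )
    if not binary_treatment and unique_vals.size > 2:
        # Warn only in the branch where the MVN likelihood is actually used.
        warnings.warn(
            "Treatment has more than 2 unique values; using the "
            "multivariate normal likelihood for continuous treatment."
        )
```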
Signed-off-by: Nathaniel <[email protected]>
Took another pass @juanitorduz , should be more friendly now.

I'll attempt to review this week before I down tools for winter break. FYI, I was not planning on another CausalPy release in 2025. I think there is a lot in the pipeline, so we can start 2026 with some nice major feature releases.
@@ -0,0 +1,2603 @@
Line #1. def inv_logit(z):
We could use https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.expit.html right?
used expit
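For reference, `scipy.special.expit` is the numerically stable inverse-logit, so the hand-rolled helper reduces to a one-liner:

```python
import numpy as np
from scipy.special import expit

z = np.array([-2.0, 0.0, 2.0])
# Equivalent to 1 / (1 + np.exp(-z)), but stable for large |z|.
p = expit(z)
```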
Line #82. data
maybe use data.info() or data.describe()? There are too many columns (or keep both .head() and .info()?)
swapped to data.info()
Some math formulas can help explain this: after reading this paragraph, I do not understand what temperature actually does (maybe use a tip box for these math formulas?)
Added much more extensive write up to the maths
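Assuming the temperature refers to a Concrete (Gumbel-softmax) relaxation of the spike-and-slab inclusion indicator (an assumption; the notebook's exact formulation may differ), its effect is easy to show by simulation: low temperature pushes the relaxed indicator toward hard 0/1 selection, high temperature smears it toward the inclusion probability.

```python
import numpy as np

rng = np.random.default_rng(1)

def relaxed_bernoulli(p, temperature, size, rng):
    """Concrete (Gumbel-softmax) relaxation of a Bernoulli(p) indicator.

    Low temperature concentrates draws near the endpoints {0, 1}
    (hard inclusion/exclusion); high temperature smears them toward p.
    """
    logit_p = np.log(p) - np.log1p(-p)
    u = rng.uniform(size=size)
    logistic_noise = np.log(u) - np.log1p(-u)   # Logistic(0, 1) draws
    return 1.0 / (1.0 + np.exp(-(logit_p + logistic_noise) / temperature))

cold = relaxed_bernoulli(0.5, temperature=0.1, size=10_000, rng=rng)
hot = relaxed_bernoulli(0.5, temperature=5.0, size=10_000, rng=rng)
```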
done
removed.
done
fixed typo
added titles
Split the plot. The only thing I liked about the overlap plot was that it made it a bit clearer how much tighter the inference was with the variable selection priors, but you're right, they're a bit ugly.
done.
juanitorduz left a comment:
This is awesome @NathanielF ! Just some minor comments :)
Signed-off-by: Nathaniel <[email protected]>
On the Cholesky decomposition @juanitorduz , I re-parameterised again in the binary case. We refactored the joint likelihood into a conditional likelihood. Key changes:
- Correction term: we added expected_U = rho * (sigma_U / sigma_V) * V to the outcome mean. This term "soaks up" the variation in the outcome that is correlated with the treatment assignment.
- Logistic error parameterization: to improve sampling stability and match the invlogit link function, we parameterised the treatment errors with a logistic distribution.

Cursor etc. was pushing me towards an inverse probit formulation, but I just found it much more brittle than the logit.
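Why the correction term works: with jointly Gaussian errors, E[U | V] = ρ (σ_U / σ_V) V, so absorbing expected_U into the outcome mean removes the confounded component and leaves a residual with variance σ_U² (1 − ρ²) that is uncorrelated with V. A quick simulation check (parameter values are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
rho, sigma_U, sigma_V = 0.6, 2.0, 1.5

# Correlated structural errors: U enters the outcome, V the treatment.
cov = np.array([
    [sigma_U**2,              rho * sigma_U * sigma_V],
    [rho * sigma_U * sigma_V, sigma_V**2],
])
U, V = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

# The correction term from the PR: the conditional mean of U given V.
expected_U = rho * (sigma_U / sigma_V) * V

# After absorbing expected_U into the outcome mean, the leftover error is
# uncorrelated with V, with reduced variance sigma_U**2 * (1 - rho**2).
resid = U - expected_U
```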
Signed-off-by: Nathaniel <[email protected]>
Just a draft for the minute. Working through some ideas.
📚 Documentation preview 📚: https://causalpy--568.org.readthedocs.build/en/568/