linear_pool method #20
Comments
We'd like to think about how to incorporate trimming here: either trimming the most extreme models at each point on the horizontal axis, or trimming the most extreme models at each point on the vertical axis. We'd like an approach that is consistent across the different output types, or else a way to specify which kind of trimming to do that validates that the combination of output type and trimming type is supported. From Emily: here's the paper where trimming is proposed, and my paper on LOP vs. Vincenti:
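As a rough illustration of the two trimming directions, here is a minimal Python sketch (Python is used only for illustration; it is not code from any of the packages discussed). The function name `trimmed_pool`, the parameter `k`, and the rule of dropping the `k` lowest and `k` highest models at each point are all assumptions, and may differ from what the trimming paper proposes. Passing cdf values on a shared grid of x values trims at each point on the horizontal axis; passing quantile values at shared probability levels trims at each point on the vertical axis.

```python
def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values (hypothetical rule)."""
    if 2 * k >= len(values):
        raise ValueError("k is too large: every model would be trimmed")
    kept = sorted(values)[k:len(values) - k]
    return sum(kept) / len(kept)


def trimmed_pool(rows, k):
    """Pointwise trimmed mean across models.

    rows[i][j] is model i's value at the j-th shared point:
    - cdf values F_i(x_j) on a shared x grid, or
    - quantile values Q_i(p_j) at shared probability levels.
    """
    return [trimmed_mean(column, k) for column in zip(*rows)]


# Four models' cdf values at three shared x values; the third model is an outlier.
model_cdfs = [
    [0.10, 0.50, 0.90],
    [0.20, 0.60, 1.00],
    [0.00, 0.10, 0.30],
    [0.15, 0.55, 0.95],
]
trimmed = trimmed_pool(model_cdfs, k=1)
```

A single helper covers both directions because the only difference is which representation of each model's distribution is supplied, which is also where validation of the output type and trimming type combination would need to happen.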
Noting that after discussion with Emily today, we decided to build around distfromq for quantile interpolation, possibly adding functionality to distfromq for simpler, faster methods if the spline-based method is too slow.
I'm going to record some ongoing questions about how to handle samples in the linear pool ensemble function. First, note that in the simplest case where we have (1) equal weights for all models, (2) the same number of samples from each component model, (3) no limit on the number of samples the ensemble is allowed to produce, and (4) no desire to enforce any particular (or consistent) dependence structure on the ensemble samples, the ensembling operation is straightforward: we simply collect all of the samples from component models and update the sample index (specified by the
My basic question is: how many of these considerations should we address in this function? My first vote is that 4 is out of scope for this function, but we might want to do something about 1, 2, and 3.
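For the simplest case described above (equal weights, equal sample counts, no cap on the number of ensemble samples), the pooling step might look like the following Python sketch. The row layout, the `model_id` and `value` keys, the `"ensemble"` label, and the function name `pool_samples` are assumptions made for illustration; only `output_type_id` as the holder of sample indices comes from this thread. Task-identifying columns are omitted for brevity.

```python
def pool_samples(rows):
    """Collect samples from all models and give them distinct sample indices.

    Each row is a dict with (assumed) keys 'model_id', 'output_type_id', 'value'.
    Rows sharing the same (model_id, output_type_id) pair keep a shared new
    index, so whatever grouping a model attached to one sample is preserved.
    """
    new_index = {}
    pooled = []
    for row in rows:
        key = (row["model_id"], row["output_type_id"])
        if key not in new_index:
            new_index[key] = len(new_index)  # next unused ensemble index
        pooled.append({
            "model_id": "ensemble",  # hypothetical label for the pooled output
            "output_type_id": new_index[key],
            "value": row["value"],
        })
    return pooled


rows = [
    {"model_id": "A", "output_type_id": 0, "value": 1.0},
    {"model_id": "A", "output_type_id": 1, "value": 2.0},
    {"model_id": "B", "output_type_id": 0, "value": 3.0},
    {"model_id": "B", "output_type_id": 1, "value": 4.0},
]
pooled = pool_samples(rows)
```

This sketch does nothing about considerations 1 to 3: with unequal weights or unequal sample counts, simple concatenation no longer represents the intended mixture.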
I'm on board with all of Seb's suggestions here (noting for 2 that I think of resampling with replacement as a strategy for dealing with weighting as in 1.i.).
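One way to read "resampling with replacement" as a weighting strategy is sketched below in Python. This is an assumption about what is meant, not an agreed design: each ensemble draw first picks a component model with probability proportional to its weight, then picks one of that model's samples uniformly. The names `resample_pool`, `samples_by_model`, `weights`, and `n_out` are hypothetical.

```python
import random


def resample_pool(samples_by_model, weights, n_out, seed=None):
    """Draw n_out ensemble samples by weighted resampling with replacement.

    samples_by_model: dict mapping model name -> list of sample values
    weights: dict mapping model name -> non-negative weight
    """
    rng = random.Random(seed)
    models = list(samples_by_model)
    model_weights = [weights[m] for m in models]
    drawn = []
    for _ in range(n_out):
        # Choose a component model in proportion to its weight ...
        model = rng.choices(models, weights=model_weights, k=1)[0]
        # ... then choose one of its samples uniformly at random.
        drawn.append(rng.choice(samples_by_model[model]))
    return drawn


samples_by_model = {"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0]}
draws = resample_pool(samples_by_model, {"A": 3.0, "B": 1.0}, n_out=8, seed=1)
```

Because the model is chosen before the sample, this handles unequal weights, unequal numbers of samples per model, and a cap on the number of ensemble samples in one step, at the cost of Monte Carlo noise and of discarding any dependence structure across samples.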
I agree with everything that has been stated so far. A few additional comments about how I've seen samples used/presented (most relevant to (3)).
Good points, Emily. Some quick thoughts, essentially agreeing with you:
So that we can merge PR hubverse-org/hubUtils#26 soon with some good working functionality, I'm proposing that we move two pieces of functionality we've discussed for this function out to separate issues: handling of samples, and trimming. I have filed the separate issues hubverse-org/hubUtils#27 and hubverse-org/hubUtils#28 for these.
We would like to have a linear opinion pool method. Here's a proposed function signature:
The operation will vary by the output_type:
- For output_types mean, cdf, pmf, we can call hubEnsembles::simple_ensemble directly using a (weighted) mean. Note: in the documentation, we should describe the motivation for this choice for the mean output_type: the mean of a distributional mixture is the (weighted) mean of the means of the component distributions.
- For output_type sample, the ensemble should collect the samples from all individual models. The output_type_id column values, containing sample indices, should be updated to ensure that samples from different individual models are given distinct sample indices. Note that the simple suggestion here only works if we have the same number of samples from each component model and the component models have equal weight. Otherwise, we would have to somehow represent weights for these samples.
- For output_type quantile, the basic idea is to get an estimate of the cdf and use that. There are two reference implementations of this idea out there:
  - The LOP method in @eahowerton's CombineDistributions package: https://github.com/eahowerton/CombineDistributions/blob/main/R/LOP.R
  - The distfromq package has functionality to estimate a cdf or quantile function from provided quantiles, and to generate samples from that distribution. See the vignette here. There is an example of using this for ensemble calculations here.
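To make the quantile case concrete, here is a minimal Python sketch of the "estimate the cdf and use that" idea. It is not the LOP implementation from CombineDistributions or the distfromq approach: it swaps in plain piecewise-linear interpolation between the provided quantiles and makes no attempt to model the tails (each cdf is simply held at its lowest and highest provided level outside the quantile range). The names `interp`, `linear_pool_quantiles`, and `grid_size` are hypothetical.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Piecewise-linear interpolation, held constant outside the range of xs."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)  # guarantees xs[j - 1] <= x < xs[j]
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def linear_pool_quantiles(model_quantiles, levels, weights=None, grid_size=201):
    """Linear-pool quantiles by averaging interpolated cdfs (crude tails).

    model_quantiles[i][j] is model i's quantile at probability levels[j];
    both are assumed sorted in increasing order.
    """
    if weights is None:
        weights = [1.0] * len(model_quantiles)
    total = sum(weights)
    weights = [w / total for w in weights]
    lo = min(q[0] for q in model_quantiles)
    hi = max(q[-1] for q in model_quantiles)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    # Step 1: estimate each model's cdf on the grid and take the weighted mean.
    pooled_cdf = [
        sum(w * interp(x, q, levels) for w, q in zip(weights, model_quantiles))
        for x in grid
    ]
    # Step 2: invert the pooled cdf at the requested probability levels.
    return [interp(p, pooled_cdf, grid) for p in levels]


levels = [0.1, 0.5, 0.9]
pooled = linear_pool_quantiles([[0.0, 5.0, 10.0], [10.0, 15.0, 20.0]], levels)
```

For these two non-overlapping models the pooled median falls at 10, where the first model's upper quantile meets the second model's lower quantile, whereas averaging the two models' medians directly would also give 10 but would place the 0.1 and 0.9 quantiles at 5 and 15 rather than near the outer edges of the mixture.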