Function to return a distribution over response for a point input #305

Open
oxinabox opened this issue Apr 8, 2019 · 2 comments

@oxinabox

oxinabox commented Apr 8, 2019

Was talking to @kleinschmidt
who explained to me that for certain types of GLMs
it was possible to calculate the distribution of the response
for a given point input.

That would be really cool, even if it only works for some Link Functions.

Lyndon White [2:20 PM]

I expected GLMs to give me back distributions, but I guess they can’t do that?
Or am I just not calling the right functions?

Dave F Kleinschmidt [2:20 PM]

hmmmmmmm no I don't think that's implemented
it shouldn't be hard
at least for the standard ones

Lyndon White [2:20 PM]

In general that is something they do?
It would be mad useful to me.

Dave F Kleinschmidt [2:20 PM]

uhhhhhh maybe
well here's the issue: which distribution?
let's take a linear model
the residual errors are presumed to be normal
so you fit a variance parameter
great, then your predictions for an x value should be that x vector dotted with the coefficients, plus some normally distributed noise with the estimated residual variance, right?

Lyndon White [2:22 PM]

Right
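
A minimal sketch of the "plug-in" prediction described above: the response at a point is treated as normal, centred on the input dotted with the coefficients, with the estimated residual standard deviation as its spread. The function name `plugin_predictive` and all numbers are hypothetical, chosen for illustration; this is not GLM.jl's API.

```python
from statistics import NormalDist

def plugin_predictive(x, coef, resid_sd):
    """Normal distribution over the response at x, treating coef as exactly known."""
    mean = sum(xi * bi for xi, bi in zip(x, coef))
    return NormalDist(mu=mean, sigma=resid_sd)

# Hypothetical fitted values, for illustration only.
coef = [1.0, 2.0]   # intercept, slope
resid_sd = 0.5      # estimated residual standard deviation
x = [1.0, 3.0]      # the leading 1.0 is the intercept column

dist = plugin_predictive(x, coef, resid_sd)
print(dist.mean, dist.stdev)
```

This ignores uncertainty in the coefficients, which is the question raised next.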

Dave F Kleinschmidt [2:22 PM]

well, what about your uncertainty in the coefficients? do you take that into account?

if you're in linear model land that's easy, it's just another normal distribution you convolve with the residual error
but if you're in, let's say, logistic land, then now you're talking about normally distributed uncertainty in log-odds space, which then gets converted to a probability with the inverse logit (logistic) function and then interpreted as a coin flip
for the error model
mayyybe you can do that analytically but I don't want to try
so that's why it's complicated
you can easily get a distribution for the point estimate, but it might not be the right or most informative one
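
The two cases can be sketched as follows, under stated assumptions. For the linear model, convolving two normals just adds their variances, so the predictive variance is the residual variance plus `x' * cov * x`, where `cov` is the coefficient covariance matrix. For the logistic model, this sketch sidesteps the analytic question by Monte Carlo: it draws the linear predictor from its normal distribution, pushes each draw through the inverse logit, and averages to get the probability of a Bernoulli response. All function names and numbers are hypothetical, and treating the coefficient estimates as normally distributed is itself an approximation.

```python
import math
import random
from statistics import NormalDist

def dot(x, coef):
    return sum(xi * bi for xi, bi in zip(x, coef))

def quad_form(x, cov):
    """x' * cov * x for a list vector and a list-of-lists matrix."""
    n = len(x)
    return sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))

def linear_predictive(x, coef, cov, resid_sd):
    """Normal predictive distribution that includes coefficient uncertainty."""
    var = resid_sd ** 2 + quad_form(x, cov)
    return NormalDist(dot(x, coef), math.sqrt(var))

def logistic_predictive_prob(x, coef, cov, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(y = 1 | x), averaging over coefficient uncertainty.

    The linear predictor eta = x' * coef is a linear function of the (assumed
    normal) coefficients, so it is itself normal and can be sampled directly.
    Assumes quad_form(x, cov) > 0 and moderate eta values.
    """
    rng = random.Random(seed)
    eta_mean = dot(x, coef)
    eta_sd = math.sqrt(quad_form(x, cov))
    total = 0.0
    for _ in range(n_draws):
        eta = rng.gauss(eta_mean, eta_sd)
        total += 1.0 / (1.0 + math.exp(-eta))  # inverse logit
    return total / n_draws

# Hypothetical values, for illustration only.
x = [1.0, 3.0]
cov = [[0.04, 0.0], [0.0, 0.01]]

lin = linear_predictive(x, [1.0, 2.0], cov, resid_sd=0.5)
p = logistic_predictive_prob(x, [-2.0, 1.0], cov)
print(lin.mean, lin.stdev, p)
```

In the logistic case the averaged probability is pulled toward 0.5 relative to the plug-in value, which is one concrete way the "right" distribution differs from the easy one.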

@nalimilan
Member

This sounds very speculative to me. Maybe ask on a general stats forum first, or look at possible implementation in other software?

@palday
Member

palday commented May 24, 2020

I think the relevant terms in general statistics are "prediction intervals" instead of "confidence intervals" (which GLM already provides). Returning a proper distribution isn't exactly a thing in the frequentist world, but the prediction interval does provide roughly what you're looking for.
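
For a linear model, the difference between the two intervals can be sketched like this: the confidence interval covers the mean response and uses only the coefficient uncertainty, while the prediction interval covers a new observation and adds the residual variance. The function `intervals`, the numbers, and the t critical value passed in (roughly the 97.5% quantile for 3 degrees of freedom) are all assumptions for illustration, not output from any fitted model or GLM.jl call.

```python
import math

def intervals(x, coef, cov, resid_sd, t_crit):
    """Return (confidence_interval, prediction_interval) at the point x."""
    n = len(x)
    mean = sum(xi * bi for xi, bi in zip(x, coef))
    # Variance of the estimated mean response: x' * cov * x
    var_mean = sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))
    se_mean = math.sqrt(var_mean)
    # A new observation also carries the residual noise.
    se_pred = math.sqrt(var_mean + resid_sd ** 2)
    ci = (mean - t_crit * se_mean, mean + t_crit * se_mean)
    pi = (mean - t_crit * se_pred, mean + t_crit * se_pred)
    return ci, pi

# Hypothetical values, for illustration only.
ci, pi = intervals(
    x=[1.0, 3.0],
    coef=[1.0, 2.0],
    cov=[[0.04, 0.0], [0.0, 0.01]],
    resid_sd=0.5,
    t_crit=3.182,
)
print(ci, pi)
```

The prediction interval is always at least as wide as the confidence interval, since it adds a non-negative variance term.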
