Function to return a distribution over response for a point input #305

oxinabox · 2019-04-08T13:36:10Z

Was talking to @kleinschmidt
who explained to me that for certain types of GLMs
it was possible to calculate the distribution of the response
for a given point input.

That would be really cool, even if it only works for some link functions.

Lyndon White [2:20 PM]

I expected GLMs to give me back distributions, but I guess they can’t do that?
Or am I just not calling the right functions?

Dave F Kleinschmidt [2:20 PM]

hmmmmmmm no I don't think that's implemented
it shouldn't be hard
at least for the standard ones

Lyndon White [2:20 PM]

In general that is something they do?
It would be mad useful to me.

Dave F Kleinschmidt [2:20 PM]

uhhhhhh maybe
well here's the issue: which distribution?
let's take a linear model
the residual errors are presumed to be normal
so you fit a variance parameter
great, then your predictions for an x value should be that x vector dotted with the coefficients, plus some normally distributed noise with the estimated residual variance, right?
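
The plug-in prediction described in that message can be sketched as follows. This is an illustrative Python sketch, not GLM.jl code: the function names `fit_simple_ols` and `plugin_response_dist` are hypothetical, the model is assumed to have a single predictor plus an intercept, and coefficient uncertainty is deliberately ignored.

```python
from statistics import NormalDist, fmean

def fit_simple_ols(xs, ys):
    """Fit y = a + b*x by least squares; return (a, b, residual_sd)."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Unbiased residual variance: divide by n - 2 (two fitted coefficients).
    residual_sd = (rss / (n - 2)) ** 0.5
    return a, b, residual_sd

def plugin_response_dist(a, b, residual_sd, x0):
    """Normal response distribution at x0, treating the coefficients as known."""
    return NormalDist(mu=a + b * x0, sigma=residual_sd)

# Made-up data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.1, 2.9, 4.0]
a, b, s = fit_simple_ols(xs, ys)
dist = plugin_response_dist(a, b, s, 2.5)
```

The returned `NormalDist` supports `pdf`, `cdf`, and `inv_cdf`, so it behaves like a distribution object for the response at that point. It understates the spread whenever the coefficients are estimated from few observations.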

Lyndon White [2:22 PM]

Right

Dave F Kleinschmidt [2:22 PM]

well, what about your uncertainty in the coefficients? do you take that into account?

if you're in linear model land that's easy, it's just another normal distribution you convolve with the residual error
but if you're in, let's say, logistic land, then now you're talking about normally distributed uncertainty in log-odds space, that then gets converted through to probability with the inverse logit (logistic) function and then interpreted as a coin flip
for the error model
mayyybe you can do that analytically but I don't want to try
so that's why it's complicated
you can easily get a distribution for the point estimate, but it might not be the right or most informative one
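
For the logistic case, one way to sidestep the analytic question is Monte Carlo. The Python sketch below is an assumption-laden illustration, not something GLM.jl provides: `predictive_success_prob` is a hypothetical name, and `eta_hat` and `eta_se` stand for the fitted linear predictor at the input point and its standard error, which the caller is assumed to have already. It draws log-odds values from a normal distribution, pushes each through the logistic function, and averages, which gives the success probability of the resulting coin flip.

```python
import math
import random

def logistic(eta):
    """Inverse logit: map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def predictive_success_prob(eta_hat, eta_se, n_draws=20000, seed=0):
    """Average logistic(eta) over eta ~ Normal(eta_hat, eta_se).

    The response is a coin flip, so its distribution once the log-odds
    uncertainty is averaged out is Bernoulli with this probability.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += logistic(rng.gauss(eta_hat, eta_se))
    return total / n_draws

# Hypothetical values for the linear predictor and its standard error.
p_plugin = logistic(2.0)                        # ignores coefficient uncertainty
p_averaged = predictive_success_prob(2.0, 1.5)  # averages over it
```

Because the logistic function is nonlinear, the averaged probability is pulled toward 0.5 relative to the plug-in value, which is one concrete sense in which the plug-in distribution "might not be the right or most informative one".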

nalimilan · 2019-04-08T20:07:18Z

This sounds very speculative to me. Maybe ask on a general stats forum first, or look at possible implementation in other software?

palday · 2020-05-24T13:22:54Z

I think the relevant term in general statistics is "prediction intervals" rather than "confidence intervals" (which GLM already provides). Returning a proper distribution isn't exactly a thing in the frequentist world, but the prediction interval does provide roughly what you're looking for.
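
To make the distinction concrete, here is a rough Python sketch of a prediction interval for a single-predictor linear model. The function name `prediction_interval` and its parameters are hypothetical, not an existing API. The standard error combines residual noise with uncertainty in the fitted line. The exact interval uses a Student t critical value with n - 2 degrees of freedom; the Python standard library has no t quantile, so the sketch falls back to a normal quantile unless the caller supplies `crit`, and that fallback is too narrow for small samples.

```python
from statistics import NormalDist, fmean

def prediction_interval(xs, ys, x0, level=0.95, crit=None):
    """Prediction interval for a new response at x0 under y = a + b*x + noise."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)  # residual variance estimate
    # The 1 is the new observation's own noise; the remaining terms are the
    # uncertainty in the fitted mean at x0.
    se_pred = (s2 * (1 + 1 / n + (x0 - x_bar) ** 2 / sxx)) ** 0.5
    if crit is None:
        # Normal approximation (an assumption); pass a t quantile for exactness.
        crit = NormalDist().inv_cdf(0.5 + level / 2)
    center = a + b * x0
    return center - crit * se_pred, center + crit * se_pred

# Made-up data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.1, 2.9, 4.0]
lo_approx, hi_approx = prediction_interval(xs, ys, 2.5)
# 3.182 is the 97.5% quantile of Student t with 3 degrees of freedom.
lo_t, hi_t = prediction_interval(xs, ys, 2.5, crit=3.182)
```

Dropping the leading `1` inside the square root gives the confidence interval for the mean response instead.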

andreasnoack added this to the Out of scope for next release milestone Nov 20, 2024

andreasnoack added the enhancement label Nov 20, 2024
