Extend GLM Families #35
So far in https://github.com/dask/dask-glm/blob/master/dask_glm/families.py we have families for linear/normal and logistic that look like the following:
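A rough sketch of that style, illustrative only and not a verbatim copy of families.py (the exact method set and signatures may differ); the `gradient(Xbeta, X, y)` / `hessian(Xbeta, X)` signatures follow the ones used by the newton code further down:

```python
import dask.array as da


def sigmoid(x):
    # logistic (sigmoid) function
    return 1 / (1 + da.exp(-x))


class Logistic(object):
    """Bernoulli family with the logit link, namespaced via staticmethods (sketch)."""

    @staticmethod
    def gradient(Xbeta, X, y):
        # gradient of the negative log-likelihood: X^T (sigmoid(Xbeta) - y)
        p = sigmoid(Xbeta)
        return X.T.dot(p - y)

    @staticmethod
    def hessian(Xbeta, X):
        # Hessian of the negative log-likelihood: X^T diag(p * (1 - p)) X
        p = sigmoid(Xbeta)
        return (X.T * (p * (1 - p))).dot(X)
```

The linear/normal family is analogous with the identity link: its gradient is `X.T.dot(Xbeta - y)` and its hessian is simply `X.T.dot(X)`.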
The only other GLM family I know of that is used "in the wild" is the Poisson family, which should be very straightforward to include. (@mpancia do you know of any others?) @hussainsultan didn't like the use of classes filled with staticmethods just for convenient namespacing; he would prefer a […]
@moody-marlin Poisson, for sure; I would also include some of the other basic families, even though they are slightly less common (e.g. binomial, and/or families with alternate link functions such as probit).
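For concreteness, a Poisson family with the canonical log link could slot into the same style. This is only a sketch (not an existing dask-glm class), assuming the `gradient(Xbeta, X, y)` / `hessian(Xbeta, X)` interface used by the newton code below:

```python
import dask.array as da


class Poisson(object):
    """Poisson family with the canonical log link (illustrative sketch)."""

    @staticmethod
    def gradient(Xbeta, X, y):
        # gradient of the negative log-likelihood: X^T (exp(Xbeta) - y)
        return X.T.dot(da.exp(Xbeta) - y)

    @staticmethod
    def hessian(Xbeta, X):
        # Hessian of the negative log-likelihood: X^T diag(exp(Xbeta)) X
        mu = da.exp(Xbeta)
        return (X.T * mu).dot(X)
```

With something like this in place, the existing solvers should work on count data unchanged, since they only touch the family through these two methods; an alternate link (e.g. identity or sqrt) would change both formulas.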
After looking at it for a little bit, I think mimicking the […] makes sense. They have (what amounts to) an abstract base class […]; I think something similar would make just as much sense here, except our specification of the ABC […]. In addition, I think that the […]
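One minimal way to specify such an ABC, assuming it only needs to pin down the two methods the newton code below actually calls (the names here are illustrative, not an existing dask-glm API):

```python
from abc import ABC, abstractmethod


class Family(ABC):
    """Abstract specification of a GLM family (illustrative sketch).

    A concrete family (Normal, Logistic, Poisson, ...) implements the
    derivatives of its negative log-likelihood; solvers depend only on
    this interface.
    """

    @staticmethod
    @abstractmethod
    def gradient(Xbeta, X, y):
        """Gradient of the negative log-likelihood with respect to beta."""

    @staticmethod
    @abstractmethod
    def hessian(Xbeta, X):
        """Hessian of the negative log-likelihood with respect to beta."""
```

Registering the existing staticmethod-style classes as subclasses (or as virtual subclasses via `Family.register`) would keep the current call sites unchanged.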
@mpancia to be concrete, can you give an example of what algorithms in […] would look like?
@mrocklin Yeah, sure! In, e.g., Newton's method:

```python
import numpy as np
import dask.array as da
from dask.array import dot

# assumed import: the logistic family defined in dask_glm/families.py
from dask_glm.families import Logistic


def newton(X, y, max_steps=50, tol=1e-8, family=Logistic):
    '''Newton's method for logistic regression.'''
    gradient, hessian = family.gradient, family.hessian
    n, p = X.shape
    beta = np.zeros(p)  # always init to zeros?
    Xbeta = dot(X, beta)

    iter_count = 0
    converged = False

    while not converged:
        beta_old = beta

        # should this use map_blocks()?
        hess = hessian(Xbeta, X)
        grad = gradient(Xbeta, X, y)
        hess, grad = da.compute(hess, grad)

        # should this be dask or numpy?
        # currently uses Python 3 specific syntax
        step, _, _, _ = np.linalg.lstsq(hess, grad)
        beta = beta_old - step

        iter_count += 1

        # should change this criterion
        coef_change = np.absolute(beta_old - beta)
        converged = (
            (not np.any(coef_change > tol)) or (iter_count > max_steps))

        if not converged:
            Xbeta = dot(X, beta)  # numpy -> dask conversion of beta

    return beta
```

The only time that […]
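A short usage sketch of why the `family=` parameter matters: the same solver runs unchanged for a different family, as long as that family exposes `gradient` and `hessian` (`Poisson` here refers to the sketch above, not an existing dask-glm class):

```python
import numpy as np
import dask.array as da

rng = np.random.RandomState(0)
X = da.from_array(rng.randn(1000, 5), chunks=(250, 5))
y = da.from_array(rng.poisson(3.0, size=1000).astype(float), chunks=250)

# identical solver, different statistical model
beta_counts = newton(X, y, family=Poisson)
```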
I actually really like this idea, but I do think a refactor which preserves the runtimes will be more difficult than it initially seems; for example, @mpancia I'd encourage you to try and refactor the […]. However, for newton / admm, this seems very do-able.

Moreover, if we could pull this off for all the algorithms, it would most likely allow for improved testing: we could run the algorithms on many different carefully crafted convex functions for which we know the optima (which isn't currently possible for regularized problems), and separately test the implementations of the GLM families / regularizers. This wouldn't be full-out integration testing, but it would still increase our understanding of the accuracy of our implementations.
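One way such a solver-only test could look: hand `newton` a throwaway "family" whose objective is a noiseless least-squares problem, so the optimum is known exactly and the test does not depend on any real family implementation (the class and test names here are hypothetical):

```python
import numpy as np
import dask.array as da


class Quadratic(object):
    """Stub family for the convex objective 0.5 * ||X beta - y||^2,
    whose minimizer is known in closed form (here: the beta used to generate y)."""

    @staticmethod
    def gradient(Xbeta, X, y):
        return X.T.dot(Xbeta - y)

    @staticmethod
    def hessian(Xbeta, X):
        return X.T.dot(X)


def test_newton_recovers_known_optimum():
    rng = np.random.RandomState(0)
    X = da.from_array(rng.randn(100, 5), chunks=(50, 5))
    beta_star = np.arange(1.0, 6.0)
    y = X.dot(beta_star)  # noiseless, so beta_star is the exact minimizer

    beta = newton(X, y, family=Quadratic)

    np.testing.assert_allclose(beta, beta_star, atol=1e-6)
```

The same pattern extends to other crafted convex objectives (e.g. quadratics with chosen conditioning), which exercises the solver's convergence behaviour independently of the GLM family code.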