
ENH: More features for ReceptiveField #3884

Open · 5 of 10 tasks
larsoner opened this issue Jan 5, 2017 · 52 comments

Comments

@larsoner
Member

larsoner commented Jan 5, 2017

  • support score method for use with GridSearchCV
  • use custom Ridge solver for speed and multiple regularization types (i.e., Ross's Fourier-based code)
  • change to positive lags for causal behavior (MRG: Change to positive lags #4550)
  • backward transform
  • Add CUDA support for the fitting
  • if necessary, make tmin/tmax/sfreq logic consistent with Epochs class
  • forward/backward model tutorial?
  • plot method that is smart enough to at least deal with 1D (STA) and 2D (STRF) plotting, probably using NonUniformImage class rather than pcolormesh
  • decimation (decimate data before fitting or model after fitting?)
  • refactor with linear_regression_raw, XDawn (categorical data; see "Add a bit of support for categorical data to TimeDelayedRidge?" #4940)
@larsoner larsoner modified the milestone: 0.15 Jan 6, 2017
@larsoner larsoner removed this from the 0.15 milestone Aug 4, 2017
@nbara
Contributor

nbara commented Sep 4, 2017

Hi Eric, would this be the place to suggest another addition to ReceptiveField?

I've been using it a bit lately, especially for backward models (i.e., stimulus reconstruction), and there's something that (I think) is missing that would be useful. You may have seen this paper by Haufe et al. (2014), where they show how to forward-transform a backward model (for which the model weights are not easily interpretable).

Ed Lalor's mTRF toolbox does this (see fig 4E here), but unless I'm mistaken there's no quick way to do this directly in MNE/sklearn with the current code?

PS: pinging @choldgraf as well, since we've briefly discussed this on Gitter.
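
For reference, here's a minimal sketch of the Haufe et al. (2014) transform (my own illustration with assumed names and shapes, not MNE or mTRF code):

import numpy as np

def haufe_patterns(W, X, y_hat):
    # W: (n_features, n_targets) backward-model weights
    # X: (n_samples, n_features) neural data
    # y_hat: (n_samples, n_targets) model output (reconstructed stimulus)
    Xc = X - X.mean(0)          # centering makes the intercept drop out
    yc = y_hat - y_hat.mean(0)
    # patterns A = cov(X) @ W @ inv(cov(y_hat))
    return np.linalg.multi_dot([Xc.T @ Xc, W, np.linalg.inv(yc.T @ yc)])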

@agramfort
Member

agramfort commented Sep 4, 2017 via email

@nbara
Contributor

nbara commented Sep 4, 2017

I agree it doesn't seem that hard to implement. The only issue I foresee is dealing with the intercept before computing np.linalg.multi_dot([np.dot(X.T, X), coef_, np.linalg.inv(np.dot(y.T, y))]) (not sure). But I wasn't sure this was something you'd find interesting/useful at all.

@kingjr
Member

kingjr commented Sep 4, 2017 via email

@agramfort
Member

agramfort commented Sep 4, 2017 via email

@kingjr
Member

kingjr commented Sep 4, 2017

> I agree it doesn't seem that hard to implement. The only issue I foresee is dealing with the intercept before computing np.linalg.multi_dot([np.dot(X.T, X), coef_, np.linalg.inv(np.dot(y.T, y))]) (not sure). But I wasn't sure this was something you'd find interesting/useful at all.

+1 for having that feature computed upon request.

@choldgraf
Contributor

@nbara would you like to give it a shot? Feel free to ping me for a review if you like.

Quick questions:

  1. At the call to .fit, should we store the covariance matrices of X and y? (Or maybe just store X and y and compute the covariances later?)

  2. Should this be a lazy attribute? E.g., have an inverse_coefs_ attribute that performs the necessary matrix computation and errors if the model hasn't been fit yet?

@larsoner
Member Author

larsoner commented Sep 4, 2017

> At the call to .fit, should we store the covariance matrices of X and y? (Or maybe just store X and y and compute the covariances later?)

We should store the X covariance; it is the time-consuming computation, IIRC.

> Should this be a lazy attribute? E.g., have an inverse_coefs_ attribute that performs the necessary matrix computation and errors if the model hasn't been fit yet?

We should just compute it when doing fit(...). It will be a very small cost relative to the fitting operation, I think.

@choldgraf
Contributor

> We should just compute it when doing fit(...). It will be a very small cost relative to the fitting operation, I think.

Yeah, I think so too. Can we think of edge cases where we'd regret this? E.g., gigantic X matrices? It could be a silent memory issue if the object is suddenly storing an N x N matrix when N is really large...

@kingjr
Member

kingjr commented Sep 4, 2017 via email

@larsoner
Member Author

larsoner commented Sep 4, 2017

> E.g., gigantic X matrices?

The covariance (what we'd need to store) is just (n_feat, n_feat), and X is (n_samp, n_feat) with n_samp >> n_feat -- so if you're able to compute the fit at all, you probably won't be hurt too much by storing the cov. It's still possible (especially if we start to be smart about memory usage in the computations) to hit memory limits, but I agree with @kingjr that we can deal with that when we hit it (and hope/assume YAGNI).
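
As a rough back-of-envelope check (hypothetical sizes, not numbers from this thread):

# n_feat = n_channels * n_delays, so a float64 covariance costs n_feat**2 * 8 bytes
n_feat = 64 * 100             # e.g., 64 channels x 100 delays
print(n_feat ** 2 * 8 / 1e9)  # ~0.33 GB -- storable, but not free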

@choldgraf
Contributor

That's fine with me -- just bringing it up as worth thinking about. My concern is more that users won't necessarily realize this is happening, so the object is silently storing an N x N matrix... what if somebody doesn't know about this and is doing cross-validation with 100s of models, storing the object each time? I think it'd be fine to solve this with a good mention in the docstring for now, though.

@nbara
Contributor

nbara commented Sep 4, 2017

> We should store the X covariance; it is the time-consuming computation, IIRC.

Actually no, ...

> We should just compute it when doing fit(...). It will be a very small cost relative to the fitting operation, I think.

... from what I can tell, if one chooses TimeDelayingRidge, then X is not reshaped into (n_times, n_epochs, n_feats, n_delays) in fit. However, the data needs to be in that shape before it's multiplied with the covariances.

I've just taken a stab at this. Here's a gist with a function that does the job (not fully tested yet, but it seems to give sensible weights):
https://gist.github.com/nbara/c0f2f05ad5db1d1710e591ce89d7b645

It's not lightweight at all, since I've got to call _delay_time_series and _reshape_for_est before the final dot product...

PS: Feel free to comment on the gist if you don't want to bloat this thread.

@larsoner
Member Author

larsoner commented Sep 4, 2017

> from what I can tell, if one chooses TimeDelayingRidge, then X is not reshaped into (n_times, n_epochs, n_feats, n_delays) in fit. However, the data needs to be in that shape before it's multiplied with the covariances.

It probably does not need to be reshaped. See how predict etc. are done -- the equivalence between convolution and matrix multiplication is used instead.
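
(A toy check of that equivalence, with illustrative names only -- this is not MNE internals:)

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100)    # signal
h = rng.standard_normal(5)      # kernel over 5 positive lags

# design matrix whose k-th column is x delayed by k samples
X_del = np.zeros((x.size, h.size))
for k in range(h.size):
    X_del[k:, k] = x[:x.size - k]

# matrix multiplication with the delayed design == direct convolution
assert np.allclose(X_del @ h, np.convolve(x, h)[:x.size])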

The best thing to do is probably to open a WIP PR, rather than a gist. Let's get something that works properly with unit tests, then worry about addressing the bottlenecks.

@choldgraf
Contributor

any movement on this? @nbara wanna take a shot at a PR?

@nbara
Contributor

nbara commented Sep 9, 2017

Yes, I'm happy to try, but it's likely that I'll need some help at some point, as I'm not sure a) how to make the code work for TimeDelayingRidge and b) what a proper unit test for this would be.

@choldgraf
Contributor

choldgraf commented Sep 9, 2017 via email

@nbara
Contributor

nbara commented Sep 10, 2017

OK, starting to work on it now. Are we sure we want to have this computed by default in fit? I don't think it would be terribly useful for forward models (why would one want to backward-transform forward weights?).

@choldgraf
Contributor

Sounds good -- I think it would need to be computed in fit... that's the only situation in which the object would have access to the X and y variables, no?

@choldgraf
Contributor

I guess another option would be to have a special function for this, e.g.:

mne.decoding.inverse_transform(coefs, X, y)

@agramfort
Member

agramfort commented Sep 10, 2017 via email

@nbara
Contributor

nbara commented Sep 10, 2017

> I guess another option would be to have a special function for this, e.g.:
>
> mne.decoding.inverse_transform(coefs, X, y)

Yeah, something like this. Initially I imagined this becoming a class method, such as rf.inverse_coefs(X, y). In my mind this was not something that every user would need or use often; it's mostly a convenience for those who do backward modelling.

> what's the cost compared to estimation of coef_? If it's small, let's
> compute it all the time.

Depends how well I manage to code this :)
More seriously, I need to work on this a bit more before I can answer. In principle this could be pretty lightweight, but I need to take a closer look at TimeDelayingRidge to be sure.

@kingjr
Member

kingjr commented Sep 10, 2017

I don't have the exact use cases and constraints in mind, but just to be sure we are on the same page, here is what we do for classic decoding analyses:

# don't compute patterns if not necessary
from sklearn.linear_model import Ridge
clf = Ridge()
clf.fit(X, y)

to get the patterns:

clf = mne.decoding.LinearModel(Ridge())
clf.fit(X, y)
clf.patterns_

to have this in a pipeline:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from mne.decoding import LinearModel
clf = make_pipeline(StandardScaler(), LinearModel(Ridge()))
clf.fit(X, y)
mne.decoding.get_coef(clf, 'patterns_', inverse_transform=True)

And with MNE wrapper compatibility, e.g.:

clf = mne.decoding.SlidingEstimator(LinearModel(Ridge()))
clf.fit(X, y)
get_coef(clf, 'patterns_')

In other words, we classically don't compute the patterns (cov, etc.) at fitting time; if users want the patterns, they use the additional LinearModel step, and get_coef to obtain interpretable coefficients when multiple preprocessing steps are used.

It would be great to have an analogous API.

@choldgraf
Contributor

@kingjr so the idea is that the only thing that LinearModel does is expose a patterns_ attribute upon the call to fit? If the computation is non-trivial, then I'd be fine with doing that. If it is relatively trivial (in terms of time), then I think it's also fine to just compute it at fit...

@nbara
Contributor

nbara commented Sep 10, 2017

@choldgraf Just ran a few quick tests. Using your tutorial data, I spend an extra ~1.3 seconds in fit to compute the inverse coefs on my MacBook. (Note that I've only gotten it to work with estimator=Ridge() so far, and it might take more or less time with TimeDelayingRidge.)

@larsoner
Member Author

larsoner commented Sep 10, 2017 via email

@agramfort
Member

agramfort commented Sep 11, 2017 via email

@choldgraf
Contributor

@agramfort would patterns_ only be for the inverted coefficients? In that case, is the word "patterns" standard in the EEG/MEG decoding world? Just want to make sure we're being semantically consistent here...

@agramfort
Member

agramfort commented Sep 11, 2017 via email

@choldgraf
Contributor

so

coefs_ == filters_ == "receptive field"

and

patterns_ == "electrode weights"

?

I'm +1 on calling it patterns_ in this case. I have a slight preference for just storing the covariance matrices and computing patterns_ lazily when people ask for it. Though I also think @kingjr's approach is pretty clever (though it complicates the API a bit).

Can we think of any other situation where we'd want this set of operations (storing the covariance of X/y in the call to fit, then using them to invert the coefficients)? If so, I think that's a good case for making this another class that can wrap around ReceptiveField in the way JR describes above.

@agramfort
Member

agramfort commented Sep 11, 2017 via email

@larsoner
Member Author

@choldgraf @nbara I've been thinking about it, and I think it makes more sense to use positive lags in forward/encoding (stimulus->brain) models than negative lags. For example, when X is speech and y is brain/EEG (which it is in both of our examples), the system h in the equation y = X * h needs to be active at positive/causal lags. But in both of our examples we do it the other way. Even the last STRF plot in this example is time-reversed relative to the one in the paper.

It's not too late to fix this as we haven't released. WDYT?

@larsoner larsoner added this to the 0.15 milestone Sep 12, 2017
@choldgraf
Contributor

choldgraf commented Sep 12, 2017

The way I think about it is in terms of experiment time: where do the features in X exist relative to each timepoint in y? In an encoding model, they are negative, since they happen at t - N timepoints relative to y.

I think the problem is that when we call them "time delays" (or lags), we're not saying "where relative to y does this feature of X exist"; we're saying "what is the value of N" (in the equation above).

Personally, I think it's more intuitive to use negative numbers to represent "things that happen in the past", but if the field has standards around positive numbers, I'm not going to bikeshed this one. I can see a case for both.

I don't wanna get into a situation like EEG has where half of their plots have the y-axis flipped, that is just the dumbest thing ever. :-)

@larsoner
Member Author

> if the field has standards around positive numbers

I think the standard might be to use positive numbers for causal behavior in forward models. Do you agree? I am no expert here. The Lalor paper seems to present their forward model that way, at least.

@larsoner
Member Author

It might be that, by convention, STRF plots are sometimes (probably half the time :) ) shown with negative lags, but in terms of the mathematics of the model -- convolution of a signal (speech) with a kernel (STRF) to get an output (brain activation) -- the lags should be positive, I think.

@choldgraf
Contributor

I'm not really interested in what is technically correct as much as what is intuitively correct to most people (the vast majority of whom don't really know what convolution is, mathematically).

Re: standards, here's a quick lit summary:

so in short, I think you're right, we should use positive time lags :-)

@choldgraf
Contributor

choldgraf commented Sep 12, 2017

(FWIW, my initial intention was to find papers that used negative lags in their plots, but it was actually kind of difficult...including for my own paper...which tells me that I'm probably wrong in my intuition haha)

@larsoner
Member Author

Okay, I'll mark this one for 0.15 so we change the tmin/tmax definition, assuming nobody else has objections. I should be able to do it this week.

@rkmaddox
Contributor

+1 for positive lags. If you think of the TRF (no S) when the stimulus is a bunch of clicks (so x = click train, y = EEG signal), then the brain response should be at positive lags, because a click causes the brain signal to wobble after the click happens.

I agree that it gets a little hairier with the STRF, but positive lags still make by far the most sense to me.
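
A quick simulation of that click-train intuition (toy code with made-up sizes, not MNE):

import numpy as np

rng = np.random.default_rng(0)
x = np.zeros(1000)
x[rng.integers(0, 950, 30)] = 1.0    # click train
h = np.hanning(20)                   # causal kernel: response follows the click
y = np.convolve(x, h)[:x.size]       # simulated EEG

# cross-correlate x with y over lags -30..30
lags = np.arange(-30, 31)
xcorr = [x[:x.size - k] @ y[k:] if k >= 0 else x[-k:] @ y[:x.size + k]
         for k in lags]
print(lags[int(np.argmax(xcorr))])   # peak lands at a positive lag (near the kernel peak)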

@larsoner
Member Author

(And sorry in advance to any early dev adopters who might have code that will break due to this change.)

@choldgraf
Contributor

@larsoner this was a good catch before 0.15...it would have been way more annoying to fix this after users had started incorporating it using the pip version. IMO when it's on master anything goes in terms of breaking changes :-)

@nbara
Contributor

nbara commented Sep 12, 2017

+1 for positive lags as well

@larsoner
Member Author

@nbara before you put too much time/effort into updating the example in #4546 (which I just bothered you about), let me take care of the lags, because yours would then need to be reversed/negative while plotting.

@choldgraf
Contributor

looks like we're on the same page...in that case: https://youtu.be/U3HMZsDioB8?t=4m17s

@agramfort
Member

agramfort commented Sep 12, 2017 via email

@larsoner larsoner removed this from the 0.15 milestone Sep 13, 2017
@larsoner
Member Author

@nbara are you still using ReceptiveField much? Any interest in adding the inverse-transform bits? Sounds like we have converged on calling it patterns_. I say we just compute it when doing fit, since it should be a small cost compared to the other steps.

@larsoner
Member Author

... oh wait, it looks like the patterns_ part is already done!

@nbara
Contributor

nbara commented Aug 19, 2018

:) Yup! I even added a small section in the tutorial for this.

I haven't used ReceptiveField much lately, but one feature I remember missing is the ability to subsample the lags/delays, or to provide a list of lags instead of tmin and tmax. Of course you can always downsample your data, but it would be more convenient to have this feature directly in ReceptiveField (see the sketch below).
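
For illustration, a sketch of what subsampled lags could look like (delay_design is a hypothetical helper, not a proposed MNE API; lags are in samples):

import numpy as np

def delay_design(x, lags):
    # column j holds x delayed by lags[j] samples (negative = advanced)
    X = np.zeros((x.size, len(lags)))
    for j, k in enumerate(lags):
        if k >= 0:
            X[k:, j] = x[:x.size - k]
        else:
            X[:k, j] = x[-k:]
    return X

x = np.random.default_rng(0).standard_normal(500)
X = delay_design(x, lags=[0, 2, 4, 8, 16])   # sparse, non-uniform lags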
