
ENH: More features for ReceptiveField #3884

Open · 5 of 10 tasks
larsoner opened this issue Jan 5, 2017 · 52 comments

Comments

@larsoner
Member

larsoner commented Jan 5, 2017

  • support score method for use with GridSearchCV
  • use custom Ridge solver for speed and multiple regularization types (i.e., Ross's Fourier-based code)
  • change to positive lags for causal behavior (MRG: Change to positive lags #4550)
  • backward transform
  • Add CUDA support for the fitting
  • if necessary, make tmin/tmax/sfreq logic consistent with Epochs class
  • forward/backward model tutorial?
  • plot method that is smart enough to at least deal with 1D (STA) and 2D (STRF) plotting, probably using NonUniformImage class rather than pcolormesh
  • decimation (decimate data before fitting or model after fitting?)
  • refactor with linear_regression_raw, XDawn (categorical data; see "Add a bit of support for categorical data to TimeDelayedRidge?" #4940)
@larsoner larsoner modified the milestone: 0.15 Jan 6, 2017
@larsoner larsoner removed this from the 0.15 milestone Aug 4, 2017
@nbara
Contributor

nbara commented Sep 4, 2017

Hi Eric, would this be the place to suggest another addition to ReceptiveField?

I've been using it a bit lately, especially for backward models (i.e., stimulus reconstruction), and there's something that (I think) is missing that would be useful. You may have seen this paper by Haufe et al. (2014), where they show how to forward-transform a backward model (for which the model weights are not easily interpretable).

Ed Lalor's mTRF toolbox does this (see fig 4E here), but unless I'm mistaken there's no quick way to do this directly in MNE/sklearn with the current code?

PS: pinging @choldgraf as well, since we've briefly discussed this on Gitter.
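
For reference, here's a minimal sketch of the Haufe et al. (2014) transform (my own illustration with assumed names and shapes, not MNE or mTRF code):

import numpy as np

def haufe_patterns(W, X, y_hat):
    # W: (n_features, n_targets) backward-model weights
    # X: (n_samples, n_features) neural data
    # y_hat: (n_samples, n_targets) model output (reconstructed stimulus)
    Xc = X - X.mean(0)          # centering makes the intercept drop out
    yc = y_hat - y_hat.mean(0)
    # patterns A = cov(X) @ W @ inv(cov(y_hat))
    return np.linalg.multi_dot([Xc.T @ Xc, W, np.linalg.inv(yc.T @ yc)])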

@agramfort
Member

agramfort commented Sep 4, 2017 via email

@nbara
Contributor

nbara commented Sep 4, 2017

I agree it doesn't seem that hard to implement. The only issue I foresee is dealing with the intercept before computing np.linalg.multi_dot([np.dot(X.T, X), coef_, np.linalg.inv(np.dot(y.T, y))]) (not sure). But I wasn't sure this was something you'd find interesting/useful at all.

@kingjr
Member

kingjr commented Sep 4, 2017 via email

@agramfort
Member

agramfort commented Sep 4, 2017 via email

@kingjr
Member

kingjr commented Sep 4, 2017

> I agree it doesn't seem that hard to implement. The only issue I foresee is dealing with the intercept before computing np.linalg.multi_dot([np.dot(X.T, X), coef_, np.linalg.inv(np.dot(y.T, y))]) (not sure). But I wasn't sure this was something you'd find interesting/useful at all.

+1 for having that feature computed upon request.

@choldgraf
Contributor

@nbara would you like to give it a shot? Feel free to ping me for a review if you like.

Quick questions:

  1. At the call to .fit, should we store the covariance matrices of X and y? (Or maybe just store X and y and compute the covariances later?)

  2. Should this be a lazy attribute? E.g., have an inverse_coefs_ attribute that performs the necessary matrix computation and errors if the model hasn't been fit yet?

@larsoner
Member Author

larsoner commented Sep 4, 2017

> At the call to .fit, should we store the covariance matrices of X and y? (Or maybe just store X and y and compute the covariances later?)

We should store the X covariance; it is the time-consuming computation, IIRC.

> Should this be a lazy attribute? E.g., have an inverse_coefs_ attribute that performs the necessary matrix computation and errors if the model hasn't been fit yet?

We should just compute it when doing fit(...). It will be a very small cost relative to the fitting operation, I think.

@choldgraf
Contributor

> We should just compute it when doing fit(...). It will be a very small cost relative to the fitting operation, I think.

Yeah, I think so too. Can we think of edge cases where we'd regret this? E.g., gigantic X matrices? It could be a silent memory issue if the object is suddenly storing an N x N matrix when N is really large...

@kingjr
Member

kingjr commented Sep 4, 2017 via email

@larsoner
Member Author

larsoner commented Sep 4, 2017

> E.g., gigantic X matrices?

The covariance (what we'd need to store) is just (n_feat, n_feat), and X is (n_samp, n_feat) with n_samp >> n_feat -- so if you're able to compute the fit at all, you probably won't be hurt too much by storing the cov. It's still possible (especially if we start to be smart about memory usage in the computations) to hit memory limits, but I agree with @kingjr that we can deal with that when we hit it (and hope/assume YAGNI).
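
As a rough back-of-envelope check (hypothetical sizes, not numbers from this thread):

# n_feat = n_channels * n_delays, so a float64 covariance costs n_feat**2 * 8 bytes
n_feat = 64 * 100             # e.g., 64 channels x 100 delays
print(n_feat ** 2 * 8 / 1e9)  # ~0.33 GB -- storable, but not free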

@choldgraf
Contributor

That's fine with me -- just bringing it up as worth thinking about. My concern is more that users won't necessarily realize this is happening, so the object is silently storing an N x N matrix... what if somebody doesn't know about this and is doing cross-validation with 100s of models, storing the object each time? I think it'd be fine to solve this with a good mention in the docstring for now, though.

@nbara
Contributor

nbara commented Sep 4, 2017

> We should store the X covariance; it is the time-consuming computation, IIRC.

Actually no, ...

> We should just compute it when doing fit(...). It will be a very small cost relative to the fitting operation, I think.

... from what I can tell, if one chooses TimeDelayingRidge, then X is not reshaped into (n_times, n_epochs, n_feats, n_delays) in fit. However, the data needs to be in that shape before it's multiplied with the covariances.

I've just taken a stab at this. Here's a gist with a function that does the job (not fully tested yet, but it seems to give sensible weights):
https://gist.github.com/nbara/c0f2f05ad5db1d1710e591ce89d7b645

It's not lightweight at all, since I've got to call _delay_time_series and _reshape_for_est before the final dot product...

PS: Feel free to comment on the gist if you don't want to bloat this thread.

@larsoner
Member Author

larsoner commented Sep 4, 2017

> from what I can tell, if one chooses TimeDelayingRidge, then X is not reshaped into (n_times, n_epochs, n_feats, n_delays) in fit. However, the data needs to be in that shape before it's multiplied with the covariances.

It probably does not need to be reshaped. See how predict etc. are done -- the equivalence between convolution and matrix multiplication is used instead.
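
(A toy check of that equivalence, with illustrative names only -- this is not MNE internals:)

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100)    # signal
h = rng.standard_normal(5)      # kernel over 5 positive lags

# design matrix whose k-th column is x delayed by k samples
X_del = np.zeros((x.size, h.size))
for k in range(h.size):
    X_del[k:, k] = x[:x.size - k]

# matrix multiplication with the delayed design == direct convolution
assert np.allclose(X_del @ h, np.convolve(x, h)[:x.size])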

The best thing to do is probably to open a WIP PR, rather than a gist. Let's get something that works properly with unit tests, then worry about addressing the bottlenecks.

@choldgraf
Contributor

any movement on this? @nbara wanna take a shot at a PR?

@nbara
Contributor

nbara commented Sep 9, 2017

Yes, I'm happy to try, but it's likely that I'll need some help at some point, as I'm not sure a) how to make the code work for TimeDelayingRidge and b) what a proper unit test for this would be.

@choldgraf
Contributor

choldgraf commented Sep 9, 2017 via email

@nbara
Contributor

nbara commented Sep 10, 2017

OK, starting to work on it now. Are we sure we want to have this computed by default in fit? I don't think it would be terribly useful for forward models (why would one want to backward-transform forward weights?).

@choldgraf
Contributor

Sounds good -- I think it would need to be computed in fit... that's the only situation in which the object would have access to the X and y variables, no?

@choldgraf
Contributor

I guess another option would be to have a special function for this, e.g.:

mne.decoding.inverse_transform(coefs, X, y)

@agramfort
Member

agramfort commented Sep 10, 2017 via email

@nbara
Contributor

nbara commented Sep 10, 2017

> I guess another option would be to have a special function for this, e.g.:
>
> mne.decoding.inverse_transform(coefs, X, y)

Yeah, something like this. Initially I imagined this becoming a class method, such as rf.inverse_coefs(X, y). In my mind this was not something that every user would need or use often; it's mostly a convenience for those who do backward modelling.

> what's the cost compared to estimation of coef_? If it's small, let's
> compute it all the time.

Depends how well I manage to code this :)
More seriously, I need to work on this a bit more before I can answer. In principle this could be pretty lightweight, but I need to take a closer look at TimeDelayingRidge to be sure.

@kingjr
Member

kingjr commented Sep 10, 2017

I don't have the exact use cases and constraints in mind, but just to be sure we are on the same page, here is what we do for classic decoding analyses:

# don't compute patterns if not necessary
from sklearn.linear_model import Ridge
clf = Ridge()
clf.fit(X, y)

to get the patterns:

clf = mne.decoding.LinearModel(Ridge())
clf.fit(X, y)
clf.patterns_

to have this in a pipeline:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from mne.decoding import LinearModel
clf = make_pipeline(StandardScaler(), LinearModel(Ridge()))
clf.fit(X, y)
mne.decoding.get_coef(clf, 'patterns_', inverse_transform=True)

And with MNE wrapper compatibility, e.g.:

clf = mne.decoding.SlidingEstimator(LinearModel(Ridge()))
clf.fit(X, y)
get_coef(clf, 'patterns_')

In other words, we classically don't compute the patterns (cov, etc.) at fitting time; if users want the patterns, they use the additional LinearModel step, and get_coef to obtain interpretable coefficients when multiple preprocessing steps are used.

It would be great to have an analogous API.

@choldgraf
Contributor

@kingjr so the idea is that the only thing that LinearModel does is expose a patterns_ attribute upon the call to fit? If the computation is non-trivial, then I'd be fine with doing that. If it is relatively trivial (in terms of time), then I think it's also fine to just compute it at fit...

@nbara
Contributor

nbara commented Sep 10, 2017

@choldgraf Just ran a few quick tests. Using your tutorial data, I spend an extra ~1.3 seconds in fit to compute the inverse coefs on my MacBook. (Note that I've only gotten it to work with estimator=Ridge() so far, and it might take more or less time with TimeDelayingRidge.)

@larsoner
Member Author

larsoner commented Sep 10, 2017 via email

@agramfort
Member

agramfort commented Sep 11, 2017 via email

@choldgraf
Contributor

@agramfort would patterns_ only be for the inverted coefficients? In that case, is the word "patterns" standard in the EEG/MEG decoding world? Just want to make sure we're being semantically consistent here...

@agramfort
Member

agramfort commented Sep 11, 2017 via email

@choldgraf
Contributor

so

coefs_ == filters_ == "receptive field"

and

patterns_ == "electrode weights"

?

I'm +1 on calling it patterns_ in this case. I have a slight preference for just storing the covariance matrices and computing patterns_ lazily when people ask for it. Though I also think @kingjr's approach is pretty clever (though it complicates the API a bit).

Can we think of any other situation where we'd want this set of operations (storing the covariance of X/y in the call to fit, then using them to invert the coefficients)? If so, I think that's a good case for making this another class that can wrap around ReceptiveField in the way JR describes above.

@agramfort
Member

agramfort commented Sep 11, 2017 via email

@larsoner
Member Author

@choldgraf @nbara I've been thinking about it, and I think it makes more sense to use positive lags in forward/encoding (stimulus->brain) models than negative lags. For example, when X is speech and y is brain/EEG (which it is in both of our examples), the system h in the equation y = X * h needs to be active at positive/causal lags. But in both of our examples we do it the other way. Even the last STRF plot in this example is time-reversed relative to the one in the paper.

It's not too late to fix this as we haven't released. WDYT?

@larsoner larsoner added this to the 0.15 milestone Sep 12, 2017
@choldgraf
Contributor

choldgraf commented Sep 12, 2017

The way I think about it is in terms of experiment time: where do the features in X exist relative to each timepoint in y? In an encoding model, they are negative, since they happen at t - N timepoints relative to y.

I think the problem is that when we call them "time delays" (or lags), we're not saying "where relative to y does this feature of X exist"; we're saying "what is the value of N" (in the equation above).

Personally, I think it's more intuitive to use negative numbers to represent "things that happen in the past", but if the field has standards around positive numbers, I'm not going to bikeshed this one. I can see a case for both.

I don't wanna get into a situation like EEG has where half of their plots have the y-axis flipped, that is just the dumbest thing ever. :-)

@larsoner
Member Author

> if the field has standards around positive numbers

I think the standard might be to use positive numbers for causal behavior in forward models. Do you agree? I am no expert here. The Lalor paper seems to present their forward model that way, at least.

@larsoner
Member Author

It might be that, by convention, STRF plots are sometimes (probably half the time :) ) shown with negative lags, but in terms of the mathematics of the model -- convolution of a signal (speech) with a kernel (STRF) to get an output (brain activation) -- the lags should be positive, I think.

@choldgraf
Contributor

I'm not really interested in what is technically correct as much as what is intuitively correct to most people (the vast majority of whom don't really know what convolution is, mathematically).

Re: standards, here's a quick lit summary:

so in short, I think you're right, we should use positive time lags :-)

@choldgraf
Contributor

choldgraf commented Sep 12, 2017

(FWIW, my initial intention was to find papers that used negative lags in their plots, but it was actually kind of difficult...including for my own paper...which tells me that I'm probably wrong in my intuition haha)

@larsoner
Member Author

Okay, I'll mark this one for 0.15 so we change the tmin/tmax definition, assuming nobody else has objections. I should be able to do it this week.

@rkmaddox
Contributor

+1 for positive lags. If you think of the TRF (no S) when the stimulus is a bunch of clicks (so x = click train, y = EEG signal), then the brain response should be at positive lags, because a click causes the brain signal to wobble after the click happens.

I agree that it gets a little hairier with the STRF, but positive lags still make by far the most sense to me.
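
A quick simulation of that click-train intuition (toy code with made-up sizes, not MNE):

import numpy as np

rng = np.random.default_rng(0)
x = np.zeros(1000)
x[rng.integers(0, 950, 30)] = 1.0    # click train
h = np.hanning(20)                   # causal kernel: response follows the click
y = np.convolve(x, h)[:x.size]       # simulated EEG

# cross-correlate x with y over lags -30..30
lags = np.arange(-30, 31)
xcorr = [x[:x.size - k] @ y[k:] if k >= 0 else x[-k:] @ y[:x.size + k]
         for k in lags]
print(lags[int(np.argmax(xcorr))])   # peak lands at a positive lag (near the kernel peak)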

@larsoner
Member Author

(And sorry in advance to any early dev adopters who might have code that will break due to this change.)

@choldgraf
Contributor

@larsoner this was a good catch before 0.15...it would have been way more annoying to fix this after users had started incorporating it using the pip version. IMO when it's on master anything goes in terms of breaking changes :-)

@nbara
Contributor

nbara commented Sep 12, 2017

+1 for positive lags as well

@larsoner
Member Author

@nbara before you put too much time/effort into updating the example in #4546 (which I just bothered you about), let me take care of the lags, because yours would then need to be reversed/negative while plotting.

@choldgraf
Contributor

looks like we're on the same page...in that case: https://youtu.be/U3HMZsDioB8?t=4m17s

@agramfort
Member

agramfort commented Sep 12, 2017 via email

@larsoner larsoner removed this from the 0.15 milestone Sep 13, 2017
@larsoner
Member Author

@nbara are you still using ReceptiveField much? Any interest in adding the inverse-transform bits? Sounds like we have converged on calling it patterns_. I say we just compute it when doing fit, since it should be a small cost compared to the other steps.

@larsoner
Member Author

... oh wait, it looks like the patterns_ part is already done!

@nbara
Contributor

nbara commented Aug 19, 2018

:) Yup! I even added a small section in the tutorial for this.

I haven't used ReceptiveField much lately, but one feature I remember missing is the ability to subsample the lags/delays, or to provide a list of lags instead of tmin and tmax. Of course you can always downsample your data, but it would be more convenient to have this feature directly in ReceptiveField (see the sketch below).
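
For illustration, a sketch of what subsampled lags could look like (delay_design is a hypothetical helper, not a proposed MNE API; lags are in samples):

import numpy as np

def delay_design(x, lags):
    # column j holds x delayed by lags[j] samples (negative = advanced)
    X = np.zeros((x.size, len(lags)))
    for j, k in enumerate(lags):
        if k >= 0:
            X[k:, j] = x[:x.size - k]
        else:
            X[:k, j] = x[-k:]
    return X

x = np.random.default_rng(0).standard_normal(500)
X = delay_design(x, lags=[0, 2, 4, 8, 16])   # sparse, non-uniform lags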
