
MRG: Refactor Xdawn #3425

Merged 6 commits into mne-tools:master on Jul 27, 2016
Conversation

@kingjr (Member) commented Jul 15, 2016

  • Refactor: Xdawn and XdawnTransformer classes, now both in mne.preprocessing.xdawn
  • FIX: bug with weird channel picking in tests and the Xdawn class
  • FIX: unstable unit test
  • FIX: remove undocumented attributes (e.g. Xdawn.exclude); add and document necessary ones (Xdawn.overlap_, .evokeds_, .event_id_)
  • ENH: _xdawn_fit private function for more explicit args
  • FIX: XdawnTransformer is devoid of fancy objects (Evoked, dictionaries, etc.) and only uses numpy

cc @kaichogami @alexandrebarachant @dengemann


# Get prototype events
if events is not None:
evokeds, toeplitzs = _least_square_evoked(epochs, events, tmin, sfreq)
Contributor:

Btw this (that is, _least_square_evoked) is where an encoder model backbone would fit in.
toeplitz is an encoder's X, evokeds are the coefs of the fitted model/the estimated rERPs.

(As @alexandrebarachant has noted a few times, originally, XDawn is applied to continuous data, and this internally re-creates the continuous data.)
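The encoder backbone described here can be sketched without any MNE internals: build a lagged (Toeplitz-style) design matrix from event onsets, solve least squares on the continuous signal, and the coefficients are the overlap-corrected evoked responses. All names below are illustrative, not MNE's.

```python
import numpy as np

# Toy rERP estimation: the "toeplitz" design matrix X encodes, for each
# sample of the continuous signal, which lags of which events are active;
# the least-squares coefficients are the overlap-corrected responses.
n_times, n_lags = 200, 10
true_erp = np.hanning(n_lags)                  # ground-truth response (1 channel)
onsets = np.array([20, 25, 60, 90, 95, 150])   # note the overlapping pairs

# Lagged (Toeplitz-style) design matrix: X[t, k] = 1 if an event started
# k samples before time t
X = np.zeros((n_times, n_lags))
for onset in onsets:
    for k in range(n_lags):
        X[onset + k, k] = 1.0

signal = X @ true_erp                          # overlapping responses, no noise
coef, *_ = np.linalg.lstsq(X, signal, rcond=None)
# coef recovers true_erp despite the overlaps at 20/25 and 90/95
```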

Member Author (@kingjr):

Good, I was actually unclear what toeplitz was :/

Can you suggest a snippet to incorporate the encoding model here in a subsequent PR?

@jona-sassenhagen (Contributor) commented Jul 17, 2016

With the current code, it would be something like

from mne.stats.regression import _prepare_rerp_preds, _make_evokeds
from mne.preprocessing.xdawn import _construct_signal_from_epochs
from sklearn.linear_model import Ridge

# (_construct_signal_from_epochs would have to be modified to also return events)
raw_, events_ = _construct_signal_from_epochs(
    events, epochs.info["sfreq"], tmin, tmax, epochs.get_data())

X, conds, cond_length, tmin_s, tmax_s = _prepare_rerp_preds(
    raw_.n_times, raw_.info["sfreq"], events_, event_id, tmin, tmax)
coefs = Ridge(alpha=0).fit(X, raw_._data).coef_
evokeds = _make_evokeds(coefs, conds, cond_length, tmin_s, tmax_s, raw_.info)

Hm. Doesn't look that hard actually.
I think I'll try finally doing this (1 year after I opened the issue) once this one is merged.

To build the evokeds, you need some information about the delays/estimated response functions per condition to extract the data from coefs. So if you use a different mechanism to build X/toeplitzs, that will have to be taken care of.
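The bookkeeping this refers to can be sketched with made-up names: given the stacked coefficients and the number of delays estimated per condition, each condition's response function is a contiguous slice of coefs, so the per-condition delay counts must travel along with X.

```python
import numpy as np

# Hypothetical shapes: 4 channels, two conditions with different delay counts
n_ch = 4
delays_per_cond = {"aud": 5, "vis": 7}
coefs = np.random.randn(n_ch, sum(delays_per_cond.values()))

# Each condition's estimated response function is a contiguous block of
# columns of the stacked coefficient matrix
responses, start = {}, 0
for cond, n_delays in delays_per_cond.items():
    responses[cond] = coefs[:, start:start + n_delays]
    start += n_delays
```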

Contributor:

_prepare_rerp_preds would have to be modified to optionally return the unstacked X, because XDawn uses the stacked X in _least_square_evoked but also needs the per-condition matrices later. Alternatively, I think X can be memory-efficiently reconstructed on the fly; not sure.

@jona-sassenhagen (Contributor) commented Jul 17, 2016

Okay, I think this would work:

import mne
from mne.stats.regression import _prepare_rerp_preds, _make_evokeds
from mne.preprocessing.xdawn import _construct_signal_from_epochs
from sklearn.linear_model import Ridge
from scipy import sparse

raw_ = _construct_signal_from_epochs(
    events, epochs.info["sfreq"], tmin, tmax, epochs.get_data())

X, conds, cond_length, tmin_s, tmax_s = _prepare_rerp_preds(
    raw_.n_times, raw_.info["sfreq"], mne.find_events(raw_),
    event_id, tmin, tmax, stack=False)
coefs = Ridge(alpha=0).fit(sparse.hstack(X), raw_._data).coef_
evokeds = _make_evokeds(coefs, conds, cond_length, tmin_s, tmax_s, raw_.info)
toeplitzs = [t_.toarray() for t_ in X]

Would be a 2 line change to _make_evokeds, and allow us to remove about 100 lines of code from XDawn and around 20 lines of code from linear_regression_raw tests.

Member Author (@kingjr):

> Would be a 2 line change to _make_evokeds, and allow us to remove about 100 lines of code from XDawn and around 20 lines of code from linear_regression_raw tests.

Nice, shall I try it in this PR, or does it require some other code?

Contributor:

Let me do it. I've been postponing this for much too long.

Member Author (@kingjr):

Can you open an issue so that we don't forget? I won't handle this in this PR.

Contributor:

There already is one: #2332

@codecov-io commented Jul 16, 2016

Current coverage is 86.73% (diff: 95.41%)

Merging #3425 into master will increase coverage by <.01%

@@             master      #3425   diff @@
==========================================
  Files           339        337     -2   
  Lines         58637      58592    -45   
  Methods           0          0          
  Messages          0          0          
  Branches       8928       8920     -8   
==========================================
- Hits          50856      50821    -35   
+ Misses         5092       5085     -7   
+ Partials       2689       2686     -3   


Powered by Codecov. Last update e17e31f...4a8b1c6

@kingjr (Member Author) commented Jul 16, 2016

@Eric89GXL I'm not sure I did the deprecation correctly could you check?

@alexandrebarachant I incorporated the overlap correction here. Do you think it should be added directly to pyRiemann?

@kaichogami @dengemann @jona-sassenhagen Could you give it a review API wise?

@kingjr (Member Author) commented Jul 16, 2016

here is a snippet to test the new API:

import matplotlib.pyplot as plt
import numpy as np
from mne.preprocessing import XdawnTransformer
xd = XdawnTransformer(n_components=3)
n_trial, n_chan, n_time = 100, 20, 10
X = np.random.randn(n_trial, n_chan, n_time)
y = np.random.randint(0, 2, n_trial)
xd.fit(X, y)
Xt = xd.transform(X)
X_denoised = xd.inverse_transform(Xt)
plt.matshow(X_denoised.mean(0))
plt.show()

# Plot in sensor space (assumes an existing `epochs` object and a `test`
# index from a cross-validation split)
epochs_denoised = epochs.copy()
epochs_denoised._data[test] = X_denoised
Member:

I prefer if you use EpochsArray rather than hacking private attributes.

Member Author (@kingjr):

Yes. I'm actually wondering whether I shouldn't just plot the data; since we're not using any particular topography, we might as well keep it simple. WDYT?

@larsoner (Member) commented Jul 16, 2016 via email

return evokeds
n_components : int (default 2)
The number of components to decompose the signals.
reg : float | str | None (default None)
Contributor:

Brackets around None.

Member Author (@kingjr):

What do you mean, @kaichogami? The rest of the doc doesn't seem to have brackets?

Contributor:

It does, but @dengemann suggested it should be written in normal text and I changed it accordingly there. Or am I missing something here?

@kaichogami (Contributor):

I think it's a good idea to keep one Xdawn algorithm instead of dividing it into two. However, I am not sure about the third parameter in fit. Didn't we write XdawnTransformer instead of changing Xdawn because we were not passing events, which resulted in overlap correction not being supported?

@agramfort (Member):

> I think its a good idea to keep one Xdawn algorithm instead of dividing them into two. However I am not sure about the third parameter in fit. Didn't we write the Xdawntransformer instead of changing Xdawn as we were not passing events which resulted overlap not being supported?

Well, you can have the same code but expose it through private classes or functions as two classes dedicated to distinct usages. You don't have to duplicate any code.

@@ -77,6 +78,7 @@ def test_xdawn_fit():
assert_raises(ValueError, xd.fit, epochs)
# fit with baseline correction and ovverlapp correction should throw an
# error
# XXX This is a buggy test, the epochs here don't overlap
Contributor:

FWIW the EEGLAB example data has overlapping events.

Member Author (@kingjr):

Thx. We have to simulate, however, else the tests are slower, no?

@kingjr (Member Author) commented Jul 17, 2016

Overall, we could indeed keep a single Xdawn class. The motivation for creating XdawnTransformer was the necessity for a deprecation cycle so as to be able to change the API.

Specifically,

  • we needed xd.filters_ and xd.patterns_ to be stored as numpy arrays instead of dicts (as the dicts depend on unnecessary event_id params), and
  • we do not want to store xd.evokeds_, which involves complex Evoked objects that are unnecessary for the Xdawn computations.

The latter point does not require creating a new class, but I believe that the former does.

Overall, I think we have multiple non-exclusive options:

  1. Keep both an Xdawn and an XdawnTransformer class, which will respectively be deprecated and become the standard API.
  2. Make XdawnTransformer accept both epochs and numpy arrays: xd.fit(epochs) # overlap and xd.fit(X, y) # no overlap. The former would allow extracting the events, but not the latter.
  3. Make XdawnTransformer accept numpy arrays only, and pass an additional events param at fitting time (the current solution), although this is not really conventional in sklearn.

cc @agramfort @kaichogami @dengemann

xd.fit(epochs)

# Denoise epochs
# Denoise epochs. Note that it outputs a numpy array
Member Author (@kingjr):

woops

@kingjr (Member Author) commented Jul 19, 2016

I tried to successively implement three different APIs, and came to the conclusion to stick to the 3rd one:

  1. Only having an Xdawn class accepting both numpy arrays and epochs, and no XdawnTransformer. This implementation becomes a nightmare to integrate with sklearn,
  • because filters_ and patterns_ are dicts whose keys depend on event_id, which would thus need to be passed as an extra param in the init.
  • correct_overlap would only work for epochs. IMO this is bad design because the function would vary as a function of input type.
  2. XdawnTransformer accepting both epochs and numpy arrays, i.e. xd.fit(epochs) or xd.fit(X, y). This implementation is complex, as we would need to
  • pick the data channels and exclude the stim, eog, etc. channels
  • check that it's the same picking at fit() and transform()
  • input epochs but still output numpy arrays, which is weird. We could add an output='epochs' parameter, but it doesn't seem the right way to go.
  3. XdawnTransformer only accepting numpy arrays: xd.fit(X, y), and keeping the current Xdawn. The main fitting function is currently private, but could be made public, so that advanced users can apply an overlap correction.

Additionally, I:

  • corrected a few bugs in the Xdawn class, which was storing unnecessary or undocumented attributes (documented xd.overlap_ and xd.event_id_, removed xd.exclude).
  • removed mne/decoding/xdawn.py and put both Xdawn and XdawnTransformer into mne/preprocessing/xdawn.py

However, I did not completely refactor the Xdawn class. The apply() code is already quite messy and I would be strongly in favour of deprecating the class entirely because:

  • Xdawn is applied across different channel types, without rescaling, which doesn't seem right to me. Different channels have different scalings, and will thus contribute differently to the eigenvalues.
  • Xdawn fits the maximum number of components even if n_components is low, but removes the excluded components at apply(). This API does not follow the sklearn standard of setting the parameters at init(), and is not computationally efficient. I understand that this lets users try out how many components they should remove to clean their data, but it really pushes toward ad-hoc parameterizations and overfitting.
  • The behavior of Xdawn.apply(exclude=exclude) is not very clear to me. The code suggests that the excluded components are not solely defined by n_components (defined at init) or the optional exclude parameter (defined at apply), but by a combination of the two.

So, ATM, I have not deprecated Xdawn, as I suspect @agramfort wants a high-level class, but I can already say that I don't think I will be the one supporting such a high-level beast in the future.

@larsoner (Member):

@kingjr do you see this as being mergeable today? If not, is there any way you could split out the fix for failing master builds (#3433) into a separate PR?

@kingjr (Member Author) commented Jul 20, 2016

I'm only waiting for reviews

@@ -76,5 +75,5 @@
# Denoise epochs
epochs_denoised = xd.apply(epochs)

# Plot image epoch after xdawn
# Plot
Contributor:

Why not a more descriptive one?

@larsoner (Member):

Can you set to MRG then?

To ask a slightly different question: do you expect this to be merged today? Waiting for reviews and API comments could mean several more iterations and more days of waiting, especially since the current holiday time might make review lag long. Based on the length of your last comment, I'm assuming this will be the case, no?

In the meantime we should try to get master passing ASAP -- is it easy to split off a simple fix for it?

# with this parameters, the overlapp correction must be False
assert_equal(xd.correct_overlap, False)
# no overlapp correction should give averaged evoked
# With this parameters, the overlap correction must be False
Contributor:

these parameters

@kingjr (Member Author) commented Jul 20, 2016

ok, I'll try to give a quick fix to the example in a separate PR.

# Contruct raw signal
raw = _construct_signal_from_epochs(epochs_data, events, sfreq, tmin)

# Compute average evoked
Contributor:

I'm not sure that's the correct term considering the overlap correction. Maybe more something like "Compute the independent evoked responses per condition, correcting for overlap".

(No evokeds are ever averaged, and no averaging is actually done.)

@kingjr (Member Author) commented Jul 24, 2016

@Eric89GXL it's a linalg error in the xdawn. I don't manage to replicate it locally. Let's see whether setting the random seed fixes it.

EDIT: no, that didn't work :/

@larsoner (Member) commented Jul 24, 2016 via email

@kingjr (Member Author) commented Jul 25, 2016

@Eric89GXL I'm stuck. I installed python 2.6 with scipy 11.0, numpy 1.7 etc, and I can't reproduce the linalg error. Any chance you could have a look?

@larsoner (Member) commented Jul 25, 2016 via email

@agramfort (Member):

@kingjr debug and use mkl debug decorator for the tests

@kingjr (Member Author) commented Jul 25, 2016

> @kingjr debug

Come on, that's not really helpful.

I'm already using the mkl decorator. See L88 of the test.

@dengemann (Member):

Sorry was offline, I will have time to review today and tomorrow.

@larsoner (Member):

Your error is not the SVD error that I saw before, so the MKL decorator will not help. It is a different linear algebra error:

https://travis-ci.org/mne-tools/mne-python/jobs/147028330#L2833

It looks like it might be suggesting that your (covariance?) matrix is not positive definite. Are there zero eigenvalues? If so, it's not positive definite but positive semidefinite.

I googled the error and first result was something very similar:

scikit-learn/scikit-learn#2640

I wonder if you should use some other linear algebra function like svd instead of eigh, since SVD won't care about the zero or negative eigenvalues.

@larsoner (Member):

Without looking at the code, I'd assume that it's happening on element 60 because it's EEG data with 60 channels and an avg ref applied, i.e. of rank 59? That's one way this could happen.
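The failure mode described above can be reproduced deterministically: scipy's generalized eigh Cholesky-decomposes the second matrix, so a rank-deficient covariance (as with average-referenced EEG, rank n-1) raises LinAlgError, while a tiny diagonal loading (or an SVD-based solver, as suggested earlier) avoids it. The matrices below are toy stand-ins, not XDawn's actual covariances.

```python
import numpy as np
from scipy import linalg

# eigh(A, B) requires B to be positive definite; a singular B makes the
# internal Cholesky factorization fail
A = np.eye(3)
B = np.ones((3, 3))  # rank 1, analogous to a rank-deficient covariance

try:
    linalg.eigh(A, B)
    raised = False
except linalg.LinAlgError:
    raised = True
# raised is True: B is positive semidefinite, not positive definite

# A small diagonal loading restores positive definiteness
reg = 1e-10 * np.trace(B) / B.shape[0]
evals, evecs = linalg.eigh(A, B + reg * np.eye(3))
```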

@kingjr (Member Author) commented Jul 25, 2016

I can look into it again, but I'm surprised it only crashes in one of the Python jobs.

@larsoner (Member):

Slight numerical differences can crop up only sometimes. In this case they probably cause one of the eigenvalues to go to like -1e-16 or actual 0 instead of (in most cases) +1e-16. Or maybe scipy improved their check for numerical precision in the meantime, so you're hitting a version issue. In either case, it probably warrants further investigation unfortunately.

@kingjr (Member Author) commented Jul 25, 2016

@Eric89GXL ok, thanks. I've tried to use the same package versions as the failing job, without success. Let's see whether the explicit no-reference fixes it.

@larsoner (Member):

Looks like it passed, other than the unrelated flake8 package error.

@larsoner larsoner added this to the 0.13 milestone Jul 25, 2016
@agramfort (Member):

@kingjr ping us when CIs are happy

@kingjr (Member Author) commented Jul 26, 2016

@agramfort IIUC CIs are happy; it's an unrelated flake8 bug that made it crash.

@larsoner (Member):

I fixed flake last night, just restarted the build

kingjr added 6 commits July 26, 2016 17:58
FIX: docs

FIX: error overlap

Update xdawn example

FIX: inverse transform dimensionality

FIX: remove old decoding files

FIX: EpochsArray in example, back to not cv

address comments

docs

precise example doc

ENH: XdawnTransformer accepts Epochs

ENH: allow XdawnTransformer epochs

Back to XdawnTransformer not accepting epochs

ENH: add tests transform, inv_transform size

FIX: extra Xdawn attributes mess
@larsoner (Member):

@agramfort confirmed CIs are happy

@agramfort agramfort merged commit 8823240 into mne-tools:master Jul 27, 2016
@agramfort (Member):

thx @kingjr

deep-introspection pushed a commit to deep-introspection/mne-python that referenced this pull request Jul 27, 2016
* FIX: original tests

* REFACTOR: original xdawn

* REFACTOR: XdawnTransformer

FIX: docs

FIX: error overlap

Update xdawn example

FIX: inverse transform dimensionality

FIX: remove old decoding files

FIX: EpochsArray in example, back to not cv

address comments

docs

precise example doc

ENH: XdawnTransformer accepts Epochs

ENH: allow XdawnTransformer epochs

Back to XdawnTransformer not accepting epochs

ENH: add tests transform, inv_transform size

FIX: extra Xdawn attributes mess

* FIX: examples with XDawn instead of XdawnTransformer

* FIX: address comments

* FIX: explicit no ref in EEG
7 participants