According to your arXiv paper (https://arxiv.org/pdf/2310.12109.pdf), there is no activation in the sequence-mixing step in equation (2). However, the appendix code for MonarchMixerLayer includes a ReLU in the sequence-mixing layer.
Also, equation (3) seems to be an MLP operation (I might be wrong). I don't fully understand why the paper says "The resulting architecture is entirely attention- and MLP-free."
This is a typo -- the sequence mixer originally had an "optional" activation function, which we set to the identity. We updated the equation but not the pseudocode -- we'll fix it the next time we update the arXiv!
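To make the "optional activation set to identity" point concrete, here is a minimal sketch of the sequence-mixing step described above. The function name, shapes, and dense mixing matrices are my own illustration (the paper uses Monarch matrices here), not code from the repo:

```python
import numpy as np

def sequence_mix(X, M1, M2, activation=lambda z: z):
    """Sketch of a sequence-mixing step with an optional activation.

    X:  (seq_len, d) input activations
    M1, M2: (seq_len, seq_len) mixing matrices (stand-ins for the
            Monarch matrices used in the paper)

    The activation slot exists, as in the appendix pseudocode, but
    defaults to the identity -- matching equation (2) in the paper.
    """
    return M2 @ activation(M1 @ X)
```

With the default identity activation this reduces to two matrix multiplies along the sequence dimension, which is what equation (2) expresses; passing `np.maximum(z, 0)` as `activation` recovers the ReLU variant shown in the appendix pseudocode.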
> equation (3) seems to be an MLP operation
Ah, this is helpful feedback! The distinction we intend is that an MLP is quadratic in $d$, while the M2 version is sub-quadratic.
Specifically -- an MLP's linear layers take quadratic compute ($O(d^2)$ for dimension $d$). In M2, we replace these linear layers with Monarch matrices, which can be applied in sub-quadratic time ($O(d^{3/2})$ with two block-diagonal factors). So it has a similar structure to an MLP, but is sub-quadratic in $d$.
We'll clarify the language in the next arXiv update, thank you!