Some minor inconsistency in the paper #5

Closed
radarFudan opened this issue Oct 26, 2023 · 2 comments

Comments

@radarFudan

According to your arXiv paper (https://arxiv.org/pdf/2310.12109.pdf), there is no activation in the sequence mixing of equation (2). However, the appendix includes the code for MonarchMixerLayer, and that code applies a ReLU in the sequence mixing layer.

@radarFudan
Author

radarFudan commented Oct 26, 2023

Plus, equation (3) seems to be an MLP operation (I might be wrong). I don't fully understand why the architecture is described as MLP-free in "The resulting architecture is entirely attention- and MLP-free."

@DanFu09
Collaborator

DanFu09 commented Oct 26, 2023

Thanks for your questions!

> it includes the ReLU in the sequence mixing layer

This is a typo - we used to say that the sequence mixing had an "optional" activation function that we would set to identity for the sequence mixer. We updated the equation but not the pseudocode -- will fix it the next time we update the arXiv!
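To see why the choice of activation matters here, below is a minimal sketch assuming a hypothetical sequence mixer of the form $M_2\,\sigma(k \odot M_1 x)$, with small dense matrices standing in for the Monarch matrices. The names `sequence_mix`, `m1`, `m2`, and `k` are illustrative only, not the paper's pseudocode. With $\sigma$ set to the identity, the mixer is linear in its input; swapping in a ReLU, as the appendix code does, breaks that linearity.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (stand-in for a Monarch multiply)."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def identity(v: Vector) -> Vector:
    """The 'optional' activation set to identity: returns v unchanged."""
    return v


def relu(v: Vector) -> Vector:
    """Elementwise ReLU, as in the appendix pseudocode."""
    return [max(0.0, a) for a in v]


def sequence_mix(
    x: Vector,
    m1: Matrix,
    k: Vector,
    m2: Matrix,
    activation: Callable[[Vector], Vector] = identity,
) -> Vector:
    """Hypothetical mixer: m2 @ activation(k * (m1 @ x))."""
    h = [ki * hi for ki, hi in zip(k, matvec(m1, x))]  # elementwise gating by k
    return matvec(m2, activation(h))
```

With the default `identity`, `sequence_mix(x + y) == sequence_mix(x) + sequence_mix(y)` for any inputs; with `activation=relu` that equality generally fails.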

> equation (3) seems to be an MLP operation

Ah, this is helpful feedback! The distinction we intend to draw is that an MLP is quadratic in $d$, while the M2 version is sub-quadratic.

Specifically, an MLP contains linear layers that take quadratic compute ($O(d^2)$ for dimension $d$). In M2, we replace these linear layers with Monarch matrices, which can be computed in sub-quadratic time. So it has a structure similar to an MLP, but is sub-quadratic in $d$.
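As a rough illustration of where the savings come from, here is a plain-Python sketch assuming one common order-2 Monarch form, $M = P L P^\top R$, where $L$ and $R$ are block-diagonal with $b$ blocks of size $b \times b$, $d = b^2$, and $P$ views a length-$d$ vector as a $b \times b$ grid and transposes it (so $P = P^\top$ here). Each block-diagonal multiply costs $b \cdot b^2 = d^{1.5}$ multiplications, versus $d^2$ for a dense linear layer. The helper names are hypothetical, and this is not M2's actual GPU kernel.

```python
import math
from typing import List, Tuple

Vector = List[float]
Block = List[List[float]]


def block_diag_matvec(blocks: List[Block], x: Vector) -> Vector:
    """Multiply a block-diagonal matrix (b blocks, each b x b) by x: b^3 = d^1.5 mults."""
    b = len(blocks)
    out: Vector = []
    for i, blk in enumerate(blocks):
        seg = x[i * b:(i + 1) * b]  # the slice of x this block acts on
        out.extend(sum(w * v for w, v in zip(row, seg)) for row in blk)
    return out


def transpose_perm(x: Vector, b: int) -> Vector:
    """P: view x as a b x b grid (row-major) and read it out column by column."""
    return [x[r * b + c] for c in range(b) for r in range(b)]


def monarch_matvec(left: List[Block], right: List[Block], x: Vector) -> Vector:
    """Compute P L P^T R x without ever forming the dense d x d matrix."""
    b = len(right)
    assert len(x) == b * b, "this sketch assumes d is a perfect square"
    h = block_diag_matvec(right, x)  # R x
    h = transpose_perm(h, b)         # P^T (same as P for a square grid)
    h = block_diag_matvec(left, h)   # L
    return transpose_perm(h, b)      # P


def mult_counts(d: int) -> Tuple[int, int]:
    """Multiplications for a dense d x d matvec vs. the Monarch product above."""
    b = math.isqrt(d)
    assert b * b == d
    return d * d, 2 * d * b
```

For $d = 1024$, `mult_counts` gives 1,048,576 multiplications for the dense layer versus 65,536 for the Monarch product.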

We'll clarify the language in the next arXiv update, thank you!
