questions about the autoregressive model vocab size #41

Closed
sysuyy opened this issue Nov 3, 2024 · 2 comments

Comments

@sysuyy

sysuyy commented Nov 3, 2024

Hello! Thank you for your wonderful work. Why is the vocab size in the config set to 512 and not 262,144?
@RobertLuo1
Collaborator

Hi, thanks for your interest in our work. Indeed, we train a codebook of 262,144 codes. However, as stated in the paper, we observe that optimizing such a large codebook on a relatively small amount of data (ImageNet) is difficult. To help the model predict over such a large vocabulary, we adopt the asymmetric token factorization technique: the codebook is factorized into two sub-codebooks of sizes 64 and 4096, respectively (64 × 4096 = 262,144). The vocab_size here applies only when the asymmetric method is not adopted.
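For anyone landing here with the same question, here is a minimal sketch of what the factorization arithmetic could look like. It assumes the flat code index is split by integer division and modulo into a (coarse, fine) pair, and that the model predicts two small softmaxes per step instead of one 262,144-way softmax; all names (V_COARSE, V_FINE, head_coarse, head_fine) are hypothetical and not taken from this repo.

```python
import torch

# Hypothetical asymmetric token factorization sketch: a flat codebook of
# 262,144 codes is factorized into sub-vocabularies of 64 and 4096
# (64 * 4096 = 262,144). Names are illustrative, not from the repo.
V_COARSE, V_FINE = 64, 4096

def factorize(token_id: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Split a flat token id in [0, 262144) into a (coarse, fine) pair."""
    return token_id // V_FINE, token_id % V_FINE

def defactorize(coarse: torch.Tensor, fine: torch.Tensor) -> torch.Tensor:
    """Recombine a (coarse, fine) pair into the flat token id."""
    return coarse * V_FINE + fine

# The AR model can then use two small output heads per step instead of
# one 262,144-way classifier:
hidden_dim = 768
head_coarse = torch.nn.Linear(hidden_dim, V_COARSE)  # 64-way logits
head_fine = torch.nn.Linear(hidden_dim, V_FINE)      # 4096-way logits

h = torch.randn(1, hidden_dim)                        # dummy hidden state
coarse = head_coarse(h).argmax(-1)                    # in [0, 64)
fine = head_fine(h).argmax(-1)                        # in [0, 4096)
print(defactorize(coarse, fine))                      # flat index in [0, 262144)
```

The practical upshot is that the output layers shrink from one 262,144-way projection to a 64-way plus a 4096-way projection, which is far easier to optimize on an ImageNet-scale dataset.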

@sysuyy
Author

sysuyy commented Nov 3, 2024

Thank you for your detailed reply!
