Curious About Subspace Size #35

Lqf-HFNJU · 2024-09-10T07:32:30Z

Hi,

I'm curious about the choice of subspace sizes in the paper, which are set to 64 (2^6) and 4096 (2^12). What was the reasoning behind this specific configuration? Why not use two subspaces of the same size, such as 512 (2^9) each?

Thank you for your insights!

ShiFengyuan1999 · 2024-09-10T10:36:48Z

Hi @Lqf-HFNJU. With two different subspace sizes (2^6 for the first and 2^12 for the second), we can first make a coarse classification to narrow down the search space, then make a precise classification within the reduced space. Moreover, this asymmetric token factorization introduces more learnable embeddings (64 + 4096 = 4160 vs. 512 + 512 = 1024), which increases the model capacity.
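
For readers who want to check the arithmetic behind this comparison, here is a minimal sketch. It is not code from the repository; the helper name `factorization_stats` is hypothetical, and it assumes the effective vocabulary is the product of the subspace sizes while the number of learnable embeddings is their sum.

```python
from math import prod


def factorization_stats(subspace_sizes):
    """Return (effective vocabulary size, total learnable embeddings).

    Assumption: a token is identified by one index per subspace, so the
    effective vocabulary is the product of the sizes, while each subspace
    stores its own embedding table, so the embedding count is the sum.
    """
    return prod(subspace_sizes), sum(subspace_sizes)


# Asymmetric split discussed in the thread: 2^6 and 2^12.
asymmetric = factorization_stats([2**6, 2**12])

# Symmetric alternative raised in the question: 2^9 and 2^9.
symmetric = factorization_stats([2**9, 2**9])

print(asymmetric)  # (262144, 4160)
print(symmetric)   # (262144, 1024)
```

Under these assumptions both splits cover the same effective vocabulary (2^18 = 262144), but the asymmetric one holds roughly four times as many embeddings.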

Lqf-HFNJU · 2024-09-11T08:01:57Z

Thanks!

Lqf-HFNJU closed this as completed Sep 11, 2024
