I am wondering why puzzle_emb_len is set to 16; HRM uses a value of 1 as far as I can tell. I see that positions 1 through puzzle_emb_len-1 don't participate in the lm_head or q_head computations, yet they do participate in the z_L and z_H updates through the attention layers. I am trying to understand the reason for this design.
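For concreteness, here is a minimal sketch of how I understand that flow. This is not the repo's code: the class name, the mixer stand-in for the z_L/z_H update blocks, and all dimensions are illustrative; only puzzle_emb_len, lm_head, q_head, and z_H correspond to the names I am asking about.

```python
# Illustrative sketch only, assuming: puzzle embeddings are prepended as
# puzzle_emb_len extra positions, all positions attend to each other, and
# the output heads then read only a subset of positions.
import torch
import torch.nn as nn


class PuzzleEmbSketch(nn.Module):
    def __init__(self, vocab_size=32, hidden_size=64, puzzle_emb_len=16, num_puzzles=10):
        super().__init__()
        self.puzzle_emb_len = puzzle_emb_len
        self.tok_emb = nn.Embedding(vocab_size, hidden_size)
        # One learned vector per puzzle, reshaped into puzzle_emb_len positions.
        self.puzzle_emb = nn.Embedding(num_puzzles, puzzle_emb_len * hidden_size)
        # Stand-in for the attention-based reasoning blocks that update z_L / z_H.
        self.mixer = nn.TransformerEncoderLayer(hidden_size, nhead=4, batch_first=True)
        self.lm_head = nn.Linear(hidden_size, vocab_size)
        self.q_head = nn.Linear(hidden_size, 2)

    def forward(self, tokens, puzzle_ids):
        B = tokens.shape[0]
        # Prepend puzzle_emb_len positions in front of the token sequence.
        p = self.puzzle_emb(puzzle_ids).view(B, self.puzzle_emb_len, -1)
        z = torch.cat([p, self.tok_emb(tokens)], dim=1)
        # Every position, including the puzzle positions, attends to every other,
        # so the puzzle positions do influence the hidden states (z_L / z_H).
        z_H = self.mixer(z)
        # But the heads read only part of the sequence:
        logits = self.lm_head(z_H[:, self.puzzle_emb_len:])  # token positions only
        q_logits = self.q_head(z_H[:, 0])                    # position 0 only
        return logits, q_logits


if __name__ == "__main__":
    model = PuzzleEmbSketch()
    logits, q_logits = model(torch.randint(0, 32, (2, 30)), torch.tensor([0, 1]))
    print(logits.shape, q_logits.shape)  # (2, 30, 32) and (2, 2)
```

If this reading is right, positions 1 through puzzle_emb_len-1 only ever act through attention and never reach a head directly, which is the part I would like to understand.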