In the needle-in-a-haystack section of your paper, you mentioned:
"However, linearizing with passkey samples (LoLCATs Llama 3 8B (Passkey)) recovers 100% accuracy."
Does this step involve LoRA fine-tuning with passkey samples, or only attention transfer with passkey samples?