Is it correct to keep using `adapter_kv_cache` during training in `litgpt/adapter.py`? Since `self.adapter_wte` and `self.attn` are updated during training, `ak` and `av` read from the cache would be computed from stale weights, so I would expect the kv cache to be bypassed in training mode. However, it looks like `self.adapter_kv_cache` is also used during training. Thank you very much!
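To illustrate the concern, here is a minimal, hypothetical sketch (not litgpt's actual code; the class, the toy "projections", and the `weight` attribute are invented for illustration) of why caching the adapter keys/values is only safe when the weights producing them are frozen:

```python
class AdapterAttention:
    """Toy stand-in for an adapter attention block (hypothetical).

    The cached (ak, av) pair is only valid as long as the weights that
    produced it do not change. During training the weights change every
    optimizer step, so reading from the cache would return stale values.
    """

    def __init__(self, weight):
        self.weight = weight          # stands in for adapter_wte / attn weights
        self.adapter_kv_cache = None  # cached (ak, av); valid only if weight is fixed
        self.training = False

    def adapter_kv(self, prompt):
        if self.training or self.adapter_kv_cache is None:
            # Recompute from the current weights. These arithmetic ops are
            # placeholders for the real key/value projections.
            ak = [p * self.weight for p in prompt]
            av = [p + self.weight for p in prompt]
            if not self.training:
                # Caching is safe at inference: weights are frozen.
                self.adapter_kv_cache = (ak, av)
            return ak, av
        # Eval mode with a warm cache: reuse the stored projections.
        return self.adapter_kv_cache
```

The guard on `self.training` is the point of the sketch: without it, a weight update between steps would leave `adapter_kv_cache` holding projections of the old weights.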
