KV-cache for T5 model #1358

YK-Fu · 2025-01-17T03:03:25Z

Megatron-core currently implements the KV-cache mechanism only for decoder-only models. This PR adds KV-cache support for the seq2seq (T5) model so that it can integrate with NeMo (NVIDIA/NeMo#11881). There are two main changes. First, the KV-cache input for cross_attention is set to None, as cross-attention doesn't require a KV-cache. Second, the KV-cache offset is added to the T5 model's position IDs in each forward step.
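To make the two changes concrete, here is a minimal sketch in plain Python. It is not the code from this PR: `ToyKVCache`, `sequence_len_offset`, `decoder_position_ids`, `layer_cache_inputs`, and `decode_step` are hypothetical names chosen for illustration, and the cache is reduced to a single counter of how many decoder tokens have already been processed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyKVCache:
    """Hypothetical stand-in for an inference cache (not a Megatron-core class)."""
    # Number of decoder tokens whose keys/values are already cached.
    sequence_len_offset: int = 0


def decoder_position_ids(num_new_tokens: int, cache: Optional[ToyKVCache]) -> List[int]:
    """Position IDs for the new tokens, shifted by the cache offset.

    Without a cache the whole sequence is fed at once, so positions start at 0.
    With a cache only the new tokens are fed, so their positions must continue
    from where the cached tokens ended.
    """
    offset = cache.sequence_len_offset if cache is not None else 0
    return [offset + i for i in range(num_new_tokens)]


def layer_cache_inputs(cache: Optional[ToyKVCache]) -> Dict[str, Optional[ToyKVCache]]:
    """Per-layer cache arguments: self-attention gets the cache, cross-attention gets None."""
    return {"self_attention": cache, "cross_attention": None}


def decode_step(new_tokens: List[int], cache: ToyKVCache) -> List[int]:
    """One forward step: compute offset position IDs, then advance the cache offset."""
    position_ids = decoder_position_ids(len(new_tokens), cache)
    cache.sequence_len_offset += len(new_tokens)
    return position_ids
```

In this sketch, feeding a three-token prompt and then a single generated token yields position IDs `[0, 1, 2]` followed by `[3]`, which is the behavior the position-ID offset is meant to preserve during incremental decoding.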

T5 kv-cache

9e8d267

YK-Fu mentioned this pull request Jan 17, 2025

KV-cache for T5 model NVIDIA/NeMo#11881

Open

6 tasks
