CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs (arXiv:2409.12490, Sep 19, 2024)
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (arXiv:2406.19707, Jun 28, 2024)
Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding (arXiv:2409.08561, Sep 13, 2024)
Diver: Large Language Model Decoding with Span-Level Mutual Information Verification (arXiv:2406.02120, Jun 4, 2024)
EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models (arXiv:2405.07542, May 13, 2024)
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation (arXiv:2407.11798, Jul 16, 2024)
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling (arXiv:2408.08696, Aug 16, 2024)
Learning Harmonized Representations for Speculative Sampling (arXiv:2408.15766, Aug 28, 2024)
Parallel Speculative Decoding with Adaptive Draft Length (arXiv:2408.11850, Aug 13, 2024)
KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning (arXiv:2408.08146, Aug 15, 2024)
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration (arXiv:2404.12022, Apr 18, 2024)
Accelerating Inference in Large Language Models with a Unified Layer Skipping Strategy (arXiv:2404.06954, Apr 10, 2024)
GrootVL: Tree Topology is All You Need in State Space Model (arXiv:2406.02395, Jun 4, 2024)
MoDeGPT: Modular Decomposition for Large Language Model Compression (arXiv:2408.09632, Aug 19, 2024)
InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference (arXiv:2409.04992, Sep 8, 2024)
Palu: Compressing KV-Cache with Low-Rank Projection (arXiv:2407.21118, Jul 30, 2024)