
Feature Request: Support EAGLE3 models for draft model / speculative decoding use cases #15305

@roykim98

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

I've been reading about some success with EAGLE-3 for speculative decoding. Model architectures such as LlamaForCausalLMEagle3 are currently not supported by the GGUF conversion scripts, so these models cannot be used as draft models.

The ask is for llama.cpp to support converting these architectures to GGUF and to accept the resulting models as draft models for speculative decoding (see the sketch below).
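For context, llama.cpp already exposes draft-model support through the `-md` / `--model-draft` option in tools like llama-server, so part of the missing piece is on the conversion side. As a rough illustration, here is a minimal sketch of how the EAGLE-3 architecture name might be registered in convert_hf_to_gguf.py. The decorator, base class, and `modify_tensors` method follow the script's existing registration pattern, but the `Eagle3LlamaModel` class and its tensor handling are hypothetical; EAGLE-3's extra fusion tensors would need real mapping work.

```python
# Hypothetical sketch, not a working converter: registers the EAGLE-3
# architecture name with convert_hf_to_gguf.py's model registry so the
# checkpoint is at least recognized. The class name and tensor handling
# below are assumptions for illustration only.
import gguf


@ModelBase.register("LlamaForCausalLMEagle3")
class Eagle3LlamaModel(LlamaModel):  # reuse the existing Llama converter
    model_arch = gguf.MODEL_ARCH.LLAMA

    def modify_tensors(self, data_torch, name, bid):
        # EAGLE-3 heads carry extra fusion tensors (projections of the
        # target model's hidden states feeding the draft layer) that
        # have no GGUF tensor names yet; these would need new mappings
        # or a dedicated architecture entry before this could work.
        return super().modify_tensors(data_torch, name, bid)
```

The harder part is likely the runtime side: EAGLE-3 drafts condition on the target model's hidden states rather than running as a fully independent model, so llama.cpp's current draft-model path (which simply runs a second, smaller GGUF) may need changes beyond conversion.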

Motivation

https://huggingface.co/baseten-admin/EAGLE3-gpt-oss-120b-bf16 provides an EAGLE-3 model for GPT-OSS (roughly 885M parameters, versus gpt-oss-120b's ~5B active parameters) that could be used as a draft model. GPT-OSS has proven fairly capable for agentic use cases with recent updates, but prompt processing and inference speeds on local hardware are just shy of fast.

Possible Implementation

No response

Metadata

Labels: enhancement (New feature or request)