Prerequisites
Feature Description
I've been reading about some success with EAGLE-3 for speculative decoding. Model architectures like LlamaForCausalLMEagle3 are not currently supported for conversion to GGUF, so they cannot be used as draft models.
This is a request for llama.cpp to support converting these model architectures to GGUF and to allow the resulting models to be used as draft models.
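The intended workflow, if conversion were supported, would presumably mirror how existing draft models are used today. A rough sketch (file names and paths are placeholders, and convert_hf_to_gguf.py does not yet handle this architecture):

```shell
# Hypothetical: convert the EAGLE-3 draft model to GGUF
# (LlamaForCausalLMEagle3 is not yet recognized by the converter)
python convert_hf_to_gguf.py ./EAGLE3-gpt-oss-120b-bf16 --outfile eagle3-draft.gguf

# Then serve the target model with the converted draft model
# for speculative decoding (-md / --model-draft):
llama-server -m gpt-oss-120b.gguf -md eagle3-draft.gguf
```

This assumes the existing --model-draft path in llama-server/llama-cli could accept such a model once conversion works; EAGLE-3 may additionally need changes to the speculative decoding logic itself, since it is not a plain standalone draft model.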
Motivation
https://huggingface.co/baseten-admin/EAGLE3-gpt-oss-120b-bf16 has an EAGLE-3 model for GPT-OSS (seemingly 885M parameters, versus gpt-oss-120b's ~5B active) that could be used as a draft model. GPT-OSS has shown itself to be fairly capable for agentic use cases with recent updates, but prompt processing and inference speeds on local hardware are just shy of fast.
Possible Implementation
No response