The admin UI (_modal_model_settings.html) and model_settings.py docstrings describe SpecPrefill as "for MoE/hybrid models", but the implementation in patches/specprefill.py has zero MoE-specific logic. It uses architecture-agnostic query extractors, including _llama_extract_queries for standard dense transformers, and no MoE checks exist in the scheduler or engine.
This is misleading for users with dense models who may skip the feature based on the label.
Fix: PR #1044 removes the "MoE" qualifier from all 3 locations.