
Rollback support for speculative decoding? #117

Open
benchislett opened this issue Feb 7, 2025 · 2 comments · Fixed by #126

Comments

@benchislett

Does llguidance support a state rollback primitive for use in draft-model speculative decoding (where some tokens need to be generated subject to guidance, and then only some of those tokens are accepted for continued generation)?

As of now, the only structured output backend in vLLM which supports this feature is xGrammar. I am curious if this exists in llguidance, or if it is on the roadmap / compatible with the design.

Thanks to all maintainers for a great contribution to the open-source community.

@mmoskal
Collaborator

mmoskal commented Feb 7, 2025

Rollback is currently not implemented, but it wouldn't be super-hard to add.

However, there are two other APIs that are relevant:

  • you can clone the whole constraint, either sharing or not sharing lexer state (the lexer state is protected by a mutex, so if it is shared, the clones cannot compute masks in parallel)
  • you can validate a number of tokens (the validate_tokens_raw() method) in the current context, without modifying the state of the constraint - this is quite cheap
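Combined, the two APIs above give a rollback substitute: check the draft tokens cheaply without mutating state, then advance a clone while the original stays put. A toy sketch of that pattern - `ToyConstraint`, its digit-only "grammar", and the method names other than `validate_tokens_raw()` are my assumptions for illustration, not the actual llguidance Python API:

```python
import copy

class ToyConstraint:
    """Toy stand-in for a grammar constraint (NOT llguidance):
    it accepts only digit tokens, tracking what was consumed."""
    def __init__(self):
        self.consumed = []

    def validate_tokens_raw(self, tokens):
        # Return how many leading tokens are acceptable from the
        # current state, without mutating it (the cheap check).
        n = 0
        for t in tokens:
            if not t.isdigit():
                break
            n += 1
        return n

    def clone(self):
        # Deep-copy so speculative work advances a scratch copy.
        return copy.deepcopy(self)

    def consume_token(self, tok):
        assert self.validate_tokens_raw([tok]) == 1
        self.consumed.append(tok)

def speculate(constraint, draft_tokens):
    # Cheap, non-mutating validation of the draft sequence.
    n_valid = constraint.validate_tokens_raw(draft_tokens)
    accepted = draft_tokens[:n_valid]
    # Advance a clone; the untouched original substitutes for rollback.
    scratch = constraint.clone()
    for tok in accepted:
        scratch.consume_token(tok)
    return accepted, scratch

c = ToyConstraint()
accepted, scratch = speculate(c, ["1", "2", "x", "3"])
print(accepted)          # ['1', '2']
print(c.consumed)        # [] - original untouched
print(scratch.consumed)  # ['1', '2']
```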

Another API we may want to add is compute_mask_after_tokens(), which would save the constraint state, consume a number of tokens, compute the mask, and restore the state (this would be easier than a general rollback).
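The proposed compute_mask_after_tokens() amounts to save, consume, compute, restore - which can equivalently be done by working on a copy and discarding it. A sketch under invented assumptions (`ToyDigits` is a made-up stand-in grammar, not llguidance):

```python
import copy

class ToyDigits:
    """Toy grammar (an assumption, not llguidance): at most 3 digit
    tokens total; compute_mask() returns the allowed next tokens."""
    def __init__(self):
        self.count = 0

    def consume_token(self, tok):
        assert tok in self.compute_mask()
        if tok != "<eos>":
            self.count += 1

    def compute_mask(self):
        if self.count < 3:
            return {"0", "1", "2", "<eos>"}
        return {"<eos>"}

def compute_mask_after_tokens(constraint, tokens):
    # "Save" by deep-copying, consume the tokens on the copy, and
    # compute the mask there; the original is never touched, so no
    # explicit restore is needed - easier than general rollback.
    scratch = copy.deepcopy(constraint)
    for t in tokens:
        scratch.consume_token(t)
    return scratch.compute_mask()

c = ToyDigits()
mask = compute_mask_after_tokens(c, ["1", "2", "0"])
print(sorted(mask))  # ['<eos>'] - after 3 digits only <eos> remains
print(c.count)       # 0 - original state untouched
```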

In some situations it won't be possible to compute masks for all draft tokens, so one would have to fall back to rejection sampling. Note that rejection sampling is not equivalent to mask-and-sample under top-p/top-k sampling (but is equivalent under temperature scaling and argmax).

Let me know if any of these help!

@mmoskal
Copy link
Collaborator

mmoskal commented Feb 21, 2025

Actually, let me keep this open until the Python interface is available. Right now, Python uses Constraint, which wraps TokenParser, and that may not be the best way forward.

@mmoskal mmoskal reopened this Feb 21, 2025