Feature Description
Rescued from #368:
You may wish to consider implementing one of the token-level constrained-decoding options for the llama.cpp backend to guarantee adherence during structured generation. It's the difference between asking "pretty please" and guaranteeing a correctly structured response.
As currently implemented by @xsxszab in nexa_inference_text.py, generation fails outright if the model does not return valid JSON or does not follow the requested schema.
Options
LM Format Enforcer (Python)
LM Format Enforcer's llama-cpp-python integration code should be easy to adapt. This package is already used in Red Hat/IBM's enterprise-focused vLLM project (reference).
A demonstration notebook is available here. You may be able to run it as-is by merely changing the imports, e.g.:
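Something along these lines should work, based on LM Format Enforcer's published llama-cpp-python integration (the model path and schema below are placeholders, not anything from this repo):

```python
from llama_cpp import Llama, LogitsProcessorList
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.llamacpp import build_llamacpp_logits_processor

# Placeholder model path -- substitute any GGUF model.
llm = Llama(model_path="path/to/model.gguf")

# Placeholder schema: the enforcer accepts any JSON schema as a dict.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

# At each decoding step the processor masks logits so that only tokens
# keeping the partial output schema-valid can be sampled.
logits_processors = LogitsProcessorList(
    [build_llamacpp_logits_processor(llm, JsonSchemaParser(schema))]
)

output = llm(
    "Describe a person as JSON: ",
    logits_processor=logits_processors,
    max_tokens=128,
)
print(output["choices"][0]["text"])
```

Because invalid tokens are masked before sampling, the result parses against the schema by construction (barring truncation at max_tokens), rather than by asking the model nicely and hoping.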
LLGuidance (upstream)
The LLGuidance Rust crate has recently been added to upstream llama.cpp.
Enabling this feature at compile time (via -DLLAMA_LLGUIDANCE=ON) requires some fiddling with the Rust toolchain, and a few bug fixes are still being finalized (pull 11644). These are transitional problems, however, and adopting this approach would probably make it easiest for end users to get structured generation through the SDK.
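For reference while waiting on that, the SDK could lean on llama.cpp's existing GBNF grammar sampler, which llama-cpp-python already exposes and which is roughly the sampler stage LLGuidance slots into upstream. A minimal sketch (model path and schema are again placeholders):

```python
import json

from llama_cpp import Llama, LlamaGrammar

# Placeholder model path -- substitute any GGUF model.
llm = Llama(model_path="path/to/model.gguf")

# from_json_schema takes the schema as a JSON string and compiles it
# into a GBNF grammar that constrains sampling.
schema = json.dumps({
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
})
grammar = LlamaGrammar.from_json_schema(schema)

output = llm("Reply as JSON: ", grammar=grammar, max_tokens=128)
print(output["choices"][0]["text"])
```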