[FEATURE] LLM Token-Level Generation Supervision #370

Open
iwr-redmond opened this issue Feb 4, 2025 · 0 comments
Labels
💡 feature request New feature or request

Comments


iwr-redmond commented Feb 4, 2025

Feature Description

Rescued from #368:

You may wish to consider implementing one of the token-level supervision options for LlamaCPP to deliver superior adherence during structured generation. It's the difference between asking "pretty please" and guaranteeing a correctly structured response.

As currently implemented by @xsxszab in nexa_inference_text.py, generation fails if the model does not return valid JSON or does not follow the requested schema.

Options

LM Format Enforcer (Python)

LM Format Enforcer's llama-cpp-python integration code should be easy to adapt. The package is already used in Red Hat/IBM's enterprise-focused vLLM project (reference).

A demonstration workbook is available here. You may be able to run this workbook as-is simply by changing the imports, e.g.:

-from llama_cpp import LogitsProcessorList
+from nexa.gguf.llama import LogitsProcessorList
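For reference, here is a rough sketch of the full wiring in plain llama-cpp-python, adapted from the LM Format Enforcer README. The model path, schema, and prompt are placeholders, and in the SDK the imports would come from nexa.gguf.llama as per the diff above:

# Sketch only: constrain llama-cpp-python generation to a JSON schema
# using LM Format Enforcer's logits processor.
from llama_cpp import Llama, LogitsProcessorList  # nexa.gguf.llama in the SDK
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.llamacpp import build_llamacpp_logits_processor

llm = Llama(model_path="model.gguf")  # placeholder model path
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}

# The logits processor masks any token that would violate the schema,
# so the output is structurally valid by construction rather than by request.
processors = LogitsProcessorList(
    [build_llamacpp_logits_processor(llm, JsonSchemaParser(schema))]
)

result = llm(
    "Return a JSON object describing a person:",
    logits_processor=processors,
    max_tokens=128,
)
print(result["choices"][0]["text"])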

LLGuidance (upstream)

The LLGuidance Rust crate has recently been added to upstream llama.cpp.

Enabling this feature at compile time requires some fiddling with the Rust toolchain, and a few bug fixes still need to be finalized (pull 11644). These are transitional problems, however, and adopting this approach would probably make it easier for end users to use structured generation with the SDK.

@iwr-redmond iwr-redmond added the 💡 feature request New feature or request label Feb 4, 2025
@iwr-redmond iwr-redmond changed the title [FEATURE] LM Format Enforcer integration [FEATURE] LLM Token-Level Generation Supervision Feb 7, 2025