Feature Description
Rescued from #368:
You may wish to consider implementing one of the token-level constrained-decoding options for the llama.cpp backend to guarantee adherence during structured generation. It's the difference between asking "pretty please" and guaranteeing a correctly structured response.
As currently implemented by @xsxszab in nexa_inference_text.py, generation fails outright if the model does not return valid JSON or does not follow the requested schema.
Options
LM Format Enforcer (Python)
LM Format Enforcer's llama-cpp-python integration code should be easy to adapt. This package is already used in Red Hat/IBM's enterprise-focused vLLM project (reference).
A demonstration notebook is available here. You may be able to run it as-is by merely changing the imports, e.g.:
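Something along these lines should work, based on LM Format Enforcer's published llama-cpp-python integration (the model path and schema below are placeholders, not anything from this repo):

```python
from llama_cpp import Llama, LogitsProcessorList
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.llamacpp import build_llamacpp_logits_processor

# Placeholder model path -- substitute any GGUF model.
llm = Llama(model_path="path/to/model.gguf")

# Placeholder schema: the enforcer accepts any JSON schema as a dict.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

# At each decoding step the processor masks logits so that only tokens
# keeping the partial output schema-valid can be sampled.
logits_processors = LogitsProcessorList(
    [build_llamacpp_logits_processor(llm, JsonSchemaParser(schema))]
)

output = llm(
    "Describe a person as JSON: ",
    logits_processor=logits_processors,
    max_tokens=128,
)
print(output["choices"][0]["text"])
```

Because invalid tokens are masked before sampling, the result parses against the schema by construction (barring truncation at max_tokens), rather than by asking the model nicely and hoping.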
LLGuidance (upstream)
The LLGuidance Rust crate has recently been added to upstream llama.cpp.
Enabling this feature at compile time (via -DLLAMA_LLGUIDANCE=ON) requires some fiddling with the Rust toolchain, and a few bug fixes are still being finalized (pull 11644). These are transitional problems, however, and adopting this approach would probably make it easiest for end users to get structured generation through the SDK.
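For reference while waiting on that, the SDK could lean on llama.cpp's existing GBNF grammar sampler, which llama-cpp-python already exposes and which is roughly the sampler stage LLGuidance slots into upstream. A minimal sketch (model path and schema are again placeholders):

```python
import json

from llama_cpp import Llama, LlamaGrammar

# Placeholder model path -- substitute any GGUF model.
llm = Llama(model_path="path/to/model.gguf")

# from_json_schema takes the schema as a JSON string and compiles it
# into a GBNF grammar that constrains sampling.
schema = json.dumps({
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
})
grammar = LlamaGrammar.from_json_schema(schema)

output = llm("Reply as JSON: ", grammar=grammar, max_tokens=128)
print(output["choices"][0]["text"])
```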