Feature Request: Built-in Token Probability Output for Inference API #14611

@officiallyutso

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

It would be highly beneficial if llama.cpp's inference API supported an option (e.g., alongside llama_decode / llama_get_logits) to return the log probability or the full probability distribution of each token generated during inference.

This feature is particularly useful for tasks such as:

  • Confidence estimation in generated outputs
  • Building applications involving uncertainty modeling
  • Language model calibration studies
  • Advanced prompting workflows that rely on token-level analysis

Motivation

Currently, users must resort to workarounds such as manually post-processing the raw logits or modifying the source to extract token probabilities, which:

  • Increases maintenance overhead

  • Reduces usability for research and production use-cases

  • Deters new contributors or developers from integrating llama.cpp

Possible Implementation

  • Add an optional flag or method in the inference API to return per-token log probabilities (similar to logprobs in the OpenAI API).
  • Ensure the output can be toggled off to avoid performance penalties when not needed.
  • Update the documentation with example usage.
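As a sketch of what the output could look like, loosely following OpenAI's logprobs response shape (the field names here are hypothetical; the actual schema would be up to the maintainers):

```json
{
  "content": [
    {
      "token": "Hello",
      "logprob": -0.12,
      "top_logprobs": [
        { "token": "Hello", "logprob": -0.12 },
        { "token": "Hi",    "logprob": -2.35 }
      ]
    }
  ]
}
```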

This feature is widely available in APIs such as OpenAI's and in Hugging Face Transformers, and its inclusion would increase llama.cpp's utility in academic and production settings.

If this sounds like a good fit for the project, I’d be happy to help explore or prototype the implementation!

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
