Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
It would be highly beneficial if llama.cpp supported an option in its inference APIs (e.g., `llama_tokenize`, `llama_eval`, etc.) to return the log probability or probability distribution of each token generated during inference.
This feature is particularly useful for tasks such as:
- Confidence estimation in generated outputs
- Building applications involving uncertainty modeling
- Language model calibration studies
- Advanced prompting workflows that rely on token-level analysis
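For clarity, the per-token quantity being requested is the log-softmax of that token's logit over the vocabulary, which is what APIs such as OpenAI's expose as `logprobs`:

$$
\log p(t \mid \text{context}) = z_t - \log \sum_{j=1}^{|V|} \exp(z_j)
$$

where $z$ is the logit vector llama.cpp already computes for the current position and $|V|$ is the vocabulary size.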
Motivation
Currently, users have to resort to workarounds, such as manually wrapping the raw logits or patching the source to extract token probabilities (a minimal example is sketched after this list). These workarounds:
- Increase maintenance overhead
- Reduce usability for research and production use cases
- Deter new contributors and developers from integrating llama.cpp
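Below is a minimal sketch of such a workaround, using the existing C API (`llama_get_logits_ith`, `llama_get_model`, `llama_n_vocab`) together with a log-softmax over the vocabulary; exact function names vary between llama.cpp versions, and the model/batch/decode setup is assumed to exist elsewhere:

```cpp
#include <algorithm>
#include <cmath>

#include "llama.h"

// Sketch only: after llama_decode() has produced logits for output position `i`,
// compute the log probability of `token` at that position.
static float token_logprob(llama_context * ctx, int32_t i, llama_token token) {
    const float * logits  = llama_get_logits_ith(ctx, i);
    const int32_t n_vocab = llama_n_vocab(llama_get_model(ctx));

    // numerically stable log-softmax: log p(token) = z_token - (max + log Σ exp(z - max))
    float max_logit = logits[0];
    for (int32_t v = 1; v < n_vocab; ++v) {
        max_logit = std::max(max_logit, logits[v]);
    }
    double sum_exp = 0.0;
    for (int32_t v = 0; v < n_vocab; ++v) {
        sum_exp += std::exp(logits[v] - max_logit);
    }
    return logits[token] - (max_logit + (float) std::log(sum_exp));
}
```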
Possible Implementation
- Add an optional flag or method in the inference API to return per-token log probabilities (similar to `logprobs` in the OpenAI API).
- Ensure the output can be toggled off so there is no performance penalty when it is not needed (an illustrative sketch follows this list).
- Update the documentation with example usage.
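As a purely illustrative, self-contained sketch of what such an opt-in, OpenAI-style top-k logprobs result could look like (every name below is hypothetical and not part of llama.cpp's API), the helper converts one position's logits into sorted log probabilities, the idea being that this work only runs when the caller has opted in:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// HYPOTHETICAL illustration only -- this type and helper are not part of llama.cpp.
struct token_logprob {
    int32_t token;
    float   logprob;
};

// Given the raw logits for one output position, return the top-k tokens with
// their log probabilities (log-softmax), mirroring the OpenAI `logprobs` field.
// Intended to run only when the caller explicitly opted in, so the default
// sampling path pays no extra cost.
std::vector<token_logprob> top_logprobs(const float * logits, int32_t n_vocab, int32_t k) {
    // numerically stable log-softmax normalizer
    const float max_logit = *std::max_element(logits, logits + n_vocab);
    double sum_exp = 0.0;
    for (int32_t v = 0; v < n_vocab; ++v) {
        sum_exp += std::exp(logits[v] - max_logit);
    }
    const float log_z = max_logit + (float) std::log(sum_exp);

    std::vector<token_logprob> all(n_vocab);
    for (int32_t v = 0; v < n_vocab; ++v) {
        all[v] = { v, logits[v] - log_z };
    }

    k = std::min(k, n_vocab);
    std::partial_sort(all.begin(), all.begin() + k, all.end(),
                      [](const token_logprob & a, const token_logprob & b) { return a.logprob > b.logprob; });
    all.resize(k);
    return all;
}
```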
This feature has been widely adopted in APIs like OpenAI’s and HuggingFace Transformers, and its inclusion would increase llama.cpp’s utility in academic and production settings.
If this sounds like a good fit for the project, I’d be happy to help explore or prototype the implementation!