Uncertainty-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models.

This repository contains the code and processed data for the Uncertainty-LINE method. Experimental managers are available for download here.

Repository Structure

├── results/                            # Collected experimental results in CSV format
├── plots/                              # Generated plots for analysis
│
├── 01_plotting_quality_ue_trends.ipynb # Notebook: analyze quality and uncertainty trends vs. generation length
├── 02_collect_experimental_results.ipynb # Notebook: aggregate and organize experimental results
├── 03_main_tables.ipynb                # Notebook: generate the main tables used in the paper
│
├── utils.py                            # Utility functions for data loading, detrending, and processing
├── enrich_metrics.py                   # Script to enrich generations with alternative quality metrics
│
├── README.md                           # Project documentation (this file)
└── requirements.txt                    # Python dependencies

Place the downloaded managers in the processed_mans/ folder.
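
Before opening the notebooks, it can help to confirm that the managers ended up in the expected place. The following is a minimal sketch, assuming only that the managers are ordinary files somewhere under processed_mans/. The helper name `list_managers` is hypothetical and is not part of this repository.

```python
from pathlib import Path


def list_managers(folder="processed_mans"):
    """Return sorted paths of all regular files under `folder`.

    Hypothetical helper for illustration; it makes no assumption
    about the file extension or internal format of the managers.
    """
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Expected manager folder not found: {root}")
    # Walk the folder recursively and keep files only, skipping directories.
    return sorted(p for p in root.rglob("*") if p.is_file())
```

Calling `list_managers()` from the repository root returns the files that were found, or raises `FileNotFoundError` if the folder is missing.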

Usage

Install Dependencies
```
pip install -r requirements.txt  
```

Run lm-polygraph to collect data for the train and test splits.

Test split:

HYDRA_CONFIG=`pwd`/examples/configs/polygraph_eval_wmt14_csen.yaml \
  polygraph_eval \
  batch_size=1 \
  cache_path=/path/to/cache \
  model=gemma \
  subsample_eval_dataset=2000 \
  deberta_batch_size=1 \
  +deberta_device=cuda:0 \
  model.load_model_args.device_map=auto

Train split:

HYDRA_CONFIG=`pwd`/examples/configs/polygraph_eval_wmt14_csen.yaml \
   polygraph_eval \
   batch_size=1 \
   cache_path=/path/to/cache/train \
   model=gemma \
   subsample_eval_dataset=2000 \
   deberta_batch_size=1 \
   eval_split=train \
   +deberta_device=cuda:0 \
   model.load_model_args.device_map=auto
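
To illustrate why both splits are collected, here is a rough sketch of length detrending. It is not the code in utils.py. It assumes that a linear trend of uncertainty score against generation length is fitted on the train split and subtracted from scores on the test split. The function names `fit_length_trend` and `detrend` and the toy numbers are made up for illustration.

```python
import numpy as np


def fit_length_trend(lengths, scores):
    """Fit scores ~ slope * length + intercept by ordinary least squares."""
    x = np.asarray(lengths, dtype=float)
    y = np.asarray(scores, dtype=float)
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def detrend(lengths, scores, slope, intercept):
    """Subtract the fitted length trend and return the residual scores."""
    x = np.asarray(lengths, dtype=float)
    y = np.asarray(scores, dtype=float)
    return y - (slope * x + intercept)


# Toy train data (illustrative only): uncertainty grows linearly with length.
train_len = np.array([5, 10, 20, 40])
train_ue = 0.1 * train_len + 1.0
slope, intercept = fit_length_trend(train_len, train_ue)

# Toy test data: the same trend plus a length-independent offset per sample.
test_len = np.array([8, 16])
test_ue = 0.1 * test_len + 1.0 + np.array([0.5, -0.5])
residual_ue = detrend(test_len, test_ue, slope, intercept)
print(residual_ue)
```

In this toy setup the residuals recover the length-independent offsets, which is the intuition behind a length-invariant score. Fitting on one split and applying to another avoids fitting the trend on the same examples it is evaluated on.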

Citation

If you use this repository, please cite:

@misc{vashurin2025uncertaintylinelengthinvariantestimationuncertainty,
      title={UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models}, 
      author={Roman Vashurin and Maiya Goloburda and Preslav Nakov and Maxim Panov},
      year={2025},
      eprint={2505.19060},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.19060}, 
}
