GitHub - zldscr0/SENT: Official implementation of SENT: Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning.

Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning

This is the official repository containing the training and evaluation code for SENT.

💥 News

[2025.12.04] Our paper is now available on arXiv.

👀 About SENT

We present SENT, a dual-level exploration-aware framework designed to improve large language models’ reasoning capabilities by preventing entropy collapse during training. SENT jointly optimizes data organization and algorithmic regularization, enabling stable and sustained reasoning improvement.

This repository also includes training and evaluation code adapted for Ascend and openPangu embedded models, located in sent-ascend/.
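To make "entropy collapse" concrete, the sketch below computes the standard Shannon entropy of a next-token distribution from raw logits. It is only an illustration of the quantity one would typically monitor during training, not SENT's implementation; the function names `token_entropy` and `mean_token_entropy` are hypothetical and do not come from this repository.

```python
import math


def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over one token's logits."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_token_entropy(per_token_logits):
    """Average token entropy over a sequence (one list of logits per generated token)."""
    return sum(token_entropy(row) for row in per_token_logits) / len(per_token_logits)
```

A uniform distribution over the vocabulary gives the maximum value (the log of the vocabulary size), while a policy that has collapsed onto a single token gives a value near zero.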

🔥 Quick Start

1️⃣ Set up environment

Please follow the environment setup instructions in the VERL Installation Guide.

2️⃣ Download checkpoint

Model	Repo ID
1.5B	deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
7B	Qwen/Qwen2.5-Math-7B
14B	Qwen/Qwen3-14B

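Download a model with:

from huggingface_hub import snapshot_download

save_dir = ""   # local directory that will hold the model files
repo_id = ""    # Repo ID from the table above, e.g. "Qwen/Qwen2.5-Math-7B"
cache_dir = ""  # Hugging Face cache directory

snapshot_download(
    repo_id=repo_id,
    local_dir=save_dir,
    cache_dir=cache_dir,
    # The next two arguments are deprecated in recent huggingface_hub releases;
    # drop them if your installed version warns about or rejects them.
    local_dir_use_symlinks=False,
    resume_download=True,
    # Fetch only configs, weights, code, and docs.
    allow_patterns=["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt"],
)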
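If you want all three checkpoints, the following sketch builds the `snapshot_download` arguments for each model size in the table. The `checkpoints/` root, the one-directory-per-model layout, and the helper names `download_kwargs` and `download_all` are assumptions made for illustration; they are not part of this repository.

```python
from pathlib import Path

# Repo IDs copied from the model table above.
MODEL_REPOS = {
    "1.5B": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "7B": "Qwen/Qwen2.5-Math-7B",
    "14B": "Qwen/Qwen3-14B",
}

ALLOW_PATTERNS = ["*.json", "*.safetensors", "*.bin", "*.py", "*.md", "*.txt"]


def download_kwargs(size, root="checkpoints"):
    """Build snapshot_download keyword arguments for one model size."""
    repo_id = MODEL_REPOS[size]
    # Hypothetical layout: one sub-directory per model, named after the repo.
    local_dir = Path(root) / repo_id.split("/")[-1]
    return {
        "repo_id": repo_id,
        "local_dir": str(local_dir),
        "allow_patterns": ALLOW_PATTERNS,
    }


def download_all(root="checkpoints"):
    """Download every model in MODEL_REPOS (requires network access)."""
    # Imported here so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for size in MODEL_REPOS:
        snapshot_download(**download_kwargs(size, root))
```

Calling `download_all()` fetches the three models one after another; pass a different `root` to change where they are stored.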

🔥 Train & Eval

Train

After preparing the environment and downloading the base model, run:

bash SENT/examples/grpo_curriculum/run_low_entropy_kl_cov_v2_curr_stage1.sh
bash SENT/examples/grpo_curriculum/run_low_entropy_kl_cov_v2_curr_stage2.sh

Eval

Evaluation is conducted using the OpenCompass framework with vLLM acceleration.

Benchmark	config_id
AIME2024	aime2024
AIME2025	aime2025
AMC23	amc23
MATH500	math_500_gen
OlympiadBench	olympiadbench
MINERVA	minerva

cd eval
CUDA_VISIBLE_DEVICES=0 python run.py \
  --datasets <config_id> \
  --hf-type chat \
  --hf-path <PATH_TO_MODEL> \
  --dump-eval-details \
  --accelerator vllm
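To evaluate one checkpoint on every benchmark in the table, a small driver can generate the command above once per `config_id`. This is a sketch under the assumption that each benchmark is run as a separate `run.py` invocation from inside `eval/`; the helper names `eval_command` and `run_all` are hypothetical.

```python
import os
import subprocess

# config_id values copied from the benchmark table above.
CONFIG_IDS = ["aime2024", "aime2025", "amc23", "math_500_gen", "olympiadbench", "minerva"]


def eval_command(config_id, model_path):
    """Return the argv list for one evaluation run, mirroring the command above."""
    return [
        "python", "run.py",
        "--datasets", config_id,
        "--hf-type", "chat",
        "--hf-path", model_path,
        "--dump-eval-details",
        "--accelerator", "vllm",
    ]


def run_all(model_path, gpu="0", eval_dir="eval"):
    """Run every benchmark in sequence on one GPU, stopping at the first failure."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for config_id in CONFIG_IDS:
        subprocess.run(eval_command(config_id, model_path), cwd=eval_dir, env=env, check=True)
```

For example, `run_all("<PATH_TO_MODEL>")` from the repository root runs the six benchmarks back to back on GPU 0.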

📊 Benchmarks

🤗 Acknowledgement

This repository is built upon the VERL reinforcement learning framework and the OpenCompass evaluation platform.
We thank the authors for their valuable contributions to the community.

✍️ Citation

@article{cao2025efficient,
  title={Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning},
  author={Cao, Hongye and Bai, Zhixin and Peng, Ziyue and Wang, Boyan and Yang, Tianpei and Huo, Jing and Zhang, Yuyao and Gao, Yang},
  journal={arXiv preprint arXiv:2512.04359},
  year={2025}
}
