Releases: RobotStudyCompanion/Benchmark_LM
v0.1
First tagged release accompanying the paper
"Benchmarking Local Language Models for Social Robots using Edge Devices"
(accepted IEEE ARSO 2026).
Release summary. Reproducible benchmark suite covering 25 open-source
language models on Raspberry Pi 4, Raspberry Pi 5, and laptop-GPU hosts.
Evaluates inference efficiency (TPS, TPJ), knowledge (six-category MMLU
subset), and teaching effectiveness (LLM-rated against eight criteria,
validated by five human raters).
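
For readers unfamiliar with the efficiency metrics, the following is a minimal sketch of how they could be computed. It assumes TPS means tokens per second and TPJ means tokens per joule, since the release notes do not expand the abbreviations. The function names and the average-power input are hypothetical and are not taken from the benchmark code.

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput: generated tokens divided by wall-clock generation time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return n_tokens / elapsed_s


def tokens_per_joule(n_tokens: int, avg_power_w: float, elapsed_s: float) -> float:
    """Energy efficiency: generated tokens divided by energy consumed.

    Energy is approximated here as average power (watts) times duration
    (seconds); this is an assumption, not the suite's documented method.
    """
    energy_j = avg_power_w * elapsed_s
    if energy_j <= 0:
        raise ValueError("energy must be positive")
    return n_tokens / energy_j


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    print(tokens_per_second(120, 10.0))
    print(tokens_per_joule(120, 6.0, 10.0))
```

Under these definitions, a faster model is not automatically the more efficient one: TPJ also depends on the power drawn while generating.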
Accompanying data record: https://doi.org/10.5281/zenodo.19643021
Highlights since dorian-original:
- Consolidated per-platform runners and analysers from the development
  repository (orlandossss/Master_Benchmark, archiving).
- Disk-I/O telemetry on the Raspberry Pi runners, matching the data
  published in the Zenodo record.
- Linux-only packaging with pinned requirements.txt and setup.sh.
- Syntax-check CI workflow on push and pull request.
- Apache-2.0 licence, CITATION.cff, hardened .gitignore.
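
As an illustration of what a syntax-only check involves, here is a minimal sketch. It assumes the runners and analysers are Python scripts (suggested by requirements.txt, but not stated in these notes). The helper names are hypothetical, and this is not the repository's actual CI workflow.

```python
from pathlib import Path


def has_valid_syntax(source: str, filename: str = "<string>") -> bool:
    """Return True if the source compiles to bytecode; nothing is executed."""
    try:
        compile(source, filename, "exec")
    except SyntaxError:
        return False
    return True


def files_with_syntax_errors(root: str) -> list[str]:
    """Return the paths of all .py files under root that fail to compile."""
    bad = []
    for path in sorted(Path(root).rglob("*.py")):
        if not has_valid_syntax(path.read_text(encoding="utf-8"), str(path)):
            bad.append(str(path))
    return bad
```

A check of this kind catches parse errors early without requiring benchmark hardware or model downloads, though it says nothing about runtime correctness.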
Known scope: the three benchmark runners and three analysers remain
separate per-platform scripts for v0.1. Consolidation into a single
platform-aware runner is scoped for v0.2 — see future_work/ for the
broader forward-looking roadmap.
Full Changelog: dorian-original...v0.1