Releases · RobotStudyCompanion/Benchmark_LM

First tagged release, accompanying the paper
"Benchmarking Local Language Models for Social Robots using Edge Devices"
(accepted at IEEE ARSO 2026).

Release summary. Reproducible benchmark suite covering 25 open-source
language models on Raspberry Pi 4, Raspberry Pi 5, and laptop-GPU hosts.
Evaluates inference efficiency (TPS, TPJ), knowledge (six-category MMLU
subset), and teaching effectiveness (LLM-rated against eight criteria,
validated by five human raters).
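
As an illustration of the two efficiency metrics, here is a minimal sketch. It assumes TPS means tokens per second and TPJ means tokens per joule; the release notes do not expand either abbreviation. The `RunSample` type and its field names are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass
class RunSample:
    """One inference run (hypothetical structure, not the repository's)."""
    tokens_generated: int   # number of output tokens produced
    duration_s: float       # wall-clock generation time in seconds
    energy_j: float         # energy drawn during generation in joules


def tokens_per_second(sample: RunSample) -> float:
    """Throughput: tokens generated per second of wall-clock time."""
    if sample.duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return sample.tokens_generated / sample.duration_s


def tokens_per_joule(sample: RunSample) -> float:
    """Energy efficiency: tokens generated per joule consumed."""
    if sample.energy_j <= 0:
        raise ValueError("energy_j must be positive")
    return sample.tokens_generated / sample.energy_j


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    run = RunSample(tokens_generated=120, duration_s=10.0, energy_j=60.0)
    print(tokens_per_second(run), tokens_per_joule(run))
```

Under these assumptions, a higher TPJ at similar TPS would indicate a model that is better suited to a battery-powered robot.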

Accompanying data record: https://doi.org/10.5281/zenodo.19643021

Highlights since dorian-original:

- Consolidated per-platform runners and analysers from the development
  repository (orlandossss/Master_Benchmark, archiving).
- Disk-I/O telemetry on the Raspberry Pi runners, matching the data
  published in the Zenodo record.
- Linux-only packaging with pinned requirements.txt and setup.sh.
- Syntax-check CI workflow on push and pull request.
- Apache-2.0 licence, CITATION.cff, hardened .gitignore.
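
The release notes do not say how the syntax-check workflow is implemented. As a rough sketch of what such a check might do, assuming the runners and analysers are Python scripts (suggested by requirements.txt, but not stated), the following compiles each file without executing it and reports any syntax errors. The function names `check_source` and `check_tree` are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def check_source(source: str, filename: str = "<string>") -> Optional[str]:
    """Return an error message if the source has a syntax error, else None."""
    try:
        # compile() parses the code but does not run it.
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


def check_tree(root: str) -> List[str]:
    """Syntax-check every .py file under root and collect error messages."""
    errors = []
    for path in sorted(Path(root).rglob("*.py")):
        message = check_source(path.read_text(encoding="utf-8"), str(path))
        if message is not None:
            errors.append(message)
    return errors


if __name__ == "__main__":
    problems = check_tree(".")
    for line in problems:
        print(line)
    # A non-zero exit code would make a CI job fail.
    raise SystemExit(1 if problems else 0)
```

A check like this catches only parse errors; it does not import dependencies or run the benchmarks, which keeps it cheap enough to run on every push.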

Known scope: the three benchmark runners and three analysers remain
separate per-platform scripts for v0.1. Consolidation into a single
platform-aware runner is scoped for v0.2; see future_work/ for the
broader roadmap.

Full Changelog: dorian-original...v0.1

Releases: RobotStudyCompanion/Benchmark_LM

v0.1

Dorian's original Benchmarking_LLM suite