v0.1


@mbz4 mbz4 released this 21 Apr 13:17

First tagged release accompanying the paper
"Benchmarking Local Language Models for Social Robots using Edge Devices"
(accepted IEEE ARSO 2026).

Release summary. A reproducible benchmark suite covering 25 open-source
language models on Raspberry Pi 4, Raspberry Pi 5, and laptop-GPU hosts.
It evaluates inference efficiency (tokens per second, TPS; tokens per
joule, TPJ), knowledge (a six-category MMLU subset), and teaching
effectiveness (LLM-rated against eight criteria and validated by five
human raters).
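The two efficiency metrics can be sketched as follows, under the assumption that TPS is generated tokens divided by wall-clock seconds and TPJ is generated tokens divided by energy drawn in joules; the function names and fields here are illustrative, not the repository's actual API.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput: generated tokens divided by elapsed wall-clock time."""
    return tokens / seconds

def tokens_per_joule(tokens: int, joules: float) -> float:
    """Energy efficiency: generated tokens divided by energy consumed."""
    return tokens / joules

# Example: 128 tokens in 64 s at an average 5 W draw (5 W * 64 s = 320 J).
tps = tokens_per_second(128, 64.0)        # 2.0 tokens/s
tpj = tokens_per_joule(128, 5.0 * 64.0)   # 0.4 tokens/J
```

On battery-constrained edge hosts, TPJ typically matters as much as TPS, since it bounds how much text can be generated per charge.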

Accompanying data record: https://doi.org/10.5281/zenodo.19643021

Highlights since dorian-original:

  • Consolidated per-platform runners and analysers from the development
    repository (orlandossss/Master_Benchmark, now being archived).
  • Disk-I/O telemetry on the Raspberry Pi runners, matching the data
    published in the Zenodo record.
  • Linux-only packaging with pinned requirements.txt and setup.sh.
  • Syntax-check CI workflow on push and pull request.
  • Apache-2.0 licence, CITATION.cff, hardened .gitignore.
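
The disk-I/O telemetry mentioned above can be collected on Linux by sampling /proc/diskstats. A minimal sketch, assuming the kernel's documented field layout (512-byte sector units); this parser is an illustration, not the repository's actual telemetry code.

```python
SECTOR_BYTES = 512  # /proc/diskstats reports sector counts in 512-byte units

def parse_diskstats(text: str, device: str) -> dict:
    """Return cumulative bytes read/written for one device
    from the text of /proc/diskstats."""
    for line in text.splitlines():
        fields = line.split()
        # Layout: major minor name, then per-device counters;
        # fields[5] = sectors read, fields[9] = sectors written.
        if len(fields) >= 10 and fields[2] == device:
            return {
                "read_bytes": int(fields[5]) * SECTOR_BYTES,
                "write_bytes": int(fields[9]) * SECTOR_BYTES,
            }
    raise KeyError(device)

# One sample line in the kernel's format (values are made up).
sample = "179 0 mmcblk0 1200 30 96000 400 800 20 64000 350 0 500 750"
stats = parse_diskstats(sample, "mmcblk0")
```

Because the counters are cumulative since boot, a runner would sample them before and after a benchmark pass and record the deltas.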

Known scope: the three benchmark runners and three analysers remain
separate per-platform scripts for v0.1. Consolidation into a single
platform-aware runner is scoped for v0.2 — see future_work/ for the
broader forward-looking roadmap.

Full Changelog: dorian-original...v0.1