16 min to reply to hi [LicheeRV Nano] #26

@Pescad0res

Bug Report: Extremely slow inference on LicheeRV Nano (RISC-V C906)

Device & OS

  • Hardware: LicheeRV Nano (SG2002 SoC, RISC-V C906 1GHz, 256MB DDR3 — 128MB available to Linux)
  • OS: Buildroot (custom minimal Linux)
  • Compiler: gcc-riscv64-linux-gnu (cross-compiled on Kali Linux) with -static flag

Model

  • Model file: tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
  • Quantization: Q4_K_M

What happened?
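Inference is extremely slow on this RISC-V board: picolm reports 0.0 tok/s instead of the advertised ~1 tok/s. A run requesting 10 tokens (-n 10) took over 16 minutes in total. Generating 11 tokens took 815.72 s (about 0.013 tok/s, or roughly 74 s per token), and prefill alone took 162.92 s for just 2 tokens.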
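Since picolm rounds the rates to 0.0 tok/s, the actual figures have to be recovered from the timing lines. A minimal sketch of that arithmetic, with the numbers copied from the "Actual output" section of this report; the helper name `rate` is only for illustration and is not part of picolm:

```python
# Timings copied from the "Actual output" section of this report.
PREFILL_TOKENS, PREFILL_SECONDS = 2, 162.92
GEN_TOKENS, GEN_SECONDS = 11, 815.72


def rate(tokens, seconds):
    """Return (tokens per second, seconds per token)."""
    return tokens / seconds, seconds / tokens


prefill_tps, prefill_spt = rate(PREFILL_TOKENS, PREFILL_SECONDS)
gen_tps, gen_spt = rate(GEN_TOKENS, GEN_SECONDS)

print(f"prefill:    {prefill_tps:.4f} tok/s ({prefill_spt:.1f} s/token)")
print(f"generation: {gen_tps:.4f} tok/s ({gen_spt:.1f} s/token)")
# Ratio between the advertised ~1 tok/s and the measured generation rate.
print(f"slowdown vs. advertised 1 tok/s: about {1.0 / gen_tps:.0f}x")
```

Prefill and generation come out at a similar per-token cost, which suggests the slowdown affects every forward pass rather than only one phase.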

Command you ran

/root/.picolm/bin/picolm /root/.picolm/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf -p "hi" -n 10 -j 2

Expected output
Roughly 1 tok/s, the figure the picolm README lists for embedded/lightweight devices.

Actual output

Loading model: /root/.picolm/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
Model config:
  n_embd=2048, n_ffn=5632, n_heads=32, n_kv_heads=4
  n_layers=22, vocab_size=32000, max_seq=2048
  head_dim=64, rope_base=10000.0
Allocating 1.17 MB for runtime state (+ 44.00 MB FP16 KV cache)
Tokenizer loaded: 32000 tokens, bos=1, eos=2
Prompt: 2 tokens, generating up to 10 (temp=0.80, top_p=0.90, threads=2)
---
ểu như là những ng
---
Prefill: 2 tokens in 162.92s (0.0 tok/s)
Generation: 11 tokens in 815.72s (0.0 tok/s)
Total: 978.64s
Memory: 45.17 MB runtime state (FP16 KV cache)
real    16m 19.55s
user    13m 3.19s
sys     0m 29.07s
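As a sanity check on the memory lines, the 44.00 MB KV cache figure can be reproduced from the printed model config. This sketch assumes the usual layout (one FP16 key vector and one FP16 value vector per KV head, per layer, per position); that layout is an assumption, not something read from the picolm source, and `kv_cache_bytes` is a hypothetical helper:

```python
# Values copied from the "Model config" lines of the log above.
N_LAYERS, N_KV_HEADS, HEAD_DIM, MAX_SEQ = 22, 4, 64, 2048
BYTES_PER_FP16 = 2
RUNTIME_STATE_MIB = 1.17  # "Allocating 1.17 MB for runtime state"
LINUX_RAM_MIB = 128       # RAM available to Linux, from "Device & OS"


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, max_seq,
                   bytes_per_elem=BYTES_PER_FP16):
    """Size of a full KV cache under the assumed layout."""
    # The factor 2 covers one key tensor plus one value tensor per layer.
    return 2 * n_layers * max_seq * n_kv_heads * head_dim * bytes_per_elem


kv_mib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, MAX_SEQ) / (1024 * 1024)
total_mib = kv_mib + RUNTIME_STATE_MIB

print(f"KV cache: {kv_mib:.2f} MiB")
print(f"KV cache + runtime state: {total_mib:.2f} MiB")
print(f"share of Linux-visible RAM: {100 * total_mib / LINUX_RAM_MIB:.0f}%")
```

Under that assumption the result matches both the 44.00 MB allocation line and the 45.17 MB "Memory" line, so the runtime state alone takes roughly a third of the 128 MB available to Linux before any model weights are touched.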

Build output
Cross-compiled on Kali Linux for RISC-V:

make CC=riscv64-linux-gnu-gcc CFLAGS="-static" riscv

Additional notes
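`file` confirms the binary targets the correct architecture. Note that it reports a dynamically linked PIE executable, so the -static flag does not appear to have taken effect at link time: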

/root/.picolm/bin/picolm: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, for GNU/Linux 4.15.0

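The board sits at 100% CPU during inference, so the run appears to be compute-bound rather than stalled. I suspect the build is missing optimizations: the riscv make target may not enable compiler flags tuned for the C906 core specifically, such as RISC-V vector (RVV) support. In addition, variables given on the make command line override assignments inside the Makefile, so passing CFLAGS="-static" may have replaced whatever optimization flags the Makefile sets by default.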
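One quick way to narrow down the vectorization theory is to check which extensions the kernel reports for the core. The sketch below assumes the kernel exposes an `isa` line in /proc/cpuinfo, as RISC-V Linux kernels normally do; the helper names and both sample ISA strings are hypothetical and were not captured from this board:

```python
def single_letter_extensions(isa):
    """Return the single-letter extensions of an ISA string like 'rv64imafdc'."""
    # Multi-letter extensions (e.g. '_zicsr') follow the first underscore.
    base = isa.strip().lower().split("_", 1)[0]
    if not base.startswith("rv"):
        raise ValueError(f"not a RISC-V ISA string: {isa!r}")
    # Drop the 'rv' prefix and the register width (32/64/128).
    return set(base[2:].lstrip("0123456789"))


def read_isa_lines(path="/proc/cpuinfo"):
    """Collect the value of every line whose key starts with 'isa'."""
    with open(path) as f:
        return [line.split(":", 1)[1].strip()
                for line in f
                if line.lower().startswith("isa") and ":" in line]


# Hypothetical sample strings, for illustration only.
with_vector = "rv64imafdvcsu"
without_vector = "rv64imafdc_zicsr_zifencei"

print("v" in single_letter_extensions(with_vector))
print("v" in single_letter_extensions(without_vector))
```

On the board itself, `read_isa_lines()` would supply the real string. A reported `v` is not sufficient on its own: as far as I know, the C906 implements the older 0.7.1 draft of the vector extension, which a toolchain targeting ratified RVV 1.0 cannot use, so the compiler and -march setting would need to match that variant.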

Labels: bug
