TinyLlama-1.1B-Chat-v1.0-RK3588-1.1.4
This is TinyLlama-1.1B-Chat-v1.0, a lightweight chat model quantized to w8a8 and optimized to run on the RK3588 NPU. It targets efficient, high-performance inference on edge devices via RKLLM (version 1.1.4).
Key Features
- Optimized for RK3588 NPU using w8a8 quantization.
- Compatible with RKLLM version 1.1.4.
- Converted using the ez-er-rkllm-toolkit.
Training Datasets (base model)
- SlimPajama-627B (Cerebras)
- Starcoder Data (BigCode)
- Ultrachat_200k (HuggingFaceH4)
- Ultrafeedback_binarized (HuggingFaceH4)
License
This model is released under the Apache-2.0 license.
Getting Started with RKLLAMA
Follow these steps to use TinyLlama-1.1B-Chat-v1.0 with RKLLAMA:
1. Clone the RKLLAMA Repository
git clone https://github.com/notpunchnox/rkllama
cd rkllama
2. Install Dependencies
Run the setup script to install all required dependencies:
chmod +x setup.sh
sudo ./setup.sh
3. Add the Model
Download the model and place it in the models/ directory:
cd ~/RKLLAMA/models/
curl -L -O https://huggingface.co/punchnox/TinyLlama-1.1B-Chat-v1.0-rk3588-1.1.4/resolve/main/TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5.rkllm
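Note that Hugging Face serves raw files from `/resolve/<revision>/` paths; `/blob/` paths return the HTML file-viewer page instead of the model itself. A small helper to build the direct-download URL for this card's repo and file:

```python
def hf_download_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for a file in a Hugging Face repo.

    Raw files live under /resolve/<revision>/; /blob/ paths return an
    HTML viewer page rather than the file itself.
    """
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


url = hf_download_url(
    "punchnox/TinyLlama-1.1B-Chat-v1.0-rk3588-1.1.4",
    "TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5.rkllm",
)
print(url)
```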
4. Launch the RKLLAMA Server
Start the server so models can be loaded and queried:
rkllama serve
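Once the server is up, it can be queried over HTTP. The sketch below builds a single-turn chat request; the port (8080) and the Ollama-style `/api/chat` route are assumptions here, so check the RKLLAMA README for the routes and default port your version actually exposes.

```python
import json

# Assumed server address; RKLLAMA's actual default port may differ.
SERVER = "http://localhost:8080"


def build_chat_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Return the endpoint URL and JSON body for a single-turn chat request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return f"{SERVER}/api/chat", json.dumps(payload).encode()


url, body = build_chat_request(
    "TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5",
    "Give me one fun fact about NPUs.",
)
# To actually send it (requires the server to be running):
# import urllib.request
# req = urllib.request.Request(url, data=body,
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```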
5. Interact with the Model
List Available Models
To view all models installed in RKLLAMA:
rkllama list
Run the Model
Load the model on the RK3588 NPU:
rkllama run TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5.rkllm
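If you send raw text to the model rather than structured chat messages, it helps to format prompts the way the base model expects. TinyLlama-1.1B-Chat-v1.0 was trained with the Zephyr-style chat template; a minimal single-turn formatter, assuming that template:

```python
def format_prompt(user_msg: str,
                  system_msg: str = "You are a friendly chatbot.") -> str:
    """Format a single-turn prompt in the Zephyr-style chat template
    used by TinyLlama-1.1B-Chat-v1.0 (<|system|>/<|user|>/<|assistant|>
    tags with </s> separators)."""
    return (
        f"<|system|>\n{system_msg}</s>\n"
        f"<|user|>\n{user_msg}</s>\n"
        f"<|assistant|>\n"
    )


print(format_prompt("What is the RK3588?"))
```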
Base model: TinyLlama-1.1B-Chat-v1.0