TinyLlama-1.1B-Chat-v1.0-RK3588-1.1.4
This is TinyLlama-1.1B-Chat-v1.0, a lightweight chat model quantized to w8a8 and optimized to run on the RK3588 NPU. It targets efficient, high-performance inference on edge devices via RKLLM (version 1.1.4).
Key Features
- Optimized for RK3588 NPU using w8a8 quantization.
- Compatible with RKLLM version 1.1.4.
- Converted using the ez-er-rkllm-toolkit.
Training Datasets (base model)
- SlimPajama-627B (Cerebras)
- Starcoder Data (BigCode)
- Ultrachat_200k (HuggingFaceH4)
- Ultrafeedback_binarized (HuggingFaceH4)
License
This model is released under the Apache-2.0 license.
Getting Started with RKLLAMA
Follow these steps to use TinyLlama-1.1B-Chat-v1.0 with RKLLAMA:
1. Clone the RKLLAMA Repository
git clone https://github.com/notpunchnox/rkllama
cd rkllama
2. Install Dependencies
Run the setup script to install all required dependencies:
chmod +x setup.sh
sudo ./setup.sh
3. Add the Model
Download the model and place it in the models/ directory:
cd ~/RKLLAMA/models/
curl -L -O https://huggingface.co/punchnox/TinyLlama-1.1B-Chat-v1.0-rk3588-1.1.4/resolve/main/TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5.rkllm
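Note that Hugging Face serves raw files from `/resolve/<revision>/` paths; `/blob/` paths return the HTML file-viewer page instead of the model itself. A small helper to build the direct-download URL for this card's repo and file:

```python
def hf_download_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for a file in a Hugging Face repo.

    Raw files live under /resolve/<revision>/; /blob/ paths return an
    HTML viewer page rather than the file itself.
    """
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


url = hf_download_url(
    "punchnox/TinyLlama-1.1B-Chat-v1.0-rk3588-1.1.4",
    "TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5.rkllm",
)
print(url)
```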
4. Launch the RKLLAMA Server
Start the server so models can be loaded and queried:
rkllama serve
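Once the server is up, it can be queried over HTTP. The sketch below builds a single-turn chat request; the port (8080) and the Ollama-style `/api/chat` route are assumptions here, so check the RKLLAMA README for the routes and default port your version actually exposes.

```python
import json

# Assumed server address; RKLLAMA's actual default port may differ.
SERVER = "http://localhost:8080"


def build_chat_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Return the endpoint URL and JSON body for a single-turn chat request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return f"{SERVER}/api/chat", json.dumps(payload).encode()


url, body = build_chat_request(
    "TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5",
    "Give me one fun fact about NPUs.",
)
# To actually send it (requires the server to be running):
# import urllib.request
# req = urllib.request.Request(url, data=body,
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```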
5. Interact with the Model
List Available Models
To view all models installed in RKLLAMA:
rkllama list
Run the Model
Load the model on the RK3588 NPU:
rkllama run TinyLlama-1.1B-Chat-v1.0-rk3588-w8a8-opt-0-hybrid-ratio-0.5.rkllm
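If you send raw text to the model rather than structured chat messages, it helps to format prompts the way the base model expects. TinyLlama-1.1B-Chat-v1.0 was trained with the Zephyr-style chat template; a minimal single-turn formatter, assuming that template:

```python
def format_prompt(user_msg: str,
                  system_msg: str = "You are a friendly chatbot.") -> str:
    """Format a single-turn prompt in the Zephyr-style chat template
    used by TinyLlama-1.1B-Chat-v1.0 (<|system|>/<|user|>/<|assistant|>
    tags with </s> separators)."""
    return (
        f"<|system|>\n{system_msg}</s>\n"
        f"<|user|>\n{user_msg}</s>\n"
        f"<|assistant|>\n"
    )


print(format_prompt("What is the RK3588?"))
```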
Base model: TinyLlama-1.1B-Chat-v1.0