- Default models list:
- meta-llama/Meta-Llama-3.1-8B-Instruct
- microsoft/Phi-3-mini-4k-instruct
- Qwen/Qwen2-7B
- mistralai/Mistral-7B-Instruct-v0.2
- openbmb/MiniCPM-1B-sft-bf16
- TinyLlama/TinyLlama-1.1B-Chat-v1.0
- Users can input any model they like (see the sketch below this list)
- There is no guarantee that every model will compile for the NPU, though
- Here is a list of models likely to run on the NPU
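As an illustration, a custom Hugging Face model can be exported to OpenVINO IR and compiled for the NPU with optimum-intel. The snippet below is only a minimal sketch of that flow, not this project's code; the model ID and quantization settings are placeholders.

```python
# Minimal sketch (not this project's code): export a Hugging Face model to
# OpenVINO IR with INT4 weights and try to compile it for the NPU.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder: any Hugging Face model ID

# NPUs generally prefer symmetric, channel-wise INT4 weights (group_size=-1).
wq_config = OVWeightQuantizationConfig(bits=4, sym=True, group_size=-1, ratio=1.0)

model = OVModelForCausalLM.from_pretrained(
    model_id, export=True, quantization_config=wq_config
)
model.to("NPU")  # NPU compilation happens lazily on first use; not every architecture will succeed

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello from the NPU!", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```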
- One-Time Setup: The script downloads the model, quantizes it, converts it to OpenVINO IR format, compiles it for the NPU, and caches the result for future use (see the sketch after this list). 💡⌛
- Performance: Surprisingly fast inference speeds, even on devices with modest computational power (e.g., my Meteor Lake's 13 TOPS NPU). ⚡⏳
- Power Efficiency: While inference might be faster on a CPU or GPU for some devices, the NPU is significantly more energy-efficient, making it ideal for laptops. 🔋🌐
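To make the one-time setup and caching idea concrete, here is a rough sketch, assuming the model folder was already exported to OpenVINO IR (e.g. with `optimum-cli export openvino`, which also converts the tokenizer). The paths and the CACHE_DIR property are illustrative, not the script's actual values.

```python
# Rough sketch of compile-once-then-reuse on the NPU with OpenVINO GenAI.
import openvino_genai as ov_genai

model_dir = "ov_models/TinyLlama-1.1B-int4"  # placeholder: exported IR + tokenizer files
cache_dir = "npu_cache"                      # placeholder: where compiled blobs are kept

# First run: OpenVINO compiles the model for the NPU (slow) and stores the blob in
# CACHE_DIR. Subsequent runs load the cached blob instead, so start-up is much faster.
pipe = ov_genai.LLMPipeline(model_dir, "NPU", CACHE_DIR=cache_dir)

print(pipe.generate("Explain what an NPU is in one sentence.", max_new_tokens=64))
```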
- Python 3.9 to 3.12
- An Intel processor with an NPU:
- Meteor Lake (Core Ultra Series 1, i.e., 1XX chips)
- Arrow Lake (Core Ultra Series 2, i.e., 2XX chips)
- Lunar Lake (Core Ultra Series 2, i.e., 2XX chips)
- The latest Intel NPU driver
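If you are unsure whether the driver is set up correctly, OpenVINO can report which devices it sees; a quick check could look like this:

```python
# Quick check that the NPU driver is installed and the NPU is visible to OpenVINO.
import openvino as ov

core = ov.Core()
print("Available devices:", core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU']
if "NPU" in core.available_devices:
    print("NPU:", core.get_property("NPU", "FULL_DEVICE_NAME"))
else:
    print("No NPU found - check your processor and driver version.")
```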
git clone https://github.com/justADeni/intel-npu-llm.git
cd intel-npu-llm
python -m venv npu_venv
- On Windows:
npu_venv\Scripts\activate
- On Linux:
source npu_venv/bin/activate
pip install -r requirements.txt
python intel_npu_llm.py
- Resource-Intensive Compilation: The quantization and compilation steps can be time-consuming, taking up to tens of minutes depending on your hardware. However, they run only once per model, and the results are cached for future use. ⌛⚙️
- But wait, why does the context fill up and then reset? Continuous batching has not yet been implemented for NPUs by Intel's OpenVINO engineers. You can check the API coverage percentage here. 🚧🛠️
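As a hedged illustration of living with that limitation, a chat loop can track a rough token budget and restart the conversation when it is spent. The budget, the characters-per-token estimate, and the model path below are invented placeholders, not the project's values.

```python
# Illustrative only: without continuous batching the context window is finite,
# so the chat history has to be reset once it fills up.
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("ov_models/TinyLlama-1.1B-int4", "NPU")  # placeholder path

MAX_CONTEXT_TOKENS = 1024  # made-up budget for the sketch
used_tokens = 0

pipe.start_chat()
while True:
    prompt = input("You: ")
    if prompt.strip().lower() in {"exit", "quit"}:
        break
    reply = pipe.generate(prompt, max_new_tokens=128)
    print("Model:", reply)

    # Very rough token estimate; once the budget is spent, drop the chat history.
    used_tokens += (len(prompt) + len(str(reply))) // 4
    if used_tokens > MAX_CONTEXT_TOKENS:
        pipe.finish_chat()  # this is the "context fills up and then resets" moment
        pipe.start_chat()
        used_tokens = 0
pipe.finish_chat()
```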
Contributions, bug reports, and feature requests are welcome! Feel free to open an issue or submit a pull request. 🔨✍️
This project is licensed under the MIT License. 🔒✨
Enjoy using intel-npu-llm! For any questions or feedback, please reach out or open an issue on GitHub. ✨🔧