🚀 Intel Arc AI Server (Local Inference)

A high-performance, local AI inference server running directly on Windows 11 using Intel Arc graphics.

This project sets up OpenVINO Model Server (OVMS) to serve a wide range of efficient INT4 quantized LLMs (including Qwen2.5-Coder, Llama, and more) with full XMX hardware acceleration. It provides an OpenAI-compatible API for use with coding agents, IDEs (PhpStorm, VS Code), and custom scripts.

👨‍💻 Developer Note This project was born out of a need to bridge the gap between high-performance local hardware (Intel Arc) and the latest AI capabilities. I hope this interface empowers your local AI journey.

Let's connect! You can find me on LinkedIn.

⚡ Quick Start

1. New Installation

Run the unified installer to set up everything automatically (Python, Venv, Model, Server):

.\install_all.ps1

(This script is idempotent — safe to run on existing installs to verify components)

2. Start the Server (Standard)

For general use (Open Interpreter, Custom Scripts):

.\run_server.ps1

Port: 8000
Output: Raw high-performance OVMS stream across localhost.

3. Start Dynamic Mode (Hot-Swap Ready)

If you want command-based model switching without restarting scripts:

.\start_server_dynamic.ps1

Uses config.json (--config_path) for OVMS model configuration.
Works with .\manage_models.ps1 switch ....

4. Start for IDEs (PhpStorm / VS Code)

If your tool requires strict OpenAI compliance (e.g., specific id fields in streams):

.\run_ide_proxy.ps1

Port: 8001
Output: Proxied stream with compatibility fixes injected.

5. Configuration (Optional)

Customize ports, paths, and model settings by editing config.env (created after first run or manually):

OVMS_PORT=8000           # REST API Port
OVMS_GRPC_PORT=9000      # gRPC Port
PROXY_PORT=8001          # IDE Proxy Port
MODEL_NAME="qwen2.5-coder-7b"
# ... and more

Note: If you change config.env, re-run .\install_all.ps1 to apply the new settings to the generated launch scripts.

6. Changing Models

To switch models easily (e.g., Llama-3, Mistral, Phi-3), use the interactive setup:

https://huggingface.co/collections/OpenVINO/llm

.\download_model.ps1 -Setup -PerformanceProfile Balanced

═══════════════════════════════════════════════════════════
  Select Model to Install (INT4 Optimized)
═══════════════════════════════════════════════════════════

[1] Qwen2.5-Coder-7B-Instruct | ~5 GB    | Best for Coding (Default)
[2] Qwen3-8B                  | ~5 GB    | Latest Qwen3, Strong All-Round
[3] Qwen3-4B                  | ~3 GB    | Fast Qwen3, Great Quality/Speed
[4] Qwen3-14B                 | ~8 GB    | Largest Qwen3, Best Quality
[5] DeepSeek-R1-Distill-7B    | ~5 GB    | DeepSeek R1 Reasoning
[6] DeepSeek-R1-Distill-14B   | ~8 GB    | DeepSeek R1 Reasoning, Larger
[7] Phi-4-mini-instruct       | ~5 GB    | Microsoft Phi-4 Mini
[8] Phi-4                     | ~8 GB    | Microsoft Phi-4 Full (14B)
[9] Mixtral-8x7B-Instruct     | ~24 GB   | MoE, Needs CPU offload
[10] Qwen3-1.7B                | ~1.2 GB  | Ultra-Fast Debug & Testing

This will display a list of Top 10 verified INT4 models, handle the download, and update your configuration automatically. After changing models:

Standard mode: run .\start_server.ps1
Dynamic mode: run .\manage_models.ps1 switch <ModelName>

7. Performance Profiles (graph.pbtxt)

download_model.ps1 now supports explicit profiles:

.\download_model.ps1 -Setup -PerformanceProfile Safe
.\download_model.ps1 -Setup -PerformanceProfile Balanced
.\download_model.ps1 -Setup -PerformanceProfile Fast

Safe => cache_size=2, max_num_seqs=2
Balanced => cache_size=4, max_num_seqs=4 (default)
Fast => cache_size=8, max_num_seqs=8 (may OOM on larger models)

8. Command-Based Model Control

.\manage_models.ps1 status
.\manage_models.ps1 list
.\manage_models.ps1 switch Qwen3-4B
.\manage_models.ps1 rollback

📚 Documentation

File	Description
INSTALL.md	Detailed manual installation guide and architecture overview.
CONNECT_TOOLS.md	How to connect VS Code (Cline/Continue), Open Interpreter, and Python scripts.
gpu_checklist.md	Verification steps for Intel Arc drivers, ReBAR, and XMX usage.
oom_troubleshooting.md	Solving "Out of Memory" errors and WDDM spilling issues.

🧩 Scripts Overview

install_all.ps1: The "One Script to Rule Them All". Checks prerequisites (OS, GPU) and installs the stack.
run_server.ps1: User-friendly launcher for the main server.
start_server_dynamic.ps1: Dynamic OVMS launcher (--config_path) for hot-swapping.
run_ide_proxy.ps1: Launcher for the compatibility proxy.
manage_models.ps1: Command-based model switch/status/rollback.
verify_environment.ps1: Deep diagnostic tool for debugging environment issues.
setup_ovms.ps1: Helper to download/refresh the OVMS binary.
download_model.ps1: Helper to download INT4 models and generate graph.pbtxt with profile-based limits.

⚙️ Technical Specs

Model: Qwen2.5-Coder-7B-Instruct (INT4 OpenVINO IR)
VRAM Usage: ~5.5 - 7.5 GB (Fits comfortably in 8GB w/ u8 cache precision)
Context Window: 32k (Limited to ~4k-8k purely in VRAM)
Backend: OpenVINO 2025.4 + MediaPipe LLM Calculator
Acceleration: Intel XMX (Matrix Engines) via Level Zero

Verified on Windows 11 Build 26200 + Intel Driver 32.0.101.8425

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
images		images
tools/model_manager		tools/model_manager
.gitignore		.gitignore
CONNECT_TOOLS.md		CONNECT_TOOLS.md
INSTALL.md		INSTALL.md
Load-Config.ps1		Load-Config.ps1
README.md		README.md
SUPPORT.md		SUPPORT.md
config.env.example		config.env.example
docker-compose.yml		docker-compose.yml
download_model.ps1		download_model.ps1
gpu_checklist.md		gpu_checklist.md
install_all.ps1		install_all.ps1
manage_models.ps1		manage_models.ps1
oom_troubleshooting.md		oom_troubleshooting.md
proxy_server.py		proxy_server.py
requirements.txt		requirements.txt
run_ide_proxy.ps1		run_ide_proxy.ps1
run_interpreter.ps1		run_interpreter.ps1
run_server.ps1		run_server.ps1
setup_ovms.ps1		setup_ovms.ps1
start_server_dynamic.ps1		start_server_dynamic.ps1
task-manager.png		task-manager.png
test_request.json		test_request.json
test_stream.json		test_stream.json
verify_environment.ps1		verify_environment.ps1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🚀 Intel Arc AI Server (Local Inference)

⚡ Quick Start

1. New Installation

2. Start the Server (Standard)

3. Start Dynamic Mode (Hot-Swap Ready)

4. Start for IDEs (PhpStorm / VS Code)

5. Configuration (Optional)

6. Changing Models

7. Performance Profiles (graph.pbtxt)

8. Command-Based Model Control

📚 Documentation

🧩 Scripts Overview

⚙️ Technical Specs

About

Uh oh!

Releases

Packages

Languages

codex-corp/intel-arc-ovms-interface

Folders and files

Latest commit

History

Repository files navigation

🚀 Intel Arc AI Server (Local Inference)

⚡ Quick Start

1. New Installation

2. Start the Server (Standard)

3. Start Dynamic Mode (Hot-Swap Ready)

4. Start for IDEs (PhpStorm / VS Code)

5. Configuration (Optional)

6. Changing Models

7. Performance Profiles (graph.pbtxt)

8. Command-Based Model Control

📚 Documentation

🧩 Scripts Overview

⚙️ Technical Specs

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages