Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A
Updated Nov 6, 2023 - Python
Runs LLaMA at extremely high speed
LLM inference in Fortran
The bare metal in my basement
Portable LLM - A rust library for LLM inference
eLLM Infers Qwen3-480B on a CPU in Real Time
Wrapper for simplified use of Llama2 GGUF quantized models.
V-lang API wrapper for llm-inference chatllm.cpp
Simple large language model playground app
VB.NET API wrapper for llm-inference chatllm.cpp
C# API wrapper for llm-inference chatllm.cpp
PlantAi is a ResNet-based CNN model trained on the PlantVillage dataset to classify plant leaf images as healthy or diseased. This repository includes PyTorch training code, tools to convert the model to TensorFlow Lite (TFLite) for deployment, and an Android app integrating the model for real-time leaf disease detection from camera images.
🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.
Nim API wrapper for llm-inference chatllm.cpp
Privacy-focused RAG chatbot for network documentation. Chat with your PDFs locally using Ollama, Chroma & LangChain. CPU-only, fully offline.
Simple bot that transcribes Telegram voice messages. Powered by go-telegram-bot-api & whisper.cpp Go bindings.
Rust API wrapper for llm-inference chatllm.cpp
Lua API wrapper for llm-inference chatllm.cpp
gemma-2-2b-it int8 CPU inference in one file of pure C#
Kotlin API wrapper for llm-inference chatllm.cpp
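Several of the projects listed above (the GGUF wrappers and the pure-C# gemma-2-2b-it port) depend on int8 quantization to make CPU inference practical. As an illustrative sketch only, not the method used by any specific repository here, symmetric per-tensor int8 quantization can be expressed as:

```python
def quantize_int8(weights):
    # Symmetric per-tensor quantization: map floats onto the int8 range
    # [-127, 127] using a single scale derived from the largest magnitude.
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize_int8(q, scale):
    # Recover approximate float weights from int8 values and the scale.
    return [v * scale for v in q]

# Quantize a tiny weight vector and check the round-trip error, which is
# bounded by scale / 2 for values within range.
weights = [0.8, -1.27, 0.05, 0.4]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Real implementations typically quantize per-row or per-block rather than per-tensor to preserve accuracy, which is the approach GGUF-style formats take.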