
Bombsquig

Bombsquig is an LLM inference engine built in Rust, designed for Apple Silicon Macs.

💣 Bombsquig is short for Because I Only have a MacBook, Super Qrabby Uber-cool Inference enGine!

Bombsquig is not intended to be a production-grade LLM inference engine. The project is my personal deep dive into the LLM inference stack, built by implementing each layer from scratch.

From handwritten kernels to tensor operations to the transformer itself, each level of abstraction is deliberately implemented in-house, to make explicit the components that make up LLM inference, along with their performance characteristics and runtime challenges.

Existing runtimes solve inference at scale. Bombsquig exists (for me) to understand how they work.

Target Platform

Apple Silicon Macs (M1, M2, etc.), or any AArch64 machine that supports Apple Metal.

Features

Bombsquig is under active development, and currently has the following features:

  • Built in Rust for safety and performance
  • Uses NEON vectorization for tensor operations on CPU
  • Uses Apple Metal for tensor operations on GPU
  • Naive KV Cache for autoregressive decoding
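To illustrate the kind of NEON-accelerated CPU tensor op the list above refers to, here is a minimal sketch of a dot product using Rust's `std::arch::aarch64` intrinsics, with a scalar fallback on other architectures. This is illustrative only; it is not Bombsquig's actual kernel code, and the function name is made up for the example.

```rust
// Hypothetical sketch of a NEON dot-product kernel (not Bombsquig's real code).
#[cfg(target_arch = "aarch64")]
fn dot(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::aarch64::*;
    assert_eq!(a.len(), b.len());
    let chunks = a.len() / 4; // NEON f32 registers hold 4 lanes
    let mut acc = unsafe { vdupq_n_f32(0.0) };
    unsafe {
        for i in 0..chunks {
            let va = vld1q_f32(a.as_ptr().add(i * 4));
            let vb = vld1q_f32(b.as_ptr().add(i * 4));
            acc = vfmaq_f32(acc, va, vb); // fused multiply-add: acc += va * vb
        }
    }
    let mut sum = unsafe { vaddvq_f32(acc) }; // horizontal sum of the 4 lanes
    for i in chunks * 4..a.len() {
        sum += a[i] * b[i]; // scalar tail for lengths not divisible by 4
    }
    sum
}

#[cfg(not(target_arch = "aarch64"))]
fn dot(a: &[f32], b: &[f32]) -> f32 {
    // Portable scalar fallback so the example compiles everywhere.
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = vec![1.0f32; 8];
    let b = vec![2.0f32; 8];
    println!("{}", dot(&a, &b)); // prints 16
}
```

The fused multiply-add (`vfmaq_f32`) is the workhorse here: matrix-vector products, the dominant cost of decoding, reduce to many such dot products.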

As of now, Bombsquig only supports the Phi-3 mini model (3.8B parameters), but aims to be extensible to other models in the future.
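The naive KV cache mentioned in the feature list can be sketched as follows. This is a simplified single-head illustration, not Bombsquig's actual API: keys and values for each decoded token are appended once, so attention at every later step reuses them instead of recomputing the whole prefix.

```rust
// Minimal single-head KV cache sketch (illustrative; names are hypothetical).
struct KvCache {
    head_dim: usize,
    keys: Vec<f32>,   // flattened [seq_len, head_dim]
    values: Vec<f32>, // flattened [seq_len, head_dim]
}

impl KvCache {
    fn new(head_dim: usize) -> Self {
        Self { head_dim, keys: Vec::new(), values: Vec::new() }
    }

    /// Append the key/value vectors for the newest decoded token.
    fn push(&mut self, k: &[f32], v: &[f32]) {
        assert_eq!(k.len(), self.head_dim);
        assert_eq!(v.len(), self.head_dim);
        self.keys.extend_from_slice(k);
        self.values.extend_from_slice(v);
    }

    /// Number of cached positions.
    fn seq_len(&self) -> usize {
        self.keys.len() / self.head_dim
    }

    /// Scaled attention scores of one query against all cached keys (pre-softmax).
    fn scores(&self, q: &[f32]) -> Vec<f32> {
        let scale = 1.0 / (self.head_dim as f32).sqrt();
        (0..self.seq_len())
            .map(|t| {
                let k = &self.keys[t * self.head_dim..(t + 1) * self.head_dim];
                q.iter().zip(k).map(|(a, b)| a * b).sum::<f32>() * scale
            })
            .collect()
    }
}

fn main() {
    let mut cache = KvCache::new(4);
    cache.push(&[1.0, 0.0, 0.0, 0.0], &[0.5; 4]);
    cache.push(&[0.0, 1.0, 0.0, 0.0], &[0.25; 4]);
    // Scores scaled by 1/sqrt(4) = 0.5.
    println!("{:?}", cache.scores(&[2.0, 4.0, 0.0, 0.0])); // [1.0, 2.0]
}
```

Being "naive", the cache grows without bound and is stored contiguously per layer; the paged-attention item under future features addresses exactly this.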

Future features may include:

  • Support for more models (LLaMA-style, DeepSeek, etc.)
  • Quantization support (8-bit, 4-bit, etc.)
  • Flash Attention
  • Advanced KV Cache management (e.g., paged attention)
  • In-house tokenizer implementation
  • Batching optimization
