Bombsquig is an LLM inference engine built in Rust, designed for Apple Silicon Macs.
💣 Bombsquig is short for Because I Only have a MacBook, Super Qrabby Uber-cool Inference enGine!
Bombsquig is not intended to be a production-grade LLM inference engine. The project is my personal deep dive into the LLM inference stack, built by implementing each layer from scratch.
From handwritten kernels to tensor operations to the transformer itself, each level of abstraction is deliberately implemented in-house, so that every component of LLM inference, along with its performance characteristics and runtime challenges, is explicitly understood.
Existing runtimes solve inference at scale. Bombsquig exists (for me) to understand how they work.
Apple Silicon Macs (M1, M2, etc.), or any AArch64 machine that supports Apple Metal.
Bombsquig is actively being developed, with the following features:
- Built in Rust for safety and performance
- Uses NEON vectorization for tensor operations on CPU
- Uses Apple Metal for tensor operations on GPU
- Naive KV Cache for autoregressive decoding
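To give a flavor of the handwritten CPU kernels mentioned above, here is a minimal sketch of a NEON-vectorized dot product with a scalar fallback. The function name and structure are illustrative, not Bombsquig's actual API; the intrinsics are the standard `std::arch::aarch64` ones.

```rust
// Illustrative sketch (not Bombsquig's real code): a dot product using
// NEON intrinsics on AArch64, with a portable scalar fallback elsewhere.

#[cfg(target_arch = "aarch64")]
fn dot(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::aarch64::*;
    assert_eq!(a.len(), b.len());
    let chunks = a.len() / 4;
    // SAFETY: NEON is mandatory on AArch64, and all loads stay in bounds.
    let mut acc = unsafe { vdupq_n_f32(0.0) };
    for i in 0..chunks {
        unsafe {
            let va = vld1q_f32(a.as_ptr().add(i * 4));
            let vb = vld1q_f32(b.as_ptr().add(i * 4));
            acc = vfmaq_f32(acc, va, vb); // fused multiply-add over 4 lanes
        }
    }
    let mut sum = unsafe { vaddvq_f32(acc) }; // horizontal sum of the 4 lanes
    for i in chunks * 4..a.len() {
        sum += a[i] * b[i]; // scalar tail for leftover elements
    }
    sum
}

#[cfg(not(target_arch = "aarch64"))]
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = vec![1.0f32; 8];
    let b: Vec<f32> = (0..8).map(|i| i as f32).collect();
    println!("{}", dot(&a, &b)); // 0 + 1 + ... + 7 = 28
}
```

Processing four `f32` lanes per instruction with a fused multiply-accumulate is the basic pattern behind NEON-accelerated matmul and attention kernels.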
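The naive KV cache in the list above can be sketched as follows: key and value projections are appended once per decoded token, and attention re-reads the full history each step. The type and method names here are hypothetical, not Bombsquig's actual ones.

```rust
// Illustrative sketch (not Bombsquig's real code): a naive KV cache that
// grows by one key/value pair per autoregressive decode step.

struct NaiveKvCache {
    keys: Vec<Vec<f32>>,   // one key vector per cached token position
    values: Vec<Vec<f32>>, // one value vector per cached token position
}

impl NaiveKvCache {
    fn new() -> Self {
        Self { keys: Vec::new(), values: Vec::new() }
    }

    // Append the key/value projections for the newest token.
    fn append(&mut self, k: Vec<f32>, v: Vec<f32>) {
        self.keys.push(k);
        self.values.push(v);
    }

    // Number of cached positions attention will read at the next step.
    fn len(&self) -> usize {
        self.keys.len()
    }
}

fn main() {
    let mut cache = NaiveKvCache::new();
    // Each decode step adds exactly one K/V pair, so attention at step t
    // sees t + 1 cached positions instead of re-projecting the whole prompt.
    for t in 0..3 {
        cache.append(vec![t as f32; 4], vec![t as f32; 4]);
    }
    println!("{}", cache.len()); // 3 tokens cached
}
```

This "keep everything, append forever" design is what makes the cache naive: memory grows linearly with sequence length, which is exactly what the paged-attention item in the roadmap below would address.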
As of now, Bombsquig only supports the Phi-3 mini model (3.8B parameters), but aims to be extensible to other models in the future.
Future features may include:
- Support for more models (LLaMA-style, DeepSeek, etc.)
- Quantization support (8-bit, 4-bit, etc.)
- Flash Attention
- Advanced KV Cache management (e.g., paged attention)
- In-house tokenizer implementation
- Batching optimization