This repository contains sample code and examples to help developers learn how to work with Moondream, the world's most efficient multi-function Vision Language Model (VLM).
Moondream is a state-of-the-art Vision Language Model that combines multiple vision capabilities in a tiny, efficient, and blazingly fast form factor. It's designed to be:
- Efficient: Optimized for performance and resource usage
- Versatile: Supports multiple vision-language tasks
- Fast: Delivers quick inference times
- Compact: Small model size without compromising capabilities
The examples in this project work with Python, Node, and Bash. Most examples are configured to run against Moondream locally via Moondream Station (a free Mac/Ubuntu app) or against Moondream Cloud, which offers a free tier.
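As a rough sketch of what a cloud call looks like, the snippet below POSTs a base64-encoded image to a captioning endpoint using only the standard library. The endpoint URL, header name, and payload fields are assumptions for illustration; check the official Moondream documentation for the current API shape.

```python
# Hedged sketch: calling a Moondream Cloud caption endpoint over HTTP.
# API_URL, the auth header name, and the payload keys are assumptions,
# not confirmed by this repository's docs.
import base64
import json
import urllib.request

API_URL = "https://api.moondream.ai/v1/caption"  # assumed endpoint

def build_request(image_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Build an HTTP request carrying a base64-encoded image as a data URL."""
    payload = json.dumps({
        "image_url": "data:image/jpeg;base64,"
                     + base64.b64encode(image_bytes).decode(),
        "length": "normal",  # assumed caption-length option
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"X-Moondream-Auth": api_key,  # assumed header name
                 "Content-Type": "application/json"},
    )

if __name__ == "__main__":
    # Read an image from disk, send it, and print the returned caption.
    with open("car.jpg", "rb") as f:
        req = build_request(f.read(), "your-api-key")
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["caption"])
```

The network call is kept under the `__main__` guard so the request-building logic can be reused against a local Moondream Station endpoint by swapping `API_URL`.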
Browse the examples below. Each one is fully functional and ready to run or modify:
| Example | Description |
|---|---|
| Python | A hello-world example showing how to call the model. |
| Car Detection | Notebook demonstrating caption, query, detect, and point while processing car images. Bounding boxes are normalized and converted to pixels for overlays. |
| Node.js | Demonstrates how to interact with Moondream in a Node environment. |
| Modal Labs | Examples showing how to host and use Moondream Station in the cloud with Modal Labs. |
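The Car Detection row above mentions converting normalized bounding boxes to pixels for overlays. A minimal sketch of that conversion, assuming boxes are dicts of `x_min`/`y_min`/`x_max`/`y_max` coordinates in the range [0, 1] (the exact return shape of the model's `detect` output is an assumption here):

```python
# Convert a normalized bounding box (coordinates in [0, 1]) to pixel
# coordinates so it can be drawn over the original image.

def box_to_pixels(box: dict, width: int, height: int) -> dict:
    """Scale a normalized {x_min, y_min, x_max, y_max} box to pixel coords."""
    return {
        "x_min": int(box["x_min"] * width),
        "y_min": int(box["y_min"] * height),
        "x_max": int(box["x_max"] * width),
        "y_max": int(box["y_max"] * height),
    }

# Example: a detection covering the center of a 640x480 image.
px = box_to_pixels(
    {"x_min": 0.25, "y_min": 0.25, "x_max": 0.75, "y_max": 0.75}, 640, 480
)
print(px)  # {'x_min': 160, 'y_min': 120, 'x_max': 480, 'y_max': 360}
```

The pixel-space box can then be passed straight to a drawing utility such as PIL's `ImageDraw.rectangle` to render the overlay.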
Spotted an error, or have an idea for improving these examples? We'd love your contribution; pull requests are always appreciated.
This project is licensed under the MIT License - see the LICENSE file for details.