🚀 Exciting news! We've just launched "Thundermoon" - the latest version of Moondream, our open-source vision language model! 🚀
Key improvements in this release:
1. Massive leap in OCR capabilities
2. Enhanced document understanding
3. Significant boosts across key metrics:
   * DocVQA: 61.9 (↑103%)
   * TextVQA: 60.2 (↑5.2%)
   * GQA: 64.9 (↑2.9%)
What does this mean? Moondream can now tackle complex document analysis tasks with unprecedented accuracy for a model of its size. From deciphering handwritten notes to interpreting data tables, the applications are vast.
Check out the image for a glimpse of Moondream in action, effortlessly extracting insights from a 1944 sugar industry document!
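If you want to try this kind of document Q&A yourself, here's a minimal sketch using the Hugging Face checkpoint. It assumes the encode_image / answer_question interface documented on the moondream2 model card; the image path and the question are placeholders, so adapt them to your own document.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model_id = "vikhyatk/moondream2"
# trust_remote_code is needed because moondream2 ships its own modeling code.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Placeholder scan; swap in your own document image.
image = Image.open("sugar_report_1944.png")

# Encode the image once, then ask as many questions as you like against it.
enc_image = model.encode_image(image)
print(model.answer_question(enc_image, "What does this document say about sugar production?", tokenizer))
```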
Why it matters:
* Democratizing AI: As an open-source project, we're making advanced vision AI accessible to all developers.
* Efficiency: Proving that smaller models can deliver big results.
* Real-world impact: From historical document analysis to modern business intelligence, the potential use cases are exciting.
Just released a new version of vikhyatk/moondream2 - now supporting higher resolution images (up to 756x756)!
TextVQA score (which measures the model's ability to read and reason about text in images) is up from 53.1 to 57.2 (+7.7%). Other visual question answering and counting benchmark results are up ~0.5%.
The VeCLIP paper showed a +3% gain while using only 14% of the data by captioning images synthetically like this. You get the diversity of the alt text (middle column) without having to deal with all of its noise.
Released a new version of vikhyatk/moondream2 today! Primarily focused on improving OCR and captioning (e.g. "Describe this image", "Describe this image in one sentence"), but also seeing general improvement across all benchmarks.
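For reference, here's a minimal sketch of those captioning prompts, again assuming the model card's encode_image / answer_question interface; the prompts are the ones quoted above, and the image path is a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model = AutoModelForCausalLM.from_pretrained("vikhyatk/moondream2", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("vikhyatk/moondream2")

enc = model.encode_image(Image.open("photo.jpg"))  # placeholder image

# Different prompts steer caption length.
print(model.answer_question(enc, "Describe this image.", tokenizer))
print(model.answer_question(enc, "Describe this image in one sentence.", tokenizer))
```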
Just released moondream2 - a small 1.8B parameter vision language model. Now fully open source (Apache 2.0) so you can use it without restrictions on commercial use!
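Since moondream2 gets updated in place on the Hub (as the newer posts above show), it's worth pinning a revision when you load it. A minimal sketch; the date tag below is illustrative, so check the model card for the release tags that actually exist.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vikhyatk/moondream2"
revision = "2024-03-04"  # illustrative release tag; pin whichever version you've tested

model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision
)
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
```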