4. Arcee-Spark - Qwen2 7B (with merging) fine-tuned further to beat GPT-3.5 on MT-Bench. arcee-ai/Arcee-Spark
5. Gemini Nano out in the wild in Chrome - an on-device LLM usable with just 2 lines of code (fully offline)
6. Fal released a fully open-source GAN-based super-resolution model (with a second version already cooking) fal/AuraSR
7. NYU released Cambrian-1 - a vision multimodal LLM in 8-34B sizes that beats pretty much all closed-source competition https://huggingface.co/nyu-visionx
And.. much more: the Open LLM Leaderboard got a major update, LMSYS released the Chat Vision Arena, and OpenAI released a paper on CriticGPT!
What a lovely week, can't wait for the next one to see what the community is up to! Put it down in the comments if I missed something 🔥
Hi everyone! I'm Alex, I'm 16, and I've been doing an internship at Hugging Face for a little over a week; I've already learned a lot about using and prompting LLMs. With @victor as my tutor, I've just finished a Space that analyzes your feelings by prompting an LLM chat model. The aim is to extend it so that it can categorize Hugging Face posts.
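For anyone curious how a space like this can work, here's a minimal sketch of prompting a hosted chat model to classify sentiment with huggingface_hub; the model id and prompt wording are my own illustrative assumptions, not the actual code behind the Space.

```python
# Minimal sketch of LLM-prompted sentiment analysis, similar in spirit to the Space.
# The model id and prompt below are illustrative assumptions, not the Space's code.
from huggingface_hub import InferenceClient

client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")  # hypothetical choice of chat model

def classify_feeling(text: str) -> str:
    messages = [
        {"role": "system", "content": "Classify the sentiment of the user's message "
                                      "as positive, negative, or neutral. Reply with one word."},
        {"role": "user", "content": text},
    ]
    response = client.chat_completion(messages, max_tokens=5)
    return response.choices[0].message.content.strip().lower()

print(classify_feeling("I've already learned a lot during my internship!"))  # e.g. "positive"
```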
I love Depth Anything V2! It's Depth Anything, but scaled with both a larger teacher model and a gigantic dataset!
Here's a small TL;DR of the paper, which has a lot of findings, experiments and more. I have also created a collection that has the models, the dataset, the demo and a CoreML-converted model: merve/depth-anything-v2-release-6671902e798cd404513ffbf5
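If you just want to try it, a minimal sketch using the transformers depth-estimation pipeline looks like this; the exact checkpoint id is an assumption on my part, so pick whichever size you like from the release collection.

```python
# Minimal sketch: monocular depth estimation with Depth Anything V2 via transformers.
# The checkpoint id is an assumption; any size from the release collection works.
from transformers import pipeline
from PIL import Image

depth = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

image = Image.open("example.jpg")      # any RGB input image
result = depth(image)
result["depth"].save("example_depth.png")  # PIL image of the predicted depth map
```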
The authors analyzed Marigold, a diffusion-based model, against Depth Anything and found out what's up with using synthetic vs. real images for monocular depth estimation (MDE):
- Real data has a lot of label noise and inaccurate depth maps (caused by depth sensors missing transparent objects, etc.), and many details get overlooked
- Synthetic data has more precise and detailed depth labels that are truly ground truth, but there's a distribution shift between real and synthetic images, and it has restricted scene coverage
The authors train different image encoders only on synthetic images and find that unless the encoder is very large, the model can't generalize well (though large models generalize inherently anyway). But even those still fail on real images that have a wide distribution in labels (e.g. diverse instances of objects) 🥲
The Depth Anything V2 framework is to:
- Train a DINOv2-G-based teacher model on 595K synthetic images
- Label 62M real images using the teacher model
- Train a student model using the real images labelled by the teacher

Result: 10x faster and more accurate than Marigold!
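For intuition, here's a toy, self-contained sketch of that three-stage recipe; every class and function here is a placeholder of my own, not the paper's actual training code.

```python
# Toy sketch of the three-stage teacher -> pseudo-label -> student recipe.
# Everything below is a placeholder illustration, not the paper's training code.

class DepthModel:
    def __init__(self, encoder: str):
        self.encoder = encoder

    def predict(self, image):
        # Stand-in for running inference and producing a depth map
        return f"pseudo_depth({image})"

def train_depth_model(encoder: str, data: list) -> DepthModel:
    # Stand-in for a real training loop over (image, depth) pairs
    return DepthModel(encoder)

synthetic = [(f"synthetic_img_{i}", f"gt_depth_{i}") for i in range(3)]
real = [f"real_img_{i}" for i in range(3)]

# Stage 1: train the large DINOv2-G teacher on precise synthetic labels only
teacher = train_depth_model(encoder="dinov2-giant", data=synthetic)

# Stage 2: bridge the synthetic-to-real distribution shift by pseudo-labelling
# a huge pool of unlabelled real images (62M in the paper) with the teacher
pseudo_labelled = [(img, teacher.predict(img)) for img in real]

# Stage 3: train a smaller, faster student purely on the teacher's pseudo-labels
student = train_depth_model(encoder="dinov2-small", data=pseudo_labelled)
print(f"student encoder: {student.encoder}")
```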
The authors also construct a new benchmark called DA-2K that is less noisy, highly detailed and more diverse!