Multimodal 🖼️
> ByteDance released SA2VA: a family of vision LMs that can take image, video, text and visual prompts
> moondream2 is out with new capabilities like outputting structured data and gaze detection!
> Dataset: Alibaba DAMO lab released a multimodal textbook: 22k hours' worth of samples from instruction videos 🤯
> Dataset: SciCap, a benchmark dataset for caption generation on scientific documents, is released along with the challenge!
Embeddings
> @MoritzLaurer released a zero-shot version of ModernBERT large (see the usage sketch below)
> KaLM is a new family of performant multilingual embedding models, MIT-licensed and built on Qwen2-0.5B
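For context, zero-shot classifiers like this are typically used through the transformers zero-shot-classification pipeline. A minimal sketch follows; the exact repo id is an assumption, so check @MoritzLaurer's Hub profile for the released checkpoint name:

```python
from transformers import pipeline

# Hypothetical repo id for the zero-shot ModernBERT large checkpoint;
# check @MoritzLaurer's Hub profile for the actual model name.
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/ModernBERT-large-zeroshot-v2.0",
)

result = classifier(
    "The new GPU drastically cuts training time for vision models.",
    candidate_labels=["hardware", "datasets", "licensing"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score
```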
Image/Video Generation ⏯️
> NVIDIA released Cosmos, a new family of diffusion/autoregressive World Foundation Models that generate worlds from images, videos and text 🔥
> Adobe released TransPixar: a new text-to-video model that can generate assets with transparent backgrounds (a first!)
> Dataset: fal released cosmos-openvid-1m, a Cosmos-tokenized version of OpenVid-1M
Others
> Prior Labs released TabPFNv2, the best tabular transformer yet for classification and regression (a usage sketch follows below)
> Metagene-1 is a new RNA language model that can be used for pathogen detection, zero-shot embedding and genome understanding
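A minimal sketch of how TabPFN is typically used, assuming TabPFNv2 keeps the scikit-learn-style fit/predict interface of earlier tabpfn releases (treat that as an assumption and check the Prior Labs docs):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier  # pip install tabpfn; interface assumed from earlier releases

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# fit() mostly stores the training data; TabPFN does in-context learning
# in a single forward pass at prediction time rather than gradient training.
clf = TabPFNClassifier()
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```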
All the responses get saved in the cfahlgren1/react-code-instructions dataset. Hopefully we can build one of the biggest, highest-quality frontend datasets on the Hub 💪
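If you want to poke at the data yourself, it can be pulled with the datasets library. The dataset id comes from the post above; the split name below is an assumption, so check the dataset card for the actual configuration:

```python
from datasets import load_dataset

# "train" split assumed; see the dataset card on the Hub for the real splits.
ds = load_dataset("cfahlgren1/react-code-instructions", split="train")
print(ds)        # schema and row count
print(ds[0])     # one example record
```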
We enable large language models to generate and understand 3D meshes by representing them as text and fine-tuning. This unifies the 3D and text modalities in a single model and preserves language abilities, unlocking conversational 3D creation with mesh understanding.
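To make the idea concrete, a 3D mesh can be serialized as plain text, which is what lets an ordinary text LLM read and emit geometry. The OBJ-style framing below is only an illustrative assumption, not necessarily the exact serialization the authors use:

```python
# A tetrahedron written as OBJ-style text: "v x y z" lines are vertices,
# "f i j k" lines are 1-indexed triangle faces. A text LLM can be fine-tuned
# to consume and produce exactly this kind of token sequence.
# (OBJ is an illustrative choice here, not necessarily the authors' format.)
mesh_as_text = """\
v 0 0 0
v 1 0 0
v 0 1 0
v 0 0 1
f 1 2 3
f 1 2 4
f 1 3 4
f 2 3 4
"""

# Parse the text back into numeric buffers, as a renderer or downstream tool would.
vertices, faces = [], []
for line in mesh_as_text.splitlines():
    tag, *vals = line.split()
    if tag == "v":
        vertices.append(tuple(float(x) for x in vals))
    elif tag == "f":
        faces.append(tuple(int(x) - 1 for x in vals))

print(len(vertices), "vertices,", len(faces), "faces")
```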
After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub
TL;DR:
- public storage is free, and (barring blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1 TB if you have a paid account, 100 GB otherwise)
We continuously optimize our infrastructure to scale our storage for the coming years of growth in machine learning, to the benefit of the community 🔥
It's 2025: you shouldn't be hand-writing SQL! This is a big step toward letting anyone do in-depth analysis on a dataset. Let us know what you think!
observers - automatically log all OpenAI-compatible requests to a dataset
• supports any OpenAI-compatible endpoint 💪
• supports DuckDB, Hugging Face Datasets, and Argilla as stores
> pip install observers
No complex framework. Just a few lines of code to start sending your traces somewhere. Let us know what you think! @davidberenstein1957 and I will continue iterating!
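For reference, a minimal sketch of what those few lines look like. The wrap_openai helper and its import path are recalled from the observers README and should be treated as assumptions; check the repo for the exact API:

```python
from openai import OpenAI
from observers.observers import wrap_openai  # assumed import path; see the observers README

# Wrap any OpenAI-compatible client (point base_url at any compatible endpoint);
# every request/response pair is then recorded to the configured store
# (DuckDB, a Hugging Face dataset, or Argilla).
client = wrap_openai(OpenAI(base_url="https://api.openai.com/v1"))

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hi in one word."}],
)
print(response.choices[0].message.content)
```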