Enterprise Explorers

AI & ML interests

Our team builds AI with open models and open source, collaborating privately with security and advanced access controls.

Recent Activity

enterprise-explorers's activity

MoritzLaurer
posted an update about 4 hours ago
FACTS is a great paper from @GoogleDeepMind on measuring the factuality of LLM outputs. You can now download their prompt templates from @huggingface to improve LLM-based fact-checking yourself!

📝 The paper introduces the FACTS Grounding benchmark for evaluating the factuality of LLM outputs.

🤖 Fact-checking is automated by an ensemble of LLM judges that verify if a response is fully grounded in a factual reference document.

🧪 The authors tested different prompt templates on held-out data to ensure their generalization.

📚 It's highly educational to read these templates to learn how frontier labs design prompts and understand their limitations.

💾 You can now download and reuse these prompt templates via the prompt-templates library!

🔄 The library simplifies sharing prompt templates on the HF hub or locally via standardized YAML files. Let’s make LLM work more transparent and reproducible by sharing more templates like this!
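
To make the judge-ensemble idea concrete, here is a minimal sketch in plain Python. It is not the FACTS implementation: the three judges are toy stand-ins for LLM calls, and averaging their binary verdicts is an assumed aggregation rule chosen for illustration. All function names are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence

# A judge receives (response, document) and returns True when it considers
# the response fully grounded. Real judges would be LLM calls; these are toys.
Judge = Callable[[str, str], bool]


def verbatim_judge(response: str, document: str) -> bool:
    # Toy rule: every sentence of the response must appear verbatim in the document.
    sentences = [s.strip() for s in response.split(".") if s.strip()]
    return all(s in document for s in sentences)


def vocabulary_judge(response: str, document: str) -> bool:
    # Toy rule: every word of the response must occur somewhere in the document.
    doc_words = {w.strip(".,").lower() for w in document.split()}
    return all(w.strip(".,").lower() in doc_words for w in response.split())


def skeptical_judge(response: str, document: str) -> bool:
    # Toy rule: never accepts, to show how disagreement lowers the score.
    return False


def grounding_score(response: str, document: str, judges: Sequence[Judge]) -> float:
    """Average of the judges' binary verdicts (assumed aggregation rule)."""
    return mean([1.0 if judge(response, document) else 0.0 for judge in judges])


document = "The Eiffel Tower is in Paris. It opened in 1889."
response = "The Eiffel Tower is in Paris."
score = grounding_score(
    response, document, [verbatim_judge, vocabulary_judge, skeptical_judge]
)
print(f"grounding score: {score:.2f}")
```

Swapping the toy rules for real LLM calls that use the downloaded templates only changes the judge functions; the aggregation stays the same.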

Links 👇
- prompt-templates docs: https://moritzlaurer.github.io/prompt_templates/
- all templates on the HF Hub: MoritzLaurer/facts-grounding-prompts
- FACTS paper: https://storage.googleapis.com/deepmind-media/FACTS/FACTS_grounding_paper.pdf
MoritzLaurer
posted an update 2 days ago
The TRL v0.13 release is 🔥! My highlights are the new process reward trainer, for training models similar to o1, and tool call support:

🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps, promoting structured reasoning. Perfect for tasks like stepwise reasoning.

🔀 Model merging: A new callback leverages mergekit to merge models during training, improving performance by blending reference and policy models - optionally pushing merged models to the Hugging Face Hub.

🛠️ Tool call support: TRL preprocessing now supports tool integration, laying the groundwork for agent fine-tuning with examples like dynamic temperature fetching in prompts.

⚖️ Mixture of judges: The new AllTrueJudge combines decisions from multiple binary judges for more nuanced evaluation.
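
As a rough illustration of the "all judges must agree" idea, here is a sketch in plain Python. It does not import TRL and is not its implementation. The judge functions are hypothetical, and the convention that -1 marks a judge that failed to produce a verdict is an assumption made for this example.

```python
from typing import Callable, Sequence

# A binary judge returns 1 (pass), 0 (fail), or -1 (the judge itself failed).
# The -1 convention is an assumption of this sketch.
BinaryJudge = Callable[[str, str], int]


def non_empty_judge(prompt: str, completion: str) -> int:
    # Passes when the completion contains any non-whitespace text.
    return 1 if completion.strip() else 0


def length_judge(prompt: str, completion: str) -> int:
    # Passes when the completion stays under a hypothetical 200-character budget.
    return 1 if len(completion) <= 200 else 0


def broken_judge(prompt: str, completion: str) -> int:
    # Simulates a judge that could not reach a verdict.
    return -1


def all_true(prompt: str, completion: str, judges: Sequence[BinaryJudge]) -> int:
    """Return 1 only if every judge passes; propagate -1 if any judge failed."""
    verdicts = [judge(prompt, completion) for judge in judges]
    if -1 in verdicts:
        return -1
    return 1 if all(v == 1 for v in verdicts) else 0


print(all_true("What is 2 + 2?", "4", [non_empty_judge, length_judge]))
print(all_true("What is 2 + 2?", "", [non_empty_judge, length_judge]))
print(all_true("What is 2 + 2?", "4", [non_empty_judge, broken_judge]))
```

Requiring unanimity makes the combined judge strict: one failing criterion is enough to reject a completion.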

Read the release notes and other resources here 👇
Release: https://github.com/huggingface/trl/releases/tag/v0.13.0
Mergekit: https://github.com/arcee-ai/mergekit
Mixture of judges paper: The Perfect Blend: Redefining RLHF with Mixture of Judges (2409.20370)
andrewrreed
posted an update 4 days ago
🚀 Supercharge your LLM apps with Langfuse on Hugging Face Spaces!

Langfuse brings end-to-end observability and tooling to accelerate your dev workflow from experiments through production.

Now available as a Docker Space directly on the HF Hub! 🤗

🔍 Trace everything: Monitor LLM calls, retrieval, and agent actions with popular frameworks
1️⃣ One-click deployment: On Spaces with persistent storage and integrated OAuth
🛠 Simple Prompt Management: Version, edit, and update without redeployment
✅ Intuitive Evals: Collect user feedback, run model/prompt evaluations, and improve quality
📊 Dataset Creation: Build datasets directly from production data to enhance future performance

Kudos to the Langfuse team for this collab and the awesome, open-first product they’re building! 👏 @marcklingen @Clemo @MJannik

🔗 Space: langfuse/langfuse-template-space
🔗 Docs: https://huggingface.co/docs/hub/spaces-sdks-docker-langfuse
MoritzLaurer
posted an update 4 days ago
OpenAI is losing money on the $200/month subscription 🤯. It's crazy how expensive it is to run these largest LLMs:

- ChatGPT Pro costs $200/month ($2,400/year) and is still unprofitable for OpenAI due to higher-than-expected usage.
- OpenAI reportedly expected losses of about $5 billion on revenue of $3.7 billion last year, with ChatGPT alone once costing an estimated $700,000 per day to operate. 💸🔥
- They build strong models and do great research. Whether this business model will work in the long run is one of the biggest questions in the AI economy today.

Source with the numbers 👇
https://techcrunch.com/2025/01/05/openai-is-losing-money-on-its-pricey-chatgpt-pro-plan-ceo-sam-altman-says/
jeffboudier
posted an update 4 days ago
NVIDIA just announced the Cosmos World Foundation Models, available on the Hub: nvidia/cosmos-6751e884dc10e013a0a0d8e6

Cosmos is a family of pre-trained models purpose-built for generating physics-aware videos and world states to advance physical AI development.
The release also includes tokenizers: nvidia/cosmos-tokenizer-672b93023add81b66a8ff8e6

Learn more in this great community article by @mingyuliutw and @PranjaliJoshi https://huggingface.co/blog/mingyuliutw/nvidia-cosmos
MoritzLaurer
posted an update 5 days ago
🚀 Releasing a new zeroshot-classifier based on ModernBERT! Some key takeaways:

- ⚡ Speed & efficiency: It's multiple times faster and uses significantly less memory than DeBERTav3. You can use larger batch sizes, and enabling bf16 (instead of fp16) gave me a ~2x speed boost as well.
- 📉 Performance tradeoff: It performs slightly worse than DeBERTav3 on average across my zeroshot classification task collection.
- 🧠 Use cases: I recommend using it for scenarios requiring speed and a larger context window (8k).
- 💡 What’s next? I’m preparing a newer version trained on better + longer synthetic data to fully leverage the 8k context window and improve upon the training mix of my older zeroshot-v2.0 models. I also hope that there will be a multilingual variant in the future.
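
For readers new to zeroshot classification, the sketch below shows the general entailment-style recipe such classifiers commonly follow: turn each candidate label into a hypothesis, score it against the text, and normalize the scores. The scoring function here is a toy word-overlap stand-in for the model, and the hypothesis template and all names are assumptions for illustration, not the API of the released models.

```python
import math
from typing import Callable, Dict, Sequence


def toy_entailment_logit(premise: str, hypothesis: str) -> float:
    # Hypothetical stand-in for a model: counts words shared by both texts.
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().split())
    return float(len(premise_words & hypothesis_words))


def zero_shot_classify(
    text: str,
    labels: Sequence[str],
    entailment_logit: Callable[[str, str], float],
    hypothesis_template: str = "This text is about {}",
) -> Dict[str, float]:
    """Score one hypothesis per label, then apply a softmax across labels."""
    logits = [entailment_logit(text, hypothesis_template.format(label)) for label in labels]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


scores = zero_shot_classify(
    "The new GPU doubles training speed",
    ["gpu hardware", "cooking"],
    toy_entailment_logit,
)
print(max(scores, key=scores.get))
```

With a real model, only `entailment_logit` changes; the label-to-hypothesis step is what lets one model handle arbitrary label sets without retraining.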

Great work by https://huggingface.co/answerdotai !

If you’re looking for a high-speed zeroshot classifier, give it a try!

📄 Resources below: 👇
Base model: MoritzLaurer/ModernBERT-base-zeroshot-v2.0
Large model: MoritzLaurer/ModernBERT-large-zeroshot-v2.0
Updated zeroshot collection: MoritzLaurer/zeroshot-classifiers-6548b4ff407bb19ff5c3ad6f
ModernBERT collection with paper: answerdotai/modernbert-67627ad707a4acbf33c41deb
MoritzLaurer
posted an update 22 days ago
Quite excited by the ModernBERT release! Small at 0.15B/0.4B parameters, 2T tokens of modern pre-training data, a tokenizer that covers code, and an 8k context window: a great, efficient model for embeddings & classification!

This will probably be the basis for many future SOTA encoders! And I can finally stop using DeBERTav3 from 2021 :D

Congrats @answerdotai , @LightOnIO and collaborators like @tomaarsen !

Paper and models here 👇 https://huggingface.co/collections/answerdotai/modernbert-67627ad707a4acbf33c41deb
MoritzLaurer
posted an update 30 days ago
I've been building a small library for working with prompt templates on the HF hub: pip install prompt-templates. Motivation:

The community currently shares prompt templates in a wide variety of formats: in datasets, in model cards, as strings in .py files, as .txt/.yaml/.json/.jinja2 files etc. This makes sharing and working with prompt templates unnecessarily complicated.

Prompt templates are currently the main hyperparameter that people tune when building complex LLM systems or agents. If we don't have a common standard for sharing them, we cannot systematically test and improve our systems. After comparing different community approaches, I think that working with modular .yaml or .json files is the best approach.

The prompt-templates library:
- proposes a standard for sharing prompts (entirely locally or on the HF hub)
- provides some utilities that are interoperable with the broader ecosystem

Try it:
# !pip install prompt-templates
from prompt_templates import PromptTemplateLoader 
prompt_template = PromptTemplateLoader.from_hub(repo_id="MoritzLaurer/closed_system_prompts", filename="claude-3-5-artifacts-leak-210624.yaml")


The library is in early stages, feedback is welcome!
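
To illustrate what a modular template file buys you, here is a stdlib-only sketch that stores a template as JSON, loads it, checks its declared variables, and fills it in. The schema (`prompt`, `template`, `input_variables`) and the helper names are assumptions for this example and may differ from the format the library actually uses.

```python
import json
import tempfile
from pathlib import Path
from string import Template

# Hypothetical template schema: the text plus the variables it expects.
template_spec = {
    "prompt": {
        "template": "Answer using only the document.\nDocument: $document\nQuestion: $question",
        "input_variables": ["document", "question"],
    }
}


def load_template(path: Path) -> dict:
    # Read the shared file and return the prompt section.
    return json.loads(path.read_text(encoding="utf-8"))["prompt"]


def populate(spec: dict, **values: str) -> str:
    # Fail early if a declared variable was not provided.
    missing = set(spec["input_variables"]) - set(values)
    if missing:
        raise ValueError(f"missing input variables: {sorted(missing)}")
    return Template(spec["template"]).substitute(values)


with tempfile.TemporaryDirectory() as tmp_dir:
    template_path = Path(tmp_dir) / "grounded_qa.json"
    template_path.write_text(json.dumps(template_spec), encoding="utf-8")
    spec = load_template(template_path)

filled = populate(spec, document="Paris is in France.", question="Where is Paris?")
print(filled)
```

Declaring the input variables next to the template text is what makes a shared file self-describing: a consumer can validate inputs without reading the prompt itself.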

More details in the docs: https://github.com/MoritzLaurer/prompt_templates/
julien-c
posted an update about 1 month ago
After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)
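
As a quick worked example of the private-storage rule, the sketch below computes how much private storage falls above the free tier. It assumes decimal units (1 TB = 1000 GB) and uses hypothetical account labels; actual billing details are in the docs linked below.

```python
# Free private-storage tiers from the TL;DR above, in GB.
# Assumes 1 TB = 1000 GB; the account labels are hypothetical.
FREE_PRIVATE_STORAGE_GB = {"paid": 1000.0, "free": 100.0}


def private_storage_above_free_tier(used_gb: float, account: str) -> float:
    """Return the GB of private storage that exceeds the free tier."""
    return max(0.0, used_gb - FREE_PRIVATE_STORAGE_GB[account])


print(private_storage_above_free_tier(250.0, "free"))
print(private_storage_above_free_tier(250.0, "paid"))
```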

docs: https://huggingface.co/docs/hub/storage-limits

We optimize our infrastructure continuously to scale our storage for the coming years of growth in machine learning, to the benefit of the community 🔥

cc: @reach-vb @pierric @victor and the HF team
pagezyhf
posted an update about 1 month ago
It’s the 2nd of December, here’s your Cyber Monday present 🎁!

We’re cutting prices on Hugging Face Inference Endpoints and Spaces!

Our folks at Google Cloud are treating us to a 40% price cut on GCP Nvidia A100 GPUs for the next 3️⃣ months. We have other reductions on all instances, ranging from 20% to 50%.

Sounds like the time to give Inference Endpoints a try? Get started today and find the full pricing details in our documentation.
https://ui.endpoints.huggingface.co/
https://huggingface.co/pricing
abhishek
posted an update about 1 month ago
🎉 SUPER BLACK FRIDAY DEAL 🎉

Train almost any model on a variety of tasks such as LLM fine-tuning, text classification/regression, summarization, question answering, image classification/regression, object detection, tabular data, etc. for FREE using AutoTrain locally. 🔥
https://github.com/huggingface/autotrain-advanced
julien-c
posted an update about 1 month ago
wow 😮

INTELLECT-1 is the first collaboratively trained 10 billion parameter language model trained from scratch on 1 trillion tokens of English text and code.

PrimeIntellect/INTELLECT-1-Instruct
victor
posted an update about 1 month ago
Qwen/QwQ-32B-Preview shows us the future (and it's going to be exciting)...

I tested it against some really challenging reasoning prompts and the results are amazing 🤯.

Check this dataset for the results: victor/qwq-misguided-attention
pagezyhf
posted an update about 2 months ago
Hello Hugging Face Community,

If you use Google Kubernetes Engine to host your ML workloads, I think this series of videos is a great way to kickstart your journey of deploying LLMs, in less than 10 minutes! Thank you @wietse-venema-demo !

To watch in this order:
1. Learn what Hugging Face Deep Learning Containers are
https://youtu.be/aWMp_hUUa0c?si=t-LPRkRNfD3DDNfr

2. Learn how to deploy an LLM with our Deep Learning Container using Text Generation Inference
https://youtu.be/Q3oyTOU1TMc?si=V6Dv-U1jt1SR97fj

3. Learn how to scale your inference endpoint based on traffic
https://youtu.be/QjLZ5eteDds?si=nDIAirh1r6h2dQMD

If you want more of these small tutorials and have any theme in mind, let me know!
victor
posted an update about 2 months ago
Perfect example of why Qwen/Qwen2.5-Coder-32B-Instruct is insane?

Introducing: AI Video Composer 🔥
huggingface-projects/ai-video-composer

Drag and drop your assets (images/videos/audios) to create any video you want using natural language!

It works by asking the model to output a valid FFmpeg command. This can be quite complex, but most of the time Qwen2.5-Coder-32B gets it right (that thing is a beast). It's an update of an old project made with GPT-4, and it was almost impossible to make it work with open models back then (~1.5 years ago), but not anymore. Let's go open weights 🚀.
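
When a model writes the command, it helps to check the output before running anything. The sketch below is one possible guard, not the Space's actual code: it tokenizes the model output with `shlex`, requires the program to be `ffmpeg`, and only accepts inputs from the uploaded files. The function name and the file names are hypothetical.

```python
import shlex
from typing import List, Set


def parse_ffmpeg_command(model_output: str, allowed_inputs: Set[str]) -> List[str]:
    """Tokenize a model-generated command and apply basic sanity checks."""
    args = shlex.split(model_output.strip())
    if not args or args[0] != "ffmpeg":
        raise ValueError("model output is not an ffmpeg command")
    # Collect every file passed via -i and make sure it was actually uploaded.
    inputs = [args[i + 1] for i, arg in enumerate(args[:-1]) if arg == "-i"]
    unknown = [name for name in inputs if name not in allowed_inputs]
    if not inputs or unknown:
        raise ValueError(f"missing or unknown input files: {unknown}")
    return args


command = parse_ffmpeg_command(
    "ffmpeg -i clip.mp4 -vf scale=640:-1 out.mp4",
    allowed_inputs={"clip.mp4"},
)
print(command)
```

Passing the resulting list to `subprocess.run(command)` without `shell=True` keeps the arguments from being interpreted by a shell, which matters when the text comes from a model.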
andrewrreed
posted an update about 2 months ago
Trace LLM calls with Arize AI's Phoenix observability dashboards on Hugging Face Spaces! 🚀

✨ I just added a new recipe to the Open-Source AI Cookbook that shows you how to:
1️⃣ Deploy Phoenix on HF Spaces with persistent storage in a few clicks
2️⃣ Configure LLM tracing with the Serverless Inference API
3️⃣ Observe multi-agent application runs with the CrewAI integration

Observability is crucial for building robust LLM apps.

Phoenix makes it easy to visualize trace data, evaluate performance, and track down issues. Give it a try!

🔗 Cookbook recipe: https://huggingface.co/learn/cookbook/en/phoenix_observability_on_hf_spaces
🔗 Phoenix docs: https://docs.arize.com/phoenix
victor
posted an update about 2 months ago
Qwen2.5-72B is now the default HuggingChat model.
This model is so good that you must try it! I often get better results on rephrasing with it than with Sonnet or GPT-4!!