Optimum-Intel

community

https://github.com/huggingface/optimum-intel

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

IlyasMoutawwakil updated a Space about 1 month ago

optimum-intel/benchmark-openvino

IlyasMoutawwakil updated a Space 7 months ago

optimum-intel/fastrag-e2e

IlyasMoutawwakil updated a model 7 months ago

optimum-intel/fastrag-ranker


optimum-intel's activity

jeffboudier

posted an update 4 days ago

Post

471

NVIDIA just announced the Cosmos World Foundation Models, available on the Hub: nvidia/cosmos-6751e884dc10e013a0a0d8e6

Cosmos is a family of pre-trained models purpose-built for generating physics-aware videos and world states to advance physical AI development.
The release includes Tokenizers nvidia/cosmos-tokenizer-672b93023add81b66a8ff8e6

Learn more in this great community article by @mingyuliutw and @PranjaliJoshi https://huggingface.co/blog/mingyuliutw/nvidia-cosmos

1 reply

IlyasMoutawwakil

updated a Space about 1 month ago

Running

🏋️

OpenVINO Benchmark

jeffboudier

posted an update about 2 months ago

Post

1010

New - add your bluesky account to your HF profile:
https://huggingface.co/settings/profile

Is the grass greener, the sky bluer? Will try and figure it out at https://bsky.app/profile/jeffboudier.bsky.social

By the way, HF people starter pack https://bsky.app/starter-pack/huggingface.bsky.social/3laz5x7naiz22

jeffboudier

posted an update 3 months ago

Post

1089

This week in Inference Endpoints - thx @erikkaum for the update!

👀 https://huggingface.co/blog/erikkaum/endpoints-changelog

1 reply

jeffboudier

posted an update 4 months ago

Post

455

Inference Endpoints got a bunch of cool updates yesterday, this is my top 3

jeffboudier

posted an update 4 months ago

Post

4038

Pro Tip - if you're a Firefox user, you can set up Hugging Chat as an integrated AI assistant, with contextual links to summarize or simplify any text - handy!

In this short video I show how to set it up

3 replies

derek-thomas

posted an update 5 months ago

Post

2164

Here is an AI Puzzle!
When you solve it just use a 😎 emoji.
NO SPOILERS
A similar puzzle might have each picture that has a hidden meaning of summer, winter, fall, spring, and the answer would be seasons.

It's a little dated now (almost a year), so bottom right might be tough.

Thanks to @johko for the encouragement to post!

IlyasMoutawwakil

updated a Space 7 months ago

Runtime error

👩‍🌾

FastRAG Haystack Pipeline

IlyasMoutawwakil

updated 2 models 7 months ago

optimum-intel/fastrag-ranker

Updated Jun 25, 2024

optimum-intel/fastrag-embedder

Updated Jun 25, 2024

IlyasMoutawwakil

posted an update 7 months ago

Post

4008

Last week, Intel's new Xeon CPUs, Sapphire Rapids (SPR), landed on Inference Endpoints, and I think they have the potential to reduce the cost of your RAG pipelines 💸

Why? Because they come with Intel® AMX support, a set of instructions that accelerate BF16 and INT8 matrix multiplications on CPU ⚡

I went ahead and built a Space to showcase how to efficiently deploy embedding models on SPR for both Retrieving and Ranking documents, with Haystack compatible components: https://huggingface.co/spaces/optimum-intel/haystack-e2e

Here's how it works:

- Document Store: A FAISS document store containing the seven-wonders dataset, embedded, indexed and stored on the Space's persistent storage to avoid unnecessary re-computation of embeddings.

- Retriever: It embeds the query at runtime and retrieves from the dataset N documents that are most semantically similar to the query's embedding.
We use the small variant of the BGE family here because we want a model that's fast to run on the entire dataset and has a small embedding space for fast similarity search. Specifically we use an INT8 quantized bge-small-en-v1.5, deployed on an Intel Sapphire Rapids CPU instance.

- Ranker: It re-embeds the retrieved documents at runtime and re-ranks them based on semantic similarity to the query's embedding. We use the large variant of the BGE family here because it's optimized for accuracy, which lets us keep only the k most relevant documents for the LLM prompt. Specifically we use an INT8 quantized bge-large-en-v1.5, deployed on an Intel Sapphire Rapids CPU instance.

Space: https://huggingface.co/spaces/optimum-intel/haystack-e2e
Retriever IE: optimum-intel/fastrag-retriever
Ranker IE: optimum-intel/fastrag-ranker
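
To make the retrieve-then-rerank flow above concrete, here is a minimal sketch in plain Python. It is not the code behind the Space: `toy_embed` is a hypothetical stand-in for the two BGE models (a hashed bag-of-words vector, not a real embedding), the dimensions 16 and 256 are arbitrary, the documents are made up, and a simple list replaces the FAISS document store. Only the shape of the pipeline is meant to carry over: a small, cheap embedding scores the whole dataset and keeps N candidates, then a larger one re-scores just those candidates and keeps k.

```python
import math
import zlib
from typing import List


def toy_embed(text: str, dim: int) -> List[float]:
    """Hypothetical stand-in for an embedding model (assumption, not BGE):
    hashed bag-of-words, L2-normalised so a dot product is a cosine similarity."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: List[float], b: List[float]) -> float:
    # Both vectors are already unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


SMALL_DIM = 16   # plays the role of the small, fast retriever model
LARGE_DIM = 256  # plays the role of the large, more accurate ranker model


def retrieve(query: str, docs: List[str], index: List[List[float]], n: int) -> List[str]:
    """Embed the query and return the n documents closest to it in the index."""
    q = toy_embed(query, SMALL_DIM)
    order = sorted(range(len(docs)), key=lambda i: cosine(q, index[i]), reverse=True)
    return [docs[i] for i in order[:n]]


def rerank(query: str, candidates: List[str], k: int) -> List[str]:
    """Re-embed the candidates with the larger embedding and keep the best k."""
    q = toy_embed(query, LARGE_DIM)
    ranked = sorted(candidates, key=lambda d: cosine(q, toy_embed(d, LARGE_DIM)), reverse=True)
    return ranked[:k]


# Made-up documents for illustration only.
docs = [
    "the great pyramid of giza is the oldest of the seven wonders",
    "the hanging gardens of babylon may never have existed",
    "the colossus of rhodes was a statue of the sun god helios",
    "the lighthouse of alexandria guided ships into the harbour",
    "the temple of artemis at ephesus was rebuilt several times",
]

# Document embeddings are computed once and reused for every query,
# which mirrors storing the index on persistent storage.
index = [toy_embed(d, SMALL_DIM) for d in docs]

query = "which wonder is a statue of helios"
candidates = retrieve(query, docs, index, n=3)
top_k = rerank(query, candidates, k=2)
print(top_k)
```

The index is built once while the ranker embeds its candidates on every query, which is why the cheaper model handles the full dataset and the more expensive one only ever sees N documents.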

IlyasMoutawwakil

updated a model 7 months ago

optimum-intel/fastrag-retriever

Updated Jun 3, 2024

jeffboudier

posted an update 8 months ago

Post

1692

TGI v2.0.2 is out!
- New models (idefics2, phi3)
- Cleaner VLM support in the openai layer
- Upgraded to pytorch 2.3.0

https://github.com/huggingface/text-generation-inference/releases/tag/v2.0.2

Kudos @Narsil @olivierdehaene @drbh and so many contributors!

jeffboudier

posted an update 9 months ago

Post

1876

These 15 open models are available for serverless inference on Cloudflare Workers AI, powered by GPUs distributed in 150 datacenters globally - 👏 @rita3ko @mchenco @jtkipp @nkothariCF @philschmid

Cloudflare/hf-curated-models-available-on-workers-ai-66036e7ad5064318b3e45db6

derek-thomas

posted an update 11 months ago

Post

Defending LLMs against Jailbreaking Attacks via Backtranslation (2402.16459)

I really love this! It's a really innovative way to get robust defense against jailbreaking. It's not cheap: 2-3 calls per user request. But for some use-cases it can be worth it!

derek-thomas

authored a paper over 1 year ago

Relation Extraction with Self-determined Graph Convolutional Network

Paper • 2008.00441 • Published Aug 2, 2020

echarlaix

authored a paper over 1 year ago

Block Pruning For Faster Transformers

Paper • 2109.04838 • Published Sep 10, 2021 • 2


Team members 5
