C4AI-Community (C4AI Community)

posted an update about 13 hours ago

Post

456

Groundbreaking Survey on Large Language Models in Recommendation Systems!

Just read a comprehensive survey that maps out how LLMs are revolutionizing recommender systems. The authors have meticulously categorized existing approaches into two major paradigms:

Discriminative LLMs for Recommendation:
- Leverages BERT-like models for understanding user-item interactions
- Uses fine-tuning and prompt tuning to adapt pre-trained models
- Excels at tasks like user representation learning and ranking

Generative LLMs for Recommendation:
- Employs GPT-style models to directly generate recommendations
- Implements innovative techniques like in-context learning and zero-shot recommendation
- Supports natural language interaction and explanation generation

Key Technical Insights:
- Novel taxonomy of modeling paradigms: LLM Embeddings + RS, LLM Tokens + RS, and LLM as RS
- Integration methods spanning from simple prompting to sophisticated instruction tuning
- Hybrid approaches combining collaborative filtering with LLM capabilities
- Advanced prompt engineering techniques for controlled recommendation generation

Critical Challenges Identified:
- Position and popularity bias in LLM recommendations
- Limited context length affecting user history processing
- Need for better evaluation metrics for generative recommendations
- Controlled output generation and personalization challenges

This work opens exciting possibilities for next-gen recommendation systems while highlighting crucial areas for future research.

1 reply

·

Aryan8912

updated a collection about 22 hours ago

Multilingual RewardBench

Collection

Multilingual Reward Model Evaluation Dataset and Results • 2 items • Updated 14 minutes ago • 4

yihaopeng

authored a paper 2 days ago

AutoPresent: Designing Structured Visuals from Scratch

Paper • 2501.00912 • Published 10 days ago • 8

prithivMLmods

posted an update 2 days ago

Post

2314

200+ f{🤗} on Stranger Zone! [ https://huggingface.co/strangerzonehf ]

❤️‍🔥Stranger Zone's MidJourney Mix Model Adapter is trending on the Very Model Page, with over 45,000+ downloads. Additionally, the Super Realism Model Adapter has over 52,000+ downloads, remains the top two adapter on Stranger Zone!
strangerzonehf/Flux-Midjourney-Mix2-LoRA, strangerzonehf/Flux-Super-Realism-LoRA

👽Try Demo: prithivMLmods/FLUX-LoRA-DLC

📦Most Recent Adapters to Check Out :
+ Ctoon : strangerzonehf/Ctoon-Plus-Plus
+ Cardboard : strangerzonehf/Flux-Cardboard-Art-LoRA
+ Claude Art : strangerzonehf/Flux-Claude-Art
+ Flay Lay : strangerzonehf/Flux-FlatLay-LoRA
+ Smiley Portrait : strangerzonehf/Flux-Smiley-Portrait-LoRA

🤗Thanks for Community & OPEN SOURCEEE !!

6 replies

·

singhsidhukuldeep

posted an update 3 days ago

Post

1385

Groundbreaking Research Alert: Correctness ≠ Faithfulness in RAG Systems

Fascinating new research from L3S Research Center, University of Amsterdam, and TU Delft reveals a critical insight into Retrieval Augmented Generation (RAG) systems. The study exposes that up to 57% of citations in RAG systems could be unfaithful, despite being technically correct.

>> Key Technical Insights:

Post-rationalization Problem
The researchers discovered that RAG systems often engage in "post-rationalization" - where models first generate answers from their parametric memory and then search for supporting evidence afterward. This means that while citations may be correct, they don't reflect the actual reasoning process.

Experimental Design
The team used Command-R+ (104B parameters) with 4-bit quantization on NVIDIA A100 GPU, testing on the NaturalQuestions dataset. They employed BM25 for initial retrieval and ColBERT v2 for reranking.

Attribution Framework
The research introduces a comprehensive framework for evaluating RAG systems across multiple dimensions:
- Citation Correctness: Whether cited documents support the claims
- Citation Faithfulness: Whether citations reflect actual model reasoning
- Citation Appropriateness: Relevance and meaningfulness of citations
- Citation Comprehensiveness: Coverage of key points

Under the Hood
The system processes involve:
1. Document relevance prediction
2. Citation prediction
3. Answer generation without citations
4. Answer generation with citations

This work fundamentally challenges our understanding of RAG systems and highlights the need for more robust evaluation metrics in AI systems that claim to provide verifiable information.

2 replies

·

Tonic

posted an update 3 days ago

Post

1498

microsoft just released Phi-4 , check it out here : Tonic/Phi-4

hope you like it :-)

alielfilali01

posted an update 4 days ago

Post

1702

3C3H AraGen Leaderboard welcomes today deepseek-ai/DeepSeek-V3 and 12 other models (including the late gpt-3.5 💀) to the ranking of best LLMs in Arabic !

Observations:
- DeepSeek-v3 ranked 3rd and only Open model among the top 5 !

- A 14B open model ( Qwen/Qwen2.5-14B-Instruct) outperforms gpt-3.5-turbo-0125 (from last year). This shows how much we came in advancing and supporting Arabic presence within the LLM ecosystem !

- Contrary to what observed in likelihood-acc leaderboards (like OALL/Open-Arabic-LLM-Leaderboard) further finetuned models like maldv/Qwentile2.5-32B-Instruct actually decreased the performance compared to the original model Qwen/Qwen2.5-32B-Instruct.
It's worth to note that the decrease is statiscally insignificant which imply that at best, the out-domain finetuning do not really hurts the model original capabilities acquired during pretraining.
Previous work addressed this (finetuning VS pretraining) but more investigation in this regard is required (any PhDs here ? This could be your question ...)

Check out the latest rankings: inceptionai/AraGen-Leaderboard

ljvmiranda921

authored 2 papers 5 days ago

Bridging the Data Provenance Gap Across Text, Speech and Video

Paper • 2412.17847 • Published 24 days ago • 8

2 OLMo 2 Furious

Paper • 2501.00656 • Published 11 days ago • 15

prithivMLmods

posted an update 5 days ago

Post

5364

Reasoning SmolLM2 🚀

🎯Fine-tuning SmolLM2 on a lightweight synthetic reasoning dataset for reasoning-specific tasks. Future updates will focus on lightweight, blazing-fast reasoning models. Until then, check out the blog for fine-tuning details.

🔥Blog : https://huggingface.co/blog/prithivMLmods/smollm2-ft

🔼 Models :
+ SmolLM2-CoT-360M : prithivMLmods/SmolLM2-CoT-360M
+ Reasoning-SmolLM2-135M : prithivMLmods/Reasoning-SmolLM2-135M
+ SmolLM2-CoT-360M-GGUF : prithivMLmods/SmolLM2-CoT-360M-GGUF

🤠 Other Details :
+ Demo : prithivMLmods/SmolLM2-CoT-360M
+ Fine-tune nB : prithivMLmods/SmolLM2-CoT-360M

singhsidhukuldeep

posted an update 6 days ago

Post

3358

Exciting breakthrough in e-commerce recommendation systems!
Walmart Global Tech researchers have developed a novel Triple Modality Fusion (TMF) framework that revolutionizes how we make product recommendations.

>> Key Innovation
The framework ingeniously combines three distinct data types:
- Visual data to capture product aesthetics and context
- Textual information for detailed product features
- Graph data to understand complex user-item relationships

>> Technical Architecture
The system leverages a Large Language Model (Llama2-7B) as its backbone and introduces several sophisticated components:

Modality Fusion Module
- All-Modality Self-Attention (AMSA) for unified representation
- Cross-Modality Attention (CMA) mechanism for deep feature integration
- Custom FFN adapters to align different modality embeddings

Advanced Training Strategy
- Curriculum learning approach with three complexity levels
- Parameter-Efficient Fine-Tuning using LoRA
- Special token system for behavior and item representation

>> Real-World Impact
The results are remarkable:
- 38.25% improvement in Electronics recommendations
- 43.09% boost in Sports category accuracy
- Significantly higher human evaluation scores compared to traditional methods

Currently deployed in Walmart's production environment, this research demonstrates how combining multiple data modalities with advanced LLM architectures can dramatically improve recommendation accuracy and user satisfaction.

2 replies

·

singhsidhukuldeep

posted an update 7 days ago

Post

3009

Groundbreaking Research Alert: Rethinking RAG with Cache-Augmented Generation (CAG)

Researchers from National Chengchi University and Academia Sinica have introduced a paradigm-shifting approach that challenges the conventional wisdom of Retrieval-Augmented Generation (RAG).

Instead of the traditional retrieve-then-generate pipeline, their innovative Cache-Augmented Generation (CAG) framework preloads documents and precomputes key-value caches, eliminating the need for real-time retrieval during inference.

Technical Deep Dive:
- CAG preloads external knowledge and precomputes KV caches, storing them for future use
- The system processes documents only once, regardless of subsequent query volume
- During inference, it loads the precomputed cache alongside user queries, enabling rapid response generation
- The cache reset mechanism allows efficient handling of multiple inference sessions through strategic token truncation

Performance Highlights:
- Achieved superior BERTScore metrics compared to both sparse and dense retrieval RAG systems
- Demonstrated up to 40x faster generation times compared to traditional approaches
- Particularly effective with both SQuAD and HotPotQA datasets, showing robust performance across different knowledge tasks

Why This Matters:
The approach significantly reduces system complexity, eliminates retrieval latency, and mitigates common RAG pipeline errors. As LLMs continue evolving with expanded context windows, this methodology becomes increasingly relevant for knowledge-intensive applications.

prithivMLmods

posted an update 11 days ago

Post

3820

Triangulum Catalogued 🔥💫

🎯Triangulum is a collection of pretrained and instruction-tuned generative models, designed for multilingual applications. These models are trained using synthetic datasets based on long chains of thought, enabling them to perform complex reasoning tasks effectively.

+ Triangulum-10B : prithivMLmods/Triangulum-10B
+ Quants : prithivMLmods/Triangulum-10B-GGUF

+ Triangulum-5B : prithivMLmods/Triangulum-5B
+ Quants : prithivMLmods/Triangulum-5B-GGUF

+ Triangulum-1B : prithivMLmods/Triangulum-1B
+ Quants : prithivMLmods/Triangulum-1B-GGUF

4 replies

·

singhsidhukuldeep

posted an update 11 days ago

Post

1603

Excited to share insights from Walmart's groundbreaking semantic search system that revolutionizes e-commerce product discovery!

The team at Walmart Global Technology(the team that I am a part of 😬) has developed a hybrid retrieval system that combines traditional inverted index search with neural embedding-based search to tackle the challenging problem of tail queries in e-commerce.

Key Technical Highlights:

• The system uses a two-tower BERT architecture where one tower processes queries and another processes product information, generating dense vector representations for semantic matching.

• Product information is enriched by combining titles with key attributes like category, brand, color, and gender using special prefix tokens to help the model distinguish different attribute types.

• The neural model leverages DistilBERT with 6 layers and projects the 768-dimensional embeddings down to 256 dimensions using a linear layer, achieving optimal performance while reducing storage and computation costs.

• To improve model training, they implemented innovative negative sampling techniques combining product category matching and token overlap filtering to identify challenging negative examples.

Production Implementation Details:

• The system uses a managed ANN (Approximate Nearest Neighbor) service to enable fast retrieval, achieving 99% recall@20 with just 13ms latency.

• Query embeddings are cached with preset TTL (Time-To-Live) to reduce latency and costs in production.

• The model is exported to ONNX format and served in Java, with custom optimizations like fixed input shapes and GPU acceleration using NVIDIA T4 processors.

Results:
The system showed significant improvements in both offline metrics and live experiments, with:
- +2.84% improvement in NDCG@10 for human evaluation
- +0.54% lift in Add-to-Cart rates in live A/B testing

This is a fantastic example of how modern NLP techniques can be successfully deployed at scale to solve real-world e-

1 reply

·

mmhamdy

authored a paper 12 days ago

Bridging the Data Provenance Gap Across Text, Speech and Video

Paper • 2412.17847 • Published 24 days ago • 8

AhmadMustafa

authored a paper 12 days ago

Bridging the Data Provenance Gap Across Text, Speech and Video

Paper • 2412.17847 • Published 24 days ago • 8

alielfilali01

posted an update 12 days ago

Post

1805

~75% on the challenging GPQA with only 40M parameters 🔥🥳

GREAT ACHIEVEMENT ! Or is it ?

This new Work, "Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation", take out the mystery about many models i personally suspected their results. Speacially on leaderboards other than the english one, Like the Open Arabic LLM Leaderbaord OALL/Open-Arabic-LLM-Leaderboard.

The authors of this work, first started by training a model on the GPQA data, which, unsurprisingly, led to the model achieving 100% performance.

Afterward, they trained what they referred to as a 'legitimate' model on legitimate data (MedMCQA). However, they introduced a distillation loss from the earlier, 'cheated' model.

What they discovered was fascinating: the knowledge of GPQA leaked through this distillation loss, even though the legitimate model was never explicitly trained on GPQA during this stage.

This raises important questions about the careful use of distillation in model training, especially when the training data is opaque. As they demonstrated, it’s apparently possible to (intentionally or unintentionally) leak test data through this method.

Find out more: Data Laundering: Artificially Boosting Benchmark Results through Knowledge Distillation (2412.15255)

1 reply

·

singhsidhukuldeep

posted an update 13 days ago

Post

2092

Groundbreaking Research Alert: Revolutionizing Document Ranking with Long-Context LLMs

Researchers from Renmin University of China and Baidu Inc . have introduced a novel approach to document ranking that challenges conventional sliding window methods. Their work demonstrates how long-context Large Language Models can process up to 100 documents simultaneously, achieving superior performance while reducing API costs by 50%.

Key Technical Innovations:
- Full ranking strategy enables processing all passages in a single inference
- Multi-pass sliding window approach for comprehensive listwise label construction
- Importance-aware learning objective that prioritizes top-ranked passage IDs
- Support for context lengths up to 128k tokens using models like LLaMA 3.1-8B-Instruct

Performance Highlights:
- 2.2 point improvement in NDCG@10 metrics
- 29.3% reduction in latency compared to traditional methods
- Significant API cost savings through elimination of redundant passage processing

Under the hood, the system leverages advanced long-context LLMs to perform global interactions among passages, enabling more nuanced relevance assessment. The architecture incorporates a novel importance-aware loss function that assigns differential weights based on passage ranking positions.

The research team's implementation demonstrated remarkable versatility across multiple datasets, including TREC DL and BEIR benchmarks. Their fine-tuned model, RankMistral, showcases the practical viability of full ranking approaches in production environments.

This advancement marks a significant step forward in information retrieval systems, offering both improved accuracy and computational efficiency. The implications for search engines and content recommendation systems are substantial.

singhsidhukuldeep

posted an update 18 days ago

Post

2182

Exciting News in AI: JinaAI Releases JINA-CLIP-v2!

The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that's pushing the boundaries of text-image understanding. Here's why this is a big deal:

🚀 Technical Highlights:
- Dual encoder architecture combining a 561M parameter Jina XLM-RoBERTa text encoder and a 304M parameter EVA02-L14 vision encoder
- Supports 89 languages with 8,192 token context length
- Processes images up to 512×512 pixels with 14×14 patch size
- Implements FlashAttention2 for text and xFormers for vision processing
- Uses Matryoshka Representation Learning for efficient vector storage

⚡️ Under The Hood:
- Multi-stage training process with progressive resolution scaling (224→384→512)
- Contrastive learning using InfoNCE loss in both directions
- Trained on massive multilingual dataset including 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample

📊 Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains performance even with 75% dimension reduction (256D vs 1024D)

🎯 Key Innovation:
The model solves the long-standing challenge of unifying text-only and multi-modal retrieval systems while adding robust multilingual support. Perfect for building cross-lingual visual search systems!

Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!

singhsidhukuldeep

posted an update 20 days ago

Post

1274

Fascinating insights from @Pinterest 's latest research on improving feature interactions in recommendation systems!

Pinterest's engineering team has tackled a critical challenge in their Homefeed ranking system that serves 500M+ monthly active users. Here's what makes their approach remarkable:

>> Technical Deep Dive

Architecture Overview
• The ranking model combines dense features, sparse features, and embedding features to represent users, Pins, and context
• Sparse features are processed using learnable embeddings with size based on feature cardinality
• User sequence embeddings are generated using a transformer architecture processing past engagements

Feature Processing Pipeline
• Dense features undergo normalization for numerical stability
• Sparse and embedding features receive L2 normalization
• All features are concatenated into a single feature embedding

Key Innovations
• Implemented parallel MaskNet layers with 3 blocks
• Used projection ratio of 2.0 and output dimension of 512
• Stacked 4 DCNv2 layers on top for higher-order interactions

Performance Improvements
• Achieved +1.42% increase in Homefeed Save Volume
• Boosted Overall Time Spent by +0.39%
• Maintained memory consumption increase to just 5%

>> Industry Constraints Addressed

Memory Management
• Optimized for 60% GPU memory utilization
• Prevented OOM errors while maintaining batch size efficiency

Latency Optimization
• Removed input-output concatenation before MLP
• Reduced hidden layer sizes in MLP
• Achieved zero latency increase while improving performance

System Stability
• Ensured reproducible results across retraining
• Maintained model stability across different data distributions
• Successfully deployed in production environment

This work brilliantly demonstrates how to balance academic innovations with real-world industrial constraints. Kudos to the Pinterest team!

C4AI Community

AI & ML interests

Recent Activity

C4AI-Community's activity

Multilingual RewardBench

AutoPresent: Designing Structured Visuals from Scratch

Bridging the Data Provenance Gap Across Text, Speech and Video

2 OLMo 2 Furious

Bridging the Data Provenance Gap Across Text, Speech and Video

Bridging the Data Provenance Gap Across Text, Speech and Video

AI & ML interests

Recent Activity

Team members 153

C4AI-Community's activity