Ayaan Sharif

Ayaan-Sharif

AI & ML interests

NLP, LLM, TEXT, Languages

Recent Activity

replied to sanchit-gandhi's post 10 days ago
Why does returning timestamps help Whisper reduce hallucinations? 🧐

Empirically, most practitioners have found that setting `return_timestamps=True` helps reduce hallucinations, particularly when doing long-form evaluation with Transformers' "chunked" algorithm. But why does this work?

My interpretation is that forcing the model to predict timestamps is contradictory to hallucinations. Suppose you have the transcription:

```markdown
The cat sat on the on the on the mat.
```

where we have a repeated hallucination for "on the". If we ask the model to predict timestamps, then the "on the" has to contribute to the overall segment-level timing, e.g.:

```markdown
<|0.00|> The cat sat on the on the on the mat.<|5.02|>
```

However, it's impossible to fit three copies of "on the" within the time allocation given to the segment, so the probability of this hallucinatory sequence becomes lower, and the model actually predicts the correct transcription with the highest probability:

```markdown
<|0.00|> The cat sat on the mat.<|5.02|>
```

In this sense, the end timestamp is the opposite of the initial timestamp constraint described in Section 4.5 of the paper https://huggingface.co/papers/2212.04356 → it helps the model remove extra words at the end of the sequence (whereas the initial timestamp helps when the model ignores words at the start), but the overall principle is the same: using timestamps to improve the probability of more realistic sequences.

Leaving it open to you: why do you think timestamps reduce Whisper hallucinations?
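For reference, a minimal sketch of how this is typically enabled with the Transformers `automatic-speech-recognition` pipeline, assuming the `openai/whisper-large-v3` checkpoint and a placeholder audio file `audio.mp3`:

```python
# Minimal sketch: chunked long-form transcription with timestamps enabled.
# "openai/whisper-large-v3" and "audio.mp3" are placeholder choices, not values from the post.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    chunk_length_s=30,  # activates the "chunked" long-form algorithm mentioned above
)

# return_timestamps=True makes the model predict segment-level timestamp tokens,
# which empirically suppresses repeated/hallucinated phrases.
result = asr("audio.mp3", return_timestamps=True)

print(result["text"])    # full transcription
print(result["chunks"])  # list of {"timestamp": (start, end), "text": ...} segments
```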

Organizations

None yet

Ayaan-Sharif's activity

replied to sanchit-gandhi's post 10 days ago

What if we segment the audio first and then transcribe? It's some extra compute to throw in, but IMO it would result in better output!
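A rough sketch of this segment-then-transcribe idea, assuming pydub (with ffmpeg) for silence-based splitting; the file name `audio.mp3` and the silence thresholds are placeholders, not values from the thread:

```python
# Hedged sketch: split the audio on pauses first, then transcribe each segment separately.
from pydub import AudioSegment
from pydub.silence import split_on_silence
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")

audio = AudioSegment.from_file("audio.mp3")

# Split on silence so each segment is (mostly) continuous speech.
segments = split_on_silence(
    audio,
    min_silence_len=700,              # ms of silence that counts as a break (placeholder)
    silence_thresh=audio.dBFS - 16,   # threshold relative to the clip's loudness (placeholder)
    keep_silence=200,                 # keep a little padding around each segment
)

# Transcribe each segment independently and stitch the results together.
texts = []
for i, seg in enumerate(segments):
    path = f"segment_{i}.wav"
    seg.export(path, format="wav")
    texts.append(asr(path)["text"].strip())

print(" ".join(texts))
```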

liked a Space 18 days ago
liked a Space 23 days ago
reacted to vladbogo's post with 👍 29 days ago
Panda-70M is a new large-scale video dataset comprising 70 million high-quality video clips, each paired with textual captions, designed to be used as pre-training for video understanding tasks.

Key Points:
* Automatic Caption Generation: Utilizes an automatic pipeline with multiple cross-modality teacher models to generate captions for video clips.
* Fine-tuned Caption Selection: Employs a fine-tuned retrieval model to select the most appropriate caption from multiple candidates for each video clip (a sketch of this selection step follows the list).
* Improved Performance: Pre-training on Panda-70M shows significant performance gains in video captioning, text-video retrieval, and text-driven video generation.
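As a rough illustration of what such a caption-selection step can look like: this is not the authors' fine-tuned retrieval model; it assumes a stock `openai/clip-vit-base-patch32` checkpoint and placeholder frames and captions, purely for the sketch.

```python
# Hedged sketch: score candidate captions against sampled video frames and keep the best one.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def select_caption(frames: list[Image.Image], candidates: list[str]) -> str:
    """Return the candidate caption with the highest mean image-text similarity."""
    inputs = processor(text=candidates, images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (num_frames, num_candidates); average over sampled frames.
    scores = outputs.logits_per_image.mean(dim=0)
    return candidates[int(scores.argmax())]

# Example usage with placeholder frame files and captions:
# frames = [Image.open(f"frame_{i}.jpg") for i in range(8)]
# print(select_caption(frames, ["a panda eating bamboo", "a man riding a bike"]))
```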

Paper: Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers (2402.19479)
Project page: https://snap-research.github.io/Panda-70M/
Code: https://github.com/snap-research/Panda-70M

Congrats to the authors @tschen, @aliaksandr-siarohin et al. for their work!
New activity in tencent/HunyuanVideo about 1 month ago

Multi-GPU setup when?

#5 opened about 1 month ago by Ayaan-Sharif