Yoonjae Jeong's picture

90 9

Yoonjae Jeong

hybris75

·

AI & ML interests

None yet

Organizations

None yet

hybris75's activity

upvoted a paper 8 months ago

Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published Apr 30, 2024 • 73

upvoted 6 papers 9 months ago

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 11

MuPT: A Generative Symbolic Music Pretrained Transformer

Paper • 2404.06393 • Published Apr 9, 2024 • 15

Audio Dialogues: Dialogues dataset for audio and music understanding

Paper • 2404.07616 • Published Apr 11, 2024 • 15

Gecko: Versatile Text Embeddings Distilled from Large Language Models

Paper • 2403.20327 • Published Mar 29, 2024 • 47

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Paper • 2404.03204 • Published Apr 4, 2024 • 7

Measuring Style Similarity in Diffusion Models

Paper • 2404.01292 • Published Apr 1, 2024 • 16

upvoted 7 papers 10 months ago

Improving Text-to-Image Consistency via Automatic Prompt Optimization

Paper • 2403.17804 • Published Mar 26, 2024 • 16

MusicHiFi: Fast High-Fidelity Stereo Vocoding

Paper • 2403.10493 • Published Mar 15, 2024 • 15

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8, 2024 • 61

Teaching Large Language Models to Reason with Reinforcement Learning

Paper • 2403.04642 • Published Mar 7, 2024 • 46

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 606

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 34

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27, 2024 • 88

upvoted 6 papers 11 months ago

Music Style Transfer with Time-Varying Inversion of Diffusion Models

Paper • 2402.13763 • Published Feb 21, 2024 • 10

ChatMusician: Understanding and Generating Music Intrinsically with LLM

Paper • 2402.16153 • Published Feb 25, 2024 • 56

Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

Paper • 2402.10009 • Published Feb 15, 2024 • 19

Scalable Diffusion Models with Transformers

Paper • 2212.09748 • Published Dec 19, 2022 • 17

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Paper • 2402.08093 • Published Feb 12, 2024 • 57

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Paper • 2402.07383 • Published Feb 12, 2024 • 13