Lj V. Miranda

ljvmiranda921

https://ljvmiranda921.github.io

AI & ML interests

NLP - multilinguality, data-centric AI

Recent Activity

upvoted a paper 4 days ago

Bridging the Data Provenance Gap Across Text, Speech and Video

upvoted a paper 4 days ago

2 OLMo 2 Furious

updated a collection 4 days ago

calamanCy models for Tagalog NLP

ljvmiranda921's activity

upvoted 2 papers 4 days ago

Bridging the Data Provenance Gap Across Text, Speech and Video

Paper • 2412.17847 • Published 24 days ago • 8

2 OLMo 2 Furious

Paper • 2501.00656 • Published 11 days ago • 15

upvoted a paper 22 days ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 23 days ago • 339

upvoted 2 collections about 1 month ago

Multilingual LLM Evaluation

Multilingual Evaluation Benchmarks • 6 items • Updated 29 days ago • 9

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark S

SEACrowd is a community movement project aimed at centralizing and standardizing AI resources for Southeast Asian languages, cultures, and/or regions. • 3 items • Updated Jun 18, 2024 • 6

upvoted a collection about 2 months ago

OLMo 2

Artifacts for the second set of OLMo models. • 22 items • Updated 5 days ago • 74

upvoted a paper about 2 months ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 58

upvoted a collection about 2 months ago

Tulu 3 Datasets

All datasets released with Tulu 3 -- state of the art open post-training recipes. • 32 items • Updated 5 days ago • 64

upvoted a paper 2 months ago

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Paper • 2410.19133 • Published Oct 24, 2024 • 11

upvoted a collection 3 months ago

Multilingual RewardBench

Multilingual Reward Model Evaluation Dataset and Results • 3 items • Updated about 20 hours ago • 4

upvoted a paper 3 months ago

M-RewardBench: Evaluating Reward Models in Multilingual Settings

Paper • 2410.15522 • Published Oct 20, 2024 • 11

upvoted 2 papers 5 months ago

SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

Paper • 2407.19672 • Published Jul 29, 2024 • 56

Consent in Crisis: The Rapid Decline of the AI Data Commons

Paper • 2407.14933 • Published Jul 20, 2024 • 12

upvoted a collection 6 months ago

Reward Bench

Datasets, spaces, and models for the reward model benchmark! • 5 items • Updated 5 days ago • 9

upvoted 2 papers 7 months ago

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12, 2024 • 66

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Paper • 2406.10118 • Published Jun 14, 2024 • 31

upvoted a collection about 1 year ago

State-of-the-Art NER models - Tagalog

2 items • Updated Feb 27, 2024 • 2

upvoted 2 papers about 1 year ago

Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

Paper • 2311.09122 • Published Nov 15, 2023 • 7

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 27