How do I test an LLM for my unique needs? If you work in finance, law, or medicine, generic benchmarks are not enough. This blog post uses Argilla, Distilabel and 🌤️ Lighteval to generate an evaluation dataset and evaluate models. https://github.com/argilla-io/argilla-cookbook/blob/main/domain-eval/README.md
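The linked cookbook builds its pipeline with Argilla, Distilabel and Lighteval. The snippet below does not use any of those libraries. It is only a minimal, standard-library sketch of the final scoring step, assuming you already have domain question/reference pairs and that exact match after light normalization is an acceptable metric. `EvalItem`, `normalize`, `exact_match_accuracy` and the toy questions are hypothetical names and data for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalItem:
    # Hypothetical record: one domain question and its reference answer.
    question: str
    reference: str


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences
    # are not counted as wrong answers.
    return " ".join(text.lower().split())


def exact_match_accuracy(items: list[EvalItem], model: Callable[[str], str]) -> float:
    """Fraction of items whose normalized model answer equals the normalized reference."""
    if not items:
        raise ValueError("evaluation set is empty")
    hits = sum(normalize(model(it.question)) == normalize(it.reference) for it in items)
    return hits / len(items)


if __name__ == "__main__":
    # Toy finance-flavoured items (hypothetical); a real set would be curated by domain experts.
    items = [
        EvalItem("What does APR stand for?", "Annual Percentage Rate"),
        EvalItem("Is a bond a debt or equity instrument?", "debt"),
    ]
    # Hypothetical stand-in for a real model call.
    canned = {
        "What does APR stand for?": "annual percentage  rate",
        "Is a bond a debt or equity instrument?": "equity",
    }
    print(exact_match_accuracy(items, lambda q: canned[q]))  # prints 0.5
```

Exact match is deliberately strict. For free-form answers in law or medicine, a rubric or an LLM-as-judge score is often more appropriate, which is the kind of setup the cookbook's tooling targets.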
Agents resources: all the resources I found or used when getting up to speed with agents.
GAIA: a benchmark for General AI Assistants • Paper • 2311.12983 • Published Nov 21, 2023
SaylorTwift/details_mistralai__Mistral-7B-Instruct-v0.2_private • Updated Apr 2, 2024