
TinyFabulist-TF2 — Romanian Fables Translation Benchmark

Goal: Benchmark, fine-tune, and quantize Gemma-3-12B for EN→RO translation of fables within a $350 budget.


✨ What’s inside

  • End-to-end pipeline: data prep → translation → scoring → reports
  • Cost modeling (lib/estimate) with budget guards
  • Human-centric metrics (Accuracy, Fluency, Coherence, Style, Cultural/Pragmatic), scored by an LLM evaluator (see the sketch after this list)
  • Model zoo ready: proprietary (GPT, Gemini, DeepL) & open-source (Gemma, Llama, etc.)
  • Fine-tuning + Quantization: LoRA recipes, GGUF exports, and W8A8 (LLM Compressor)
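
The five metrics come from an LLM evaluator. A minimal sketch of one such rubric call, assuming an OpenAI-compatible client — the prompt wording, judge model, and JSON field names below are illustrative assumptions, not the repo's actual scoring code:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = (
    "Rate this EN->RO fable translation from 1 to 5 on: accuracy, fluency, "
    "coherence, style, cultural_pragmatic. Reply with a JSON object."
)

def judge(english: str, romanian: str, model: str = "gpt-4.1") -> dict:
    # Illustrative judge call; the repo's actual prompt and parsing differ.
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"EN:\n{english}\n\nRO:\n{romanian}"},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)
```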

🗺️ Roadmap (v1.0)

  • Cost Analysis 📊 — Done
  • Dataset Creation 🏗️ — Done
  • Benchmarking 🔍 — Done
  • Fine-tuning 🎯 — Done (baseline)
  • Evaluation & Reports 🚀 — Done

📊 Translation Evaluation Benchmarks (100-sample dev set unless noted)

| Model | Accuracy | Fluency | Coherence | Style | Cultural/Pragmatic | Average | Score Count | Avg Input Tokens | Avg Output Tokens | Avg Inference Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| o3-2025-04-16 | 4.86 | 4.92 | 4.89 | 4.96 | 4.97 | 4.92 | 100 | 181.3 | 342.7 | 20.37 |
| gpt-4.1-2025-04-14 | 4.86 | 4.89 | 4.85 | 4.92 | 4.94 | 4.89 | 100 | 181.3 | 342.7 | 20.37 |
| gemini-2.5-flash-preview-05-20 | 4.75 | 4.86 | 4.82 | 4.87 | 4.89 | 4.84 | 100 | 181.3 | 342.7 | 20.37 |
| tf2-12b | 4.72 | 4.88 | 4.84 | 4.87 | 4.85 | 4.83 | 100 | 0.0 | 0.0 | 0.00 |
| o3-mini-2025-01-31 | 4.71 | 4.78 | 4.87 | 4.85 | 4.92 | 4.83 | 100 | 181.3 | 342.7 | 20.37 |
| gemini-2.0-flash-001 | 4.66 | 4.82 | 4.78 | 4.89 | 4.93 | 4.82 | 100 | 181.3 | 342.7 | 20.37 |
| deepseek-r1 | 4.72 | 4.76 | 4.87 | 4.85 | 4.89 | 4.82 | 98 | 183.2 | 346.0 | 20.59 |
| tf2-12b-w8a8 | 4.70 | 4.86 | 4.85 | 4.86 | 4.83 | 4.82 | 100 | 0.0 | 0.0 | 0.00 |
| grok-3-mini-beta | 4.73 | 4.74 | 4.77 | 4.82 | 4.88 | 4.79 | 100 | 181.3 | 342.7 | 20.37 |
| gpt-4.1-mini-2025-04-14 | 4.54 | 4.71 | 4.72 | 4.84 | 4.83 | 4.73 | 98 | 181.3 | 342.2 | 20.35 |
| deepl | 4.42 | 4.73 | 4.38 | 4.69 | 4.74 | 4.59 | 100 | 181.3 | 342.7 | 20.37 |
| gemini-flash-1.5-8b | 4.14 | 4.45 | 4.67 | 4.52 | 4.46 | 4.45 | 99 | 181.3 | 342.6 | 20.40 |
| gemma-3-12b-it | 3.98 | 4.56 | 4.65 | 4.52 | 4.43 | 4.43 | 100 | 0.0 | 0.0 | 0.00 |
| EuroLLM-9B-Instruct | 3.84 | 4.27 | 4.36 | 4.27 | 4.22 | 4.19 | 98 | 0.0 | 0.0 | 0.00 |
| qwen3-14b | 2.63 | 3.13 | 3.40 | 3.02 | 2.84 | 3.00 | 99 | 183.1 | 346.2 | 20.58 |

🧩 Data

  • Source: 3M English fables from https://huggingface.co/datasets/klusai/ds-tf1-en-3m (loading snippet below)
  • Ground truth: EN→RO reference translations generated with OpenAI o3
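
The source corpus can be pulled with 🤗 Datasets; a minimal sketch — the split name and record layout are assumptions, check the dataset card for the actual schema:

```python
from datasets import load_dataset

# Stream the 3M-fable corpus referenced above; "train" split is an assumption.
ds = load_dataset("klusai/ds-tf1-en-3m", split="train", streaming=True)
print(next(iter(ds)))  # inspect one English fable record
```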

🧮 Cost Model (included)

  • Budget target: $300
  • Estimator: lib/estimate combines per-model pricing with token-usage priors (see the sketch below)
  • Example infra pricing: sfcompute, 8× H100 at ~$1.35 per GPU·h (August 2025) → effective cluster rate ≈ $10.80/h
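
At its core the estimate is tokens × price. A hedged sketch of that arithmetic — the pricing numbers and helper name are illustrative placeholders, not the actual lib/estimate API:

```python
# Hypothetical sketch of a per-model API cost estimate.
PRICING = {  # USD per 1M tokens: (input, output); placeholder values
    "example-api-model": (2.00, 8.00),
}

def estimate_cost(model: str, n_samples: int,
                  avg_in_tokens: float, avg_out_tokens: float) -> float:
    """Projected API spend for translating n_samples fables."""
    price_in, price_out = PRICING[model]
    total = n_samples * (avg_in_tokens * price_in + avg_out_tokens * price_out)
    return total / 1_000_000

# Token priors from the benchmark table above (~181 in / ~343 out per fable).
print(f"${estimate_cost('example-api-model', 100, 181.3, 342.7):.4f}")
```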

🧪 Fine-tuning & Quantization Artifacts

Deploy options: llama.cpp (GGUF), vLLM (W8A8), or standard 🤗 Transformers.
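
For example, the W8A8 checkpoint can be served with vLLM; the model ID and prompt below are placeholders — substitute the published artifact name:

```python
from vllm import LLM, SamplingParams

# Placeholder model ID; use the actual published W8A8 artifact.
llm = LLM(model="klusai/tf2-12b-w8a8")
params = SamplingParams(temperature=0.0, max_tokens=512)

prompt = "Translate the following fable from English to Romanian:\n\n..."
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```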


✅ Success Criteria

  • LLM-evaluator metrics within 98% of the o3 baseline on the test set (quick check below)
  • Total cost under $300
  • Inference cost < $0.001 / translation
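
Against the dev-set averages in the table above, the first criterion already holds for tf2-12b; a quick check:

```python
# 98%-of-o3 criterion, checked on the dev-set averages above
# (the criterion itself is defined on the held-out test set).
o3_avg, tf2_avg = 4.92, 4.83
print(tf2_avg / o3_avg)          # ≈ 0.982
print(tf2_avg >= 0.98 * o3_avg)  # True
```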
