diff --git a/README.md b/README.md
index 4171c4a..fc2b8bb 100644
--- a/README.md
+++ b/README.md
@@ -1,22 +1,35 @@
# GDB: GraphicDesignBench
-**GDB** evaluates vision-language models on professional graphic design tasks — layout reasoning, typography, SVG editing, template matching, animation. 39 benchmarks across 7 domains, built on the [Lica dataset](https://github.com/lica-world/lica-dataset) (1,148 real design layouts).
+**GDB** evaluates vision-language models on professional graphic design tasks — layout reasoning, typography, SVG editing, template matching, animation. The paper defines 49 evaluation tasks; this repo ships 39 benchmark pipelines covering 45 of them, organized into 7 code-level domains and built on the [Lica dataset](https://github.com/lica-world/lica-dataset) (1,148 real design layouts).
**Paper:** [arXiv:2604.04192](https://arxiv.org/abs/2604.04192) | **Dataset:** [HuggingFace](https://huggingface.co/datasets/lica-world/GDB) | **Blog:** [lica.world](https://lica.world/blog/gdb-real-world-benchmark-for-graphic-design)
## Benchmarks
-Each task is either **understanding** or **generation**:
+Each task is either **understanding** or **generation**. The table counts repo-level
+benchmark pipelines and the paper-level evaluation tasks they score.
-| Domain | Tasks | Benchmarks | Description |
-|--------|------:|----------:|-------------|
+| Repo domain | Benchmarks | Paper tasks | Description |
+|---|--:|--:|---|
| category | 2 | 2 | Design category classification and user intent prediction |
| layout | 8 | 8 | Spatial reasoning over design canvases (aspect ratio, element counting, component type and detection), layout generation (intent-to-layout, partial completion, aspect-ratio adaptation), and layer-aware object insertion (`layout-8`, reference- or description-guided per sample) |
| lottie | 2 | 2 | Lottie animation generation from text and image |
| svg | 8 | 8 | SVG reasoning and editing (perceptual and semantic Q/A, bug fixing, optimization, style editing) and generation (text-to-SVG, image-to-SVG, combined input) |
| template | 5 | 5 | Template matching, retrieval, clustering, and generation (style completion, color transfer) |
-| temporal | 8 | 6 | Keyframe ordering; motion type classification; video/component duration and start-time estimation; generation (animation parameters, motion trajectory, short-form video) |
-| typography | 12 | 8 | Font family, color, size/weight/alignment/letter spacing/line height, style ranges, curvature, rotation, and generation (styled text element, styled text rendering to layout) |
+| temporal | 6 | 8 | Keyframe ordering; motion type classification; video/component duration and start-time estimation; generation (animation parameters, motion trajectory, short-form video) |
+| typography | 8 | 12 | Font family, color, size/weight/alignment/letter spacing/line height, style ranges, curvature, rotation, and generation (styled text element, styled text rendering to layout) |
+| **Totals** | **39** | **45** | |
+
+Benchmarks and paper tasks are not 1:1. Two benchmarks score multiple paper tasks from a
+single model call: `typography-3` extracts font size, weight, alignment, letter spacing,
+and line height as one JSON object (5 paper tasks), and `temporal-3` does the same for
+motion type plus three timing fields (4 paper tasks). This matches how a designer thinks
+about these attributes, and avoids issuing 9 separate prompts per sample.
+
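+For intuition, the combined extraction is a single JSON object that the scorer
+fans out into per-task scores. A minimal sketch in Python (the field names are
+illustrative, not the benchmark's exact schema):
+
+```python
+import json
+
+# Hypothetical raw response for one typography-3 sample.
+raw = '{"size": 32, "weight": 700, "alignment": "center", "letter_spacing": 0.5, "line_height": 1.2}'
+
+fields = json.loads(raw)
+# One model call, five paper-task scores:
+for task in ("size", "weight", "alignment", "letter_spacing", "line_height"):
+    print(f"typography-3/{task}: {fields[task]}")
+```
+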
+The paper additionally defines four layout-understanding tasks — layer order
+(`layout-u-5`), image rotation (`layout-u-6`), crop shape (`layout-u-7`), and frame
+detection (`layout-u-8`) — that do not have a runnable pipeline in the repo; see the
+paper for their definitions.
## Setup
@@ -46,9 +59,14 @@ pip install -e ".[dev]" # ruff linter
### Verify
```bash
-python scripts/run_benchmarks.py --list
+gdb verify # zero-config smoke test against a bundled fixture (~30s, no API keys)
+gdb list # enumerate all 39 benchmarks
+gdb suites # named suites: v0-all, v0-smoke, v0-understanding, v0-generation
```
+See the note in `src/gdb/suites.py` on why suites are `v0-*` today and
+what `v1.0-*` will mean once the evaluation definitions are frozen.
+
### Data
Without `--dataset-root`, benchmarks are loaded directly from [HuggingFace](https://huggingface.co/datasets/lica-world/GDB) (requires the `.[hub]` extra). No download step needed.
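+
+To inspect samples outside the runner, it is a regular Hub dataset. A minimal
+sketch, assuming the `datasets` library; the split and config names here are
+assumptions, so check the dataset card:
+
+```python
+from datasets import load_dataset
+
+# Streams records from the Hub without a local download step.
+ds = load_dataset("lica-world/GDB", split="train", streaming=True)
+print(next(iter(ds)))  # one raw sample as a dict
+```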
@@ -65,40 +83,45 @@ Then pass `--dataset-root data/gdb-dataset` to benchmark runs.
```bash
# From HuggingFace (no local data needed)
-python scripts/run_benchmarks.py --stub-model --benchmarks category-1 --n 5
+gdb eval --stub-model --benchmarks category-1 --n 5
# From local data
-python scripts/run_benchmarks.py --stub-model --benchmarks category-1 \
+gdb eval --stub-model --benchmarks category-1 \
--dataset-root data/gdb-dataset --n 5
# Real model
-python scripts/run_benchmarks.py --benchmarks svg-1 \
+gdb eval --benchmarks svg-1 \
+ --provider openai --model-id gpt-5.4 \
+ --dataset-root data/gdb-dataset
+
+# Whole suite
+gdb eval --suite v0-all \
--provider openai --model-id gpt-5.4 \
--dataset-root data/gdb-dataset
# Temporal benchmarks (video-based)
-python scripts/run_benchmarks.py --benchmarks temporal-1 \
+gdb eval --benchmarks temporal-1 \
--provider gemini \
--dataset-root data/gdb-dataset
# Custom Python model entrypoint
-python scripts/run_benchmarks.py --benchmarks svg-1 \
+gdb eval --benchmarks svg-1 \
--provider custom --custom-entry my_models.wrapper:build_model \
--custom-init-kwargs '{"checkpoint":"/models/foo"}' \
--dataset-root data/gdb-dataset
# Local default VLM/LLM (defaults to Qwen3-VL-4B-Instruct)
-python scripts/run_benchmarks.py --benchmarks svg-1 \
+gdb eval --benchmarks svg-1 \
--provider hf --device auto \
--dataset-root data/gdb-dataset
# Diffusion / image generation (defaults to FLUX.2 klein 4B)
-python scripts/run_benchmarks.py --benchmarks layout-1 \
+gdb eval --benchmarks layout-1 \
--provider diffusion \
--dataset-root data/gdb-dataset
# Image-generation / editing task with a custom wrapper
-python scripts/run_benchmarks.py --benchmarks typography-7 \
+gdb eval --benchmarks typography-7 \
--provider custom --custom-entry my_models.image_wrapper:build_model \
--custom-modality image_generation \
--dataset-root data/gdb-dataset
@@ -106,12 +129,17 @@ python scripts/run_benchmarks.py --benchmarks typography-7 \
# Official FLUX.2 wrapper via the existing custom provider
python -m pip install --no-deps --ignore-requires-python \
"git+https://github.com/black-forest-labs/flux2.git"
-python scripts/run_benchmarks.py --benchmarks layout-1 layout-3 layout-8 typography-7 typography-8 \
+gdb eval --benchmarks layout-1 layout-3 layout-8 typography-7 typography-8 \
--provider custom \
--custom-entry gdb.models.local_models:Flux2Model \
--custom-init-kwargs '{"model_name":"flux.2-klein-4b"}' \
--custom-modality image_generation \
--dataset-root data/gdb-dataset
+
+# Batch submit (~50% cheaper, fire-and-forget) + collect later
+gdb submit --benchmarks svg-1 --provider gemini --credentials auth/key.json \
+ --dataset-root data/gdb-dataset
+gdb collect jobs/job_manifest.json
```
`--custom-entry` must point to an importable module attribute (installed or reachable via `PYTHONPATH`). For image-output tasks, use `--custom-modality image_generation`.
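+
+For reference, an entrypoint can be a plain module-level factory. A minimal
+sketch (the `predict` name and signature are assumptions; check the model base
+class under `src/gdb/models/` for the actual interface):
+
+```python
+# my_models/wrapper.py
+from __future__ import annotations
+
+
+class MyModel:
+    def __init__(self, checkpoint: str):
+        self.checkpoint = checkpoint  # from the --custom-init-kwargs JSON
+
+    def predict(self, prompt: str, images: list | None = None) -> str:
+        # Run inference here and return the raw text prediction.
+        return "stub prediction"
+
+
+def build_model(**init_kwargs) -> MyModel:
+    # Receives parsed --custom-init-kwargs, e.g. {"checkpoint": "/models/foo"}.
+    return MyModel(**init_kwargs)
+```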
@@ -145,7 +173,7 @@ export GOOGLE_API_KEY=... # Gemini (Google AI Studio / google-genai A
For **Gemini on Vertex AI** (service account), pass a JSON key file instead of relying on `GOOGLE_API_KEY`:
```bash
-python scripts/run_benchmarks.py --benchmarks svg-1 --provider gemini \
+gdb eval --benchmarks svg-1 --provider gemini \
--credentials /path/to/service-account.json \
--dataset-root data/gdb-dataset
```
@@ -204,9 +232,10 @@ GDB/
│ ├── registry.py # Auto-discovery via pkgutil.walk_packages
│ └── runner.py # BenchmarkRunner orchestration
├── scripts/
-│ ├── download_data.py # Fetch + unpack into gdb-dataset/
-│ ├── run_benchmarks.py # Unified CLI for list, stub, real, and batch runs
-│ └── upload_to_hf.py # Upload dataset to HuggingFace Hub
+│ ├── download_data.py # Fetch + unpack into gdb-dataset/
+│ ├── build_verify_dataset.py # Rebuild the bundled `gdb verify` fixture
+│ ├── run_benchmarks.py # Deprecated; kept as a shim for existing scripts
+│ └── upload_to_hf.py # Upload dataset to HuggingFace Hub
├── integrations/
│ └── helm/ # HELM plugin (lica-gdb-helm on PyPI)
├── docs/
diff --git a/pyproject.toml b/pyproject.toml
index 0eee8b1..7037324 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "lica-gdb"
-version = "0.1.1"
+version = "0.2.0"
description = "GDB: GraphicDesignBench — benchmark suite for evaluating vision-language models on graphic design tasks"
readme = "README.md"
license = {text = "Apache-2.0"}
@@ -13,6 +13,9 @@ dependencies = [
"numpy>=1.24",
]
+[project.scripts]
+gdb = "gdb.cli:main"
+
[project.urls]
Homepage = "https://github.com/lica-world/GDB"
Repository = "https://github.com/lica-world/GDB"
@@ -80,6 +83,15 @@ where = ["src"]
[tool.setuptools.package-data]
"*" = ["*.json"]
+"gdb" = [
+ "_verify_data/README.md",
+ "_verify_data/benchmarks/**/*.csv",
+ "_verify_data/benchmarks/**/*.json",
+ "_verify_data/benchmarks/**/*.png",
+ "_verify_data/benchmarks/**/*.svg",
+ "_verify_data/lica-data/**/*.json",
+ "_verify_data/lica-data/**/*.png",
+]
[tool.ruff]
line-length = 100
diff --git a/scripts/build_verify_dataset.py b/scripts/build_verify_dataset.py
new file mode 100644
index 0000000..b9b60ac
--- /dev/null
+++ b/scripts/build_verify_dataset.py
@@ -0,0 +1,309 @@
+#!/usr/bin/env python3
+"""Build the tiny ``_verify_data/`` fixture bundled inside the ``gdb`` package.
+
+Run from the repo root **when you change the smoke suite or the sample formats**::
+
+ python scripts/build_verify_dataset.py
+
+The output lives at ``src/gdb/_verify_data/`` and is committed to the repo so
+that ``pip install lica-gdb && gdb verify`` works with no downloads and no
+API keys.
+
+The fixture covers the ``v0-smoke`` suite only:
+
+* category-1, layout-4, layout-5, typography-1 (CSV-based tasks)
+* svg-1 (JSON + assets/)
+* template-1 (JSON + lica-data/)
+
+Images are downsampled to ``MAX_PX`` on the longest side to keep the wheel
+small. Scores produced against this fixture are **meaningless** — the stub
+model predicts empty strings — so ``gdb verify`` is strictly an install-time
+smoke test, not a benchmark run.
+"""
+
+from __future__ import annotations
+
+import argparse
+import csv
+import json
+import shutil
+from pathlib import Path
+from typing import Dict, Iterable, List, Tuple
+
+from PIL import Image
+
+SOURCE_ROOT = Path("data/gdb-dataset")
+DEST_ROOT = Path("src/gdb/_verify_data")
+
+# Samples per task kept in the fixture. 2 is enough to exercise the full
+# load → predict → score path without blowing up wheel size.
+N_PER_TASK = 2
+
+# Longest-edge cap for bundled images. The stub model doesn't care about
+# content; at ~10 PNGs this keeps the fixture well under 100 KB.
+MAX_PX = 128
+
+
+# ----------------------------------------------------------------------------
+# Shared helpers
+# ----------------------------------------------------------------------------
+
+
+def _downsample_image(src: Path, dest: Path) -> None:
+ """Copy ``src`` to ``dest`` downsampled to ``MAX_PX`` on the long edge."""
+ dest.parent.mkdir(parents=True, exist_ok=True)
+ with Image.open(src) as im:
+ im = im.convert("RGB")
+ im.thumbnail((MAX_PX, MAX_PX))
+ im.save(dest, format="PNG", optimize=True)
+
+
+def _copy_text(src: Path, dest: Path) -> None:
+ dest.parent.mkdir(parents=True, exist_ok=True)
+ dest.write_text(src.read_text(encoding="utf-8"), encoding="utf-8")
+
+
+# ----------------------------------------------------------------------------
+# Simple CSV tasks (one image per row)
+# ----------------------------------------------------------------------------
+
+_CSV_TASKS: List[Tuple[str, str, str]] = [
+ # (task_id, source subpath under benchmarks/, dest subpath under benchmarks/)
+ ("category-1", "category/CategoryClassification", "category/CategoryClassification"),
+ ("layout-4", "layout/AspectRatioClassification", "layout/AspectRatioClassification"),
+ ("layout-5", "layout/ComponentCount", "layout/ComponentCount"),
+ ("typography-1", "typography/FontFamilyClassification","typography/FontFamilyClassification"),
+]
+
+
+def _build_csv_task(task_id: str, src_subpath: str, dest_subpath: str) -> None:
+ src_csv = SOURCE_ROOT / "benchmarks" / src_subpath / "samples.csv"
+ dest_csv = DEST_ROOT / "benchmarks" / dest_subpath / "samples.csv"
+ dest_csv.parent.mkdir(parents=True, exist_ok=True)
+
+ with src_csv.open("r", encoding="utf-8", newline="") as f:
+ reader = csv.DictReader(f)
+ fieldnames = reader.fieldnames
+ kept: List[Dict[str, str]] = []
+ for row in reader:
+ img_rel = row.get("image_path", "")
+ if not img_rel:
+ continue
+ src_img = SOURCE_ROOT / img_rel
+ if not src_img.is_file():
+ continue
+ dest_img = DEST_ROOT / img_rel
+ _downsample_image(src_img, dest_img)
+ kept.append(row)
+ if len(kept) >= N_PER_TASK:
+ break
+
+ if not kept:
+ raise RuntimeError(f"No usable samples found for {task_id} in {src_csv}")
+
+ with dest_csv.open("w", encoding="utf-8", newline="") as f:
+ writer = csv.DictWriter(f, fieldnames=fieldnames)
+ writer.writeheader()
+ writer.writerows(kept)
+ print(f" {task_id}: {len(kept)} samples → {dest_csv.relative_to(DEST_ROOT)}")
+
+
+# ----------------------------------------------------------------------------
+# svg-1 (JSON + per-record PNG + per-record SVG)
+# ----------------------------------------------------------------------------
+
+
+def _build_svg1() -> None:
+ src_json = SOURCE_ROOT / "benchmarks/svg/svg-1.json"
+ dest_json = DEST_ROOT / "benchmarks/svg/svg-1.json"
+ records_full = json.loads(src_json.read_text(encoding="utf-8"))
+
+ kept: List[dict] = []
+ for rec in records_full:
+ img_rel = rec.get("image_path", "")
+ svg_rel = rec.get("svg_path", "")
+ if not img_rel or not svg_rel:
+ continue
+ src_img = SOURCE_ROOT / "benchmarks/svg" / img_rel
+ src_svg = SOURCE_ROOT / "benchmarks/svg" / svg_rel
+ if not src_img.is_file() or not src_svg.is_file():
+ continue
+ dest_img = DEST_ROOT / "benchmarks/svg" / img_rel
+ dest_svg = DEST_ROOT / "benchmarks/svg" / svg_rel
+ _downsample_image(src_img, dest_img)
+ _copy_text(src_svg, dest_svg)
+ kept.append(rec)
+ if len(kept) >= N_PER_TASK:
+ break
+
+ if not kept:
+ raise RuntimeError("No usable svg-1 records found")
+
+ dest_json.parent.mkdir(parents=True, exist_ok=True)
+ dest_json.write_text(json.dumps(kept, indent=2), encoding="utf-8")
+ print(f" svg-1: {len(kept)} records → {dest_json.relative_to(DEST_ROOT)}")
+
+
+# ----------------------------------------------------------------------------
+# template-1 (JSON + lica-data/layouts + lica-data/images)
+# ----------------------------------------------------------------------------
+
+
+def _layout_template_id(layout_index: Dict[str, str], layout_id: str) -> str:
+ return layout_index.get(layout_id, "")
+
+
+def _copy_layout_assets(
+ layout_id: str,
+ template_id: str,
+ src_data_root: Path,
+ dest_data_root: Path,
+) -> None:
+ """Copy the per-layout files the benchmark may consume, if they exist."""
+ if not template_id:
+ return
+ rel_layout = Path("layouts") / template_id / f"{layout_id}.json"
+ rel_image = Path("images") / template_id / f"{layout_id}.png"
+ rel_annot = Path("annotations") / template_id / f"{layout_id}.json"
+
+ src_layout = src_data_root / rel_layout
+ if src_layout.is_file():
+ _copy_text(src_layout, dest_data_root / rel_layout)
+ src_image = src_data_root / rel_image
+ if src_image.is_file():
+ _downsample_image(src_image, dest_data_root / rel_image)
+ src_annot = src_data_root / rel_annot
+ if src_annot.is_file():
+ _copy_text(src_annot, dest_data_root / rel_annot)
+
+
+def _build_template1() -> None:
+ src_json = SOURCE_ROOT / "benchmarks/template/template-1.json"
+ dest_json = DEST_ROOT / "benchmarks/template/template-1.json"
+ full = json.loads(src_json.read_text(encoding="utf-8"))
+
+ src_data_root = SOURCE_ROOT / full.get("data_root", "lica-data")
+ dest_data_root = DEST_ROOT / full.get("data_root", "lica-data")
+
+ pairs = full.get("pairs", [])
+ # Take one label=1 and one label=0 pair if we can — exercises both paths.
+ chosen: List[dict] = []
+ seen_labels: set = set()
+ for p in pairs:
+ if p.get("label") in seen_labels:
+ continue
+ chosen.append(p)
+ seen_labels.add(p["label"])
+ if len(chosen) >= N_PER_TASK:
+ break
+ if not chosen:
+ chosen = pairs[:N_PER_TASK]
+
+ used_layout_ids: List[str] = []
+ for p in chosen:
+ used_layout_ids.append(p["layout_a"])
+ used_layout_ids.append(p["layout_b"])
+
+ mini_index: Dict[str, str] = {}
+ for lid in used_layout_ids:
+ tid = _layout_template_id(full.get("layout_index", {}), lid)
+ if tid:
+ mini_index[lid] = tid
+ _copy_layout_assets(lid, tid, src_data_root, dest_data_root)
+
+ out = {
+ "data_root": full.get("data_root", "lica-data"),
+ "layout_index": mini_index,
+ "pairs": chosen,
+ }
+ dest_json.parent.mkdir(parents=True, exist_ok=True)
+ dest_json.write_text(json.dumps(out, indent=2), encoding="utf-8")
+ print(
+ f" template-1: {len(chosen)} pairs (labels={sorted(seen_labels)}) "
+ f"→ {dest_json.relative_to(DEST_ROOT)}"
+ )
+
+
+# ----------------------------------------------------------------------------
+# Top-level driver
+# ----------------------------------------------------------------------------
+
+
+def _write_readme() -> None:
+ readme = DEST_ROOT / "README.md"
+ readme.parent.mkdir(parents=True, exist_ok=True)
+ readme.write_text(
+ "# gdb verify fixture\n\n"
+ "Tiny bundled dataset used by `gdb verify` to confirm an install is\n"
+ "functional without any downloads or API keys. Covers the\n"
+ "`v0-smoke` suite only.\n\n"
+ "**Do not edit by hand.** Regenerate with:\n\n"
+ "```bash\n"
+ "python scripts/build_verify_dataset.py\n"
+ "```\n\n"
+ f"Images are downsampled to {MAX_PX}px on the long edge; scores\n"
+ "produced against this fixture are **meaningless** by design.\n",
+ encoding="utf-8",
+ )
+
+
+def _clean() -> None:
+ if DEST_ROOT.exists():
+ shutil.rmtree(DEST_ROOT)
+
+
+def _tree_size(root: Path) -> int:
+ total = 0
+ for p in root.rglob("*"):
+ if p.is_file():
+ total += p.stat().st_size
+ return total
+
+
+def _iter_files(root: Path) -> Iterable[Path]:
+ for p in sorted(root.rglob("*")):
+ if p.is_file():
+ yield p
+
+
+def main() -> None:
+ global SOURCE_ROOT, DEST_ROOT
+
+ parser = argparse.ArgumentParser(description=__doc__.splitlines()[0])
+ parser.add_argument("--source-root", default=str(SOURCE_ROOT))
+ parser.add_argument("--dest-root", default=str(DEST_ROOT))
+ parser.add_argument("--keep-existing", action="store_true",
+ help="Don't wipe dest-root first (default: wipe).")
+ args = parser.parse_args()
+
+ SOURCE_ROOT = Path(args.source_root)
+ DEST_ROOT = Path(args.dest_root)
+
+ if not SOURCE_ROOT.is_dir():
+ raise SystemExit(f"Source not found: {SOURCE_ROOT}")
+
+ if not args.keep_existing:
+ _clean()
+
+ DEST_ROOT.mkdir(parents=True, exist_ok=True)
+ print(f"Building verify fixture at {DEST_ROOT} (source: {SOURCE_ROOT})")
+
+ print("\n[csv tasks]")
+ for task_id, src_sub, dest_sub in _CSV_TASKS:
+ _build_csv_task(task_id, src_sub, dest_sub)
+
+ print("\n[svg-1]")
+ _build_svg1()
+
+ print("\n[template-1]")
+ _build_template1()
+
+ _write_readme()
+
+ total = _tree_size(DEST_ROOT)
+ n_files = sum(1 for _ in _iter_files(DEST_ROOT))
+ print(f"\nDone. {n_files} files, {total / 1024:.1f} KiB at {DEST_ROOT}")
+
+
+if __name__ == "__main__":
+ main()
diff --git a/scripts/run_benchmarks.py b/scripts/run_benchmarks.py
index 5d29a0c..ca999b6 100644
--- a/scripts/run_benchmarks.py
+++ b/scripts/run_benchmarks.py
@@ -1,31 +1,22 @@
#!/usr/bin/env python3
-"""CLI to list and run GDB benchmarks (stub model, API providers, batch APIs).
-
-Usage:
- # Stub smoke test (no API keys)
- python scripts/run_benchmarks.py --stub-model --benchmarks layout-4 layout-5 \\
- --dataset-root data/gdb-dataset --n 5
-
- # API run (shipped Lica layout)
- python scripts/run_benchmarks.py --benchmarks svg-1 \\
- --provider gemini --credentials auth/google-cloud-key.json \\
- --dataset-root data/gdb-dataset
-
- python scripts/run_benchmarks.py --benchmarks layout-1 \\
- --provider openai_image --model-id gpt-image-1.5 \\
- --data /path/to/custom/layout2_folder --dataset-root data/gdb-dataset \\
- --n 200 -o outputs/baseline.json
-
- # Batch submit (~50% cheaper, fire-and-forget)
- python scripts/run_benchmarks.py --batch-submit --benchmarks svg-1 \\
- --provider gemini --credentials /path/to/credentials.json \\
- --dataset-root data/gdb-dataset
-
- # Collect results from a previous submit
- python scripts/run_benchmarks.py --collect jobs/job_manifest.json
-
- # List all benchmarks
- python scripts/run_benchmarks.py --list
+"""Legacy entry point for GDB benchmarks. Prefer the ``gdb`` command.
+
+.. deprecated:: 0.2.0
+ Use the installed ``gdb`` console script (``pip install lica-gdb``) or
+ ``python -m gdb``. Equivalents:
+
+ * ``python scripts/run_benchmarks.py --list``
+ → ``gdb list``
+ * ``python scripts/run_benchmarks.py --stub-model --benchmarks layout-4``
+ → ``gdb eval --stub-model --benchmarks layout-4`` (or ``gdb verify``)
+ * ``python scripts/run_benchmarks.py --provider openai --benchmarks svg-1``
+ → ``gdb eval --provider openai --benchmarks svg-1``
+ * ``python scripts/run_benchmarks.py --batch-submit --provider gemini --benchmarks svg-1``
+ → ``gdb submit --provider gemini --benchmarks svg-1``
+ * ``python scripts/run_benchmarks.py --collect jobs/job_manifest.json``
+ → ``gdb collect jobs/job_manifest.json``
+
+ This script will be removed in a future release.
"""
from __future__ import annotations
@@ -615,6 +606,11 @@ def cmd_collect(args: argparse.Namespace) -> bool:
def main() -> None:
+ print(
+ "[deprecated] scripts/run_benchmarks.py will be removed in a future release. "
+ "Use `gdb` (see `gdb --help`) or `python -m gdb` instead.",
+ file=sys.stderr,
+ )
parser = argparse.ArgumentParser(
description="Unified benchmark runner",
formatter_class=argparse.RawDescriptionHelpFormatter,
diff --git a/src/gdb/__init__.py b/src/gdb/__init__.py
index a4fce21..70bb82d 100644
--- a/src/gdb/__init__.py
+++ b/src/gdb/__init__.py
@@ -10,9 +10,9 @@
from .runner import BenchmarkRunner
try:
- __version__ = version("gdb")
+ __version__ = version("lica-gdb")
except PackageNotFoundError: # e.g. running from a checkout without install
- __version__ = "0.1.0"
+ __version__ = "0+unknown"
__all__ = [
"__version__",
diff --git a/src/gdb/__main__.py b/src/gdb/__main__.py
index 34b9626..0cdb21f 100644
--- a/src/gdb/__main__.py
+++ b/src/gdb/__main__.py
@@ -1,106 +1,6 @@
-"""CLI entry point: ``python -m gdb``."""
-
-import argparse
-import sys
-
-from .base import TaskType
-from .registry import BenchmarkRegistry
-from .runner import BenchmarkRunner
-
-
-def _build_registry() -> BenchmarkRegistry:
- registry = BenchmarkRegistry()
- registry.discover()
- return registry
-
-
-def cmd_list(args: argparse.Namespace) -> None:
- registry = _build_registry()
-
- task_type = None
- if args.task_type:
- task_type = TaskType(args.task_type)
-
- benchmarks = registry.list(
- domain=args.domain,
- task_type=task_type,
- )
-
- if not benchmarks:
- print("No benchmarks matched the given filters.")
- return
-
- print(f"{'ID':<20} {'Type':<15} {'Domain':<16} {'Name'}")
- print("-" * 80)
- for b in benchmarks:
- print(
- f"{b.meta.id:<20} {b.meta.task_type.value:<15} "
- f"{b.meta.domain:<16} {b.meta.name}"
- )
- print(f"\n{len(benchmarks)} benchmark(s) found.")
-
-
-def cmd_info(args: argparse.Namespace) -> None:
- registry = _build_registry()
- try:
- b = registry.get(args.benchmark_id)
- except KeyError as exc:
- print(str(exc), file=sys.stderr)
- sys.exit(1)
-
- m = b.meta
- print(f"ID: {m.id}")
- print(f"Name: {m.name}")
- print(f"Task type: {m.task_type.value}")
- print(f"Domain: {m.domain}")
- print(f"Description: {m.description}")
- if m.input_spec:
- print(f"Input: {m.input_spec}")
- if m.metrics:
- print(f"Metrics: {', '.join(m.metrics)}")
- if m.tags:
- print(f"Tags: {', '.join(m.tags)}")
-
-
-def cmd_run(args: argparse.Namespace) -> None:
- registry = _build_registry()
- runner = BenchmarkRunner(registry)
- report = runner.run_from_csv(args.csv_path)
- print(report.summary())
-
- if args.output:
- report.save(args.output)
- print(f"\nResults saved to {args.output}")
-
-
-def main() -> None:
- parser = argparse.ArgumentParser(
- prog="python -m gdb",
- description="GDB: GraphicDesignBench",
- )
- sub = parser.add_subparsers(dest="command")
-
- p_list = sub.add_parser("list", help="List registered benchmarks")
- p_list.add_argument("--domain", help="Filter by domain (e.g. svg, temporal)")
- p_list.add_argument(
- "--task-type",
- choices=["understanding", "generation"],
- help="Filter by task type",
- )
- p_info = sub.add_parser("info", help="Show details for a benchmark")
- p_info.add_argument("benchmark_id", help="Benchmark ID (e.g. svg-1, layout-1)")
-
- p_run = sub.add_parser("run", help="Run benchmarks from a CSV file")
- p_run.add_argument("csv_path", help="Path to CSV with model outputs")
- p_run.add_argument("--output", "-o", help="Save results to JSON file")
-
- args = parser.parse_args()
- if args.command is None:
- parser.print_help()
- sys.exit(0)
-
- {"list": cmd_list, "info": cmd_info, "run": cmd_run}[args.command](args)
+"""Entry point for ``python -m gdb``; delegates to :mod:`gdb.cli`."""
+from .cli import main
if __name__ == "__main__":
main()
diff --git a/src/gdb/_verify_data/README.md b/src/gdb/_verify_data/README.md
new file mode 100644
index 0000000..75b8fe4
--- /dev/null
+++ b/src/gdb/_verify_data/README.md
@@ -0,0 +1,14 @@
+# gdb verify fixture
+
+Tiny bundled dataset used by `gdb verify` to confirm an install is
+functional without any downloads or API keys. Covers the
+`v0-smoke` suite only.
+
+**Do not edit by hand.** Regenerate with:
+
+```bash
+python scripts/build_verify_dataset.py
+```
+
+Images are downsampled to 128px on the long edge; scores
+produced against this fixture are **meaningless** by design.
diff --git a/src/gdb/_verify_data/benchmarks/category/CategoryClassification/samples.csv b/src/gdb/_verify_data/benchmarks/category/CategoryClassification/samples.csv
new file mode 100644
index 0000000..4ded0c9
--- /dev/null
+++ b/src/gdb/_verify_data/benchmarks/category/CategoryClassification/samples.csv
@@ -0,0 +1,3 @@
+sample_id,image_path,prompt,expected_output
+1KoSdCqVsdnauEhMLoDV,lica-data/images/000a2533-b083-4d03-9602-715b8bf4409d/1KoSdCqVsdnauEhMLoDV.png,"You are a design template classifier. Look at this rendered design template image and classify it into a single broad category describing its type or purpose (e.g. the overall template format, not the specific topic or theme). Give your top 5 guesses, one per line, most likely first. Respond with ONLY the broad category names in lowercase, no numbering, no explanation, no extra text.",instagram posts
+DUshGixLZ65JSC09f66p,lica-data/images/000f0ff6-c99c-4376-9a84-a57b99d4a6ba/DUshGixLZ65JSC09f66p.png,"You are a design template classifier. Look at this rendered design template image and classify it into a single broad category describing its type or purpose (e.g. the overall template format, not the specific topic or theme). Give your top 5 guesses, one per line, most likely first. Respond with ONLY the broad category names in lowercase, no numbering, no explanation, no extra text.",instagram posts
diff --git a/src/gdb/_verify_data/benchmarks/layout/AspectRatioClassification/samples.csv b/src/gdb/_verify_data/benchmarks/layout/AspectRatioClassification/samples.csv
new file mode 100644
index 0000000..724f323
--- /dev/null
+++ b/src/gdb/_verify_data/benchmarks/layout/AspectRatioClassification/samples.csv
@@ -0,0 +1,11 @@
+sample_id,image_path,prompt,expected_output
+gT6aha6BT385mIz8ddgx,lica-data/images/00863b23-aa9e-4572-b124-01749574e893/gT6aha6BT385mIz8ddgx.png,"You are a design layout analyst. Look at this rendered design template and predict the aspect ratio of the canvas.
+
+Choose exactly one from: 1:1, 16:9, 9:16, 4:3, 3:4, 4:5, 5:4, 2:3, 3:2, 21:9
+
+Respond with ONLY the aspect ratio (e.g. ""16:9""). Do not include any explanation, punctuation, or extra text.",5:4
+ggo0WRiaZyivLfqsl7j6,lica-data/images/7b5d33fc-979d-4b48-bb70-d438e75c2dd0/ggo0WRiaZyivLfqsl7j6.png,"You are a design layout analyst. Look at this rendered design template and predict the aspect ratio of the canvas.
+
+Choose exactly one from: 1:1, 16:9, 9:16, 4:3, 3:4, 4:5, 5:4, 2:3, 3:2, 21:9
+
+Respond with ONLY the aspect ratio (e.g. ""16:9""). Do not include any explanation, punctuation, or extra text.",4:3
diff --git a/src/gdb/_verify_data/benchmarks/layout/ComponentCount/samples.csv b/src/gdb/_verify_data/benchmarks/layout/ComponentCount/samples.csv
new file mode 100644
index 0000000..fb8e91c
--- /dev/null
+++ b/src/gdb/_verify_data/benchmarks/layout/ComponentCount/samples.csv
@@ -0,0 +1,11 @@
+sample_id,image_path,prompt,expected_output
+gT6aha6BT385mIz8ddgx,lica-data/images/00863b23-aa9e-4572-b124-01749574e893/gT6aha6BT385mIz8ddgx.png,"You are a design layout analyst. Look at this rendered design template and count the total number of distinct visual elements you can see (text blocks, images, decorative shapes, icons, frames, etc.).
+
+Do NOT count the background canvas itself.
+
+Respond with ONLY a single integer. Do not include any explanation or extra text.",22
+ggo0WRiaZyivLfqsl7j6,lica-data/images/7b5d33fc-979d-4b48-bb70-d438e75c2dd0/ggo0WRiaZyivLfqsl7j6.png,"You are a design layout analyst. Look at this rendered design template and count the total number of distinct visual elements you can see (text blocks, images, decorative shapes, icons, frames, etc.).
+
+Do NOT count the background canvas itself.
+
+Respond with ONLY a single integer. Do not include any explanation or extra text.",42
diff --git a/src/gdb/_verify_data/benchmarks/svg/assets/09udskj3Rw8KDeDaF27g_0.png b/src/gdb/_verify_data/benchmarks/svg/assets/09udskj3Rw8KDeDaF27g_0.png
new file mode 100644
index 0000000..cdd6b75
Binary files /dev/null and b/src/gdb/_verify_data/benchmarks/svg/assets/09udskj3Rw8KDeDaF27g_0.png differ
diff --git a/src/gdb/_verify_data/benchmarks/svg/assets/09udskj3Rw8KDeDaF27g_0.svg b/src/gdb/_verify_data/benchmarks/svg/assets/09udskj3Rw8KDeDaF27g_0.svg
new file mode 100644
index 0000000..e609b23
--- /dev/null
+++ b/src/gdb/_verify_data/benchmarks/svg/assets/09udskj3Rw8KDeDaF27g_0.svg
@@ -0,0 +1,3 @@
+
diff --git a/src/gdb/_verify_data/benchmarks/svg/assets/09udskj3Rw8KDeDaF27g_5.png b/src/gdb/_verify_data/benchmarks/svg/assets/09udskj3Rw8KDeDaF27g_5.png
new file mode 100644
index 0000000..2c7deb6
Binary files /dev/null and b/src/gdb/_verify_data/benchmarks/svg/assets/09udskj3Rw8KDeDaF27g_5.png differ
diff --git a/src/gdb/_verify_data/benchmarks/svg/assets/09udskj3Rw8KDeDaF27g_5.svg b/src/gdb/_verify_data/benchmarks/svg/assets/09udskj3Rw8KDeDaF27g_5.svg
new file mode 100644
index 0000000..1eb8495
--- /dev/null
+++ b/src/gdb/_verify_data/benchmarks/svg/assets/09udskj3Rw8KDeDaF27g_5.svg
@@ -0,0 +1,3 @@
+
diff --git a/src/gdb/_verify_data/benchmarks/svg/svg-1.json b/src/gdb/_verify_data/benchmarks/svg/svg-1.json
new file mode 100644
index 0000000..c01a672
--- /dev/null
+++ b/src/gdb/_verify_data/benchmarks/svg/svg-1.json
@@ -0,0 +1,38 @@
+[
+ {
+ "image_path": "assets/09udskj3Rw8KDeDaF27g_0.png",
+ "svg_path": "assets/09udskj3Rw8KDeDaF27g_0.svg",
+ "questions": {
+ "perceptual_qa": {
+ "question": "What is the overall aspect ratio of this graphic?",
+ "option": {
+ "A": "Wider than tall",
+ "B": "Taller than wide",
+ "C": "Perfectly square",
+ "D": "Circular"
+ },
+ "answer": "A"
+ }
+ },
+ "complexity": "easy",
+ "complexity_score": 7.24
+ },
+ {
+ "image_path": "assets/09udskj3Rw8KDeDaF27g_5.png",
+ "svg_path": "assets/09udskj3Rw8KDeDaF27g_5.svg",
+ "questions": {
+ "perceptual_qa": {
+ "question": "What is the dominant color of this circle?",
+ "option": {
+ "A": "Orange",
+ "B": "Blue",
+ "C": "Green",
+ "D": "Purple"
+ },
+ "answer": "A"
+ }
+ },
+ "complexity": "easy",
+ "complexity_score": 7.36
+ }
+]
\ No newline at end of file
diff --git a/src/gdb/_verify_data/benchmarks/template/template-1.json b/src/gdb/_verify_data/benchmarks/template/template-1.json
new file mode 100644
index 0000000..5b66440
--- /dev/null
+++ b/src/gdb/_verify_data/benchmarks/template/template-1.json
@@ -0,0 +1,21 @@
+{
+ "data_root": "lica-data",
+ "layout_index": {
+ "58Uk9hjmhLtCzWgSrIbB": "b88fde26-58c2-4d55-8b81-66601086a0d0",
+ "iZCigr3Lv9tAwHXDkxch": "b88fde26-58c2-4d55-8b81-66601086a0d0",
+ "qgwGx1HCTDqWSoR1Iksr": "01051363-955b-464e-a725-c433e4e7ee97",
+ "B0zh8N0hAZYAFe3l2vmH": "08060e52-d186-487d-b9e9-0be9d6599a0b"
+ },
+ "pairs": [
+ {
+ "layout_a": "58Uk9hjmhLtCzWgSrIbB",
+ "layout_b": "iZCigr3Lv9tAwHXDkxch",
+ "label": 1
+ },
+ {
+ "layout_a": "qgwGx1HCTDqWSoR1Iksr",
+ "layout_b": "B0zh8N0hAZYAFe3l2vmH",
+ "label": 0
+ }
+ ]
+}
\ No newline at end of file
diff --git a/src/gdb/_verify_data/benchmarks/typography/FontFamilyClassification/samples.csv b/src/gdb/_verify_data/benchmarks/typography/FontFamilyClassification/samples.csv
new file mode 100644
index 0000000..5aae7b0
--- /dev/null
+++ b/src/gdb/_verify_data/benchmarks/typography/FontFamilyClassification/samples.csv
@@ -0,0 +1,11 @@
+sample_id,image_path,prompt,expected_output
+gT6aha6BT385mIz8ddgx_text11,lica-data/images/00863b23-aa9e-4572-b124-01749574e893/gT6aha6BT385mIz8ddgx.png,"You are a typography expert. Look at this rendered design template.
+
+What font family is used for the text: ""loremipsumsite.co""?
+
+Respond with ONLY the font family name (e.g. ""Roboto"", ""Open Sans"", ""DM Serif Display""). Do not include weight, style, or any explanation.",Montserrat
+gT6aha6BT385mIz8ddgx_text8,lica-data/images/00863b23-aa9e-4572-b124-01749574e893/gT6aha6BT385mIz8ddgx.png,"You are a typography expert. Look at this rendered design template.
+
+What font family is used for the text: ""BOOK""?
+
+Respond with ONLY the font family name (e.g. ""Roboto"", ""Open Sans"", ""DM Serif Display""). Do not include weight, style, or any explanation.",Montserrat
diff --git a/src/gdb/_verify_data/lica-data/annotations/01051363-955b-464e-a725-c433e4e7ee97/qgwGx1HCTDqWSoR1Iksr.json b/src/gdb/_verify_data/lica-data/annotations/01051363-955b-464e-a725-c433e4e7ee97/qgwGx1HCTDqWSoR1Iksr.json
new file mode 100644
index 0000000..b058e2e
--- /dev/null
+++ b/src/gdb/_verify_data/lica-data/annotations/01051363-955b-464e-a725-c433e4e7ee97/qgwGx1HCTDqWSoR1Iksr.json
@@ -0,0 +1,7 @@
+{
+ "description": "This image is a festive illustration designed as a Ramadan greeting card or social media post. At the top, prominent dark brown Arabic calligraphy spells out \"Ramadan Kareem,\" with the English transliteration \"RAMADAN KAREEM\" written in spaced-out, uppercase sans-serif letters directly below it. Below the text, a dark purple, ribbon-like banner spans horizontally across the middle of the composition. Flanking this banner and positioned towards the bottom are two stylized, smiling figures: a woman in a yellow hijab on the left, and a man in an orange kufi on the right, both depicted with their hands pressed together in a respectful greeting gesture. Between the two figures and beneath the banner, a central mosque illustration with a prominent red dome and two yellow minarets serves as the main architectural element. Delicate dotted lines extend downwards from the banner, adorned with hanging crescent moons and stars in shades of red, orange, and yellow. The entire composition rests on a soft, light peach-colored background, with a few scattered dark dots adding minimal visual texture.",
+ "aesthetics": "The design employs a warm, inviting, and cheerful aesthetic, characterized by a flat illustration style with clean lines and a friendly, approachable feel. The color palette is dominated by warm tones\u2014reds, oranges, yellows, and a contrasting dark purple/brown for the text and banner\u2014all set against a soft peach background, creating a harmonious and celebratory mood. Typography features elegant Arabic calligraphy for cultural authenticity, complemented by a simple, legible sans-serif font for the English text, ensuring clarity. The composition is well-balanced and largely symmetrical, with the central mosque and banner acting as a focal point, framed by the two welcoming figures. Elements are thoughtfully spaced, providing ample breathing room and preventing a cluttered appearance. The overall visual integration is strong, guiding the viewer's eye from the greeting at the top to the festive scene below, conveying a sense of community and celebration for the holy month.",
+ "tags": "Ramadan Kareem, Islamic greeting, flat illustration, Muslim couple, mosque, crescent moon, stars, Arabic calligraphy, festive, holiday, cultural, warm colors, peach background, greeting card, celebration, social media graphic, cheerful, religious festival.",
+ "user_intent": "Create a warm and inviting social media graphic or greeting card to extend wishes for Ramadan Kareem, featuring traditional Islamic elements and a friendly, celebratory atmosphere.",
+ "raw": "Description:\nThis image is a festive illustration designed as a Ramadan greeting card or social media post. At the top, prominent dark brown Arabic calligraphy spells out \"Ramadan Kareem,\" with the English transliteration \"RAMADAN KAREEM\" written in spaced-out, uppercase sans-serif letters directly below it. Below the text, a dark purple, ribbon-like banner spans horizontally across the middle of the composition. Flanking this banner and positioned towards the bottom are two stylized, smiling figures: a woman in a yellow hijab on the left, and a man in an orange kufi on the right, both depicted with their hands pressed together in a respectful greeting gesture. Between the two figures and beneath the banner, a central mosque illustration with a prominent red dome and two yellow minarets serves as the main architectural element. Delicate dotted lines extend downwards from the banner, adorned with hanging crescent moons and stars in shades of red, orange, and yellow. The entire composition rests on a soft, light peach-colored background, with a few scattered dark dots adding minimal visual texture.\n\nAesthetics:\nThe design employs a warm, inviting, and cheerful aesthetic, characterized by a flat illustration style with clean lines and a friendly, approachable feel. The color palette is dominated by warm tones\u2014reds, oranges, yellows, and a contrasting dark purple/brown for the text and banner\u2014all set against a soft peach background, creating a harmonious and celebratory mood. Typography features elegant Arabic calligraphy for cultural authenticity, complemented by a simple, legible sans-serif font for the English text, ensuring clarity. The composition is well-balanced and largely symmetrical, with the central mosque and banner acting as a focal point, framed by the two welcoming figures. Elements are thoughtfully spaced, providing ample breathing room and preventing a cluttered appearance. The overall visual integration is strong, guiding the viewer's eye from the greeting at the top to the festive scene below, conveying a sense of community and celebration for the holy month.\n\nTags:\nRamadan Kareem, Islamic greeting, flat illustration, Muslim couple, mosque, crescent moon, stars, Arabic calligraphy, festive, holiday, cultural, warm colors, peach background, greeting card, celebration, social media graphic, cheerful, religious festival.\n\nUser Intent:\nCreate a warm and inviting social media graphic or greeting card to extend wishes for Ramadan Kareem, featuring traditional Islamic elements and a friendly, celebratory atmosphere."
+}
\ No newline at end of file
diff --git a/src/gdb/_verify_data/lica-data/annotations/08060e52-d186-487d-b9e9-0be9d6599a0b/B0zh8N0hAZYAFe3l2vmH.json b/src/gdb/_verify_data/lica-data/annotations/08060e52-d186-487d-b9e9-0be9d6599a0b/B0zh8N0hAZYAFe3l2vmH.json
new file mode 100644
index 0000000..b531424
--- /dev/null
+++ b/src/gdb/_verify_data/lica-data/annotations/08060e52-d186-487d-b9e9-0be9d6599a0b/B0zh8N0hAZYAFe3l2vmH.json
@@ -0,0 +1,7 @@
+{
+ "description": "The image features a vertically oriented digital graphic with a muted sage green background, adorned with subtle, lighter-toned leaf shadow patterns, giving it an organic feel. Centrally positioned in the upper half is a large, off-white rounded rectangular card with a thin, dark green border. Overlapping the top edge of this card is a circular profile picture of a man, seemingly a gardening expert, wearing glasses and a blue hoodie, tending to plants in a greenhouse setting. Below the profile picture, within the card, the word \"ASK\" is prominently displayed in a dark green, sans-serif, uppercase font. Directly beneath it, \"GARDENING QUESTIONS\" is written in a slightly lighter, elegant serif font, also in dark green. At the bottom of the card, there's a rounded rectangular input field placeholder in a lighter olive-beige color, containing the text \"ask question below...\" in a muted gray. The lower section of the overall image features three stylized, dark green monstera-like leaves with visible veins, appearing to grow from the bottom left, adding a botanical accent.",
+ "aesthetics": "The design style is modern, clean, and organic, exuding a calm and inviting educational aesthetic. The typography employs a clear hierarchy, combining a strong sans-serif for the call to action \"ASK\" and an elegant, decorative serif for the theme \"GARDENING QUESTIONS,\" both center-aligned for balance. Spacing is generous throughout, providing ample breathing room around the central card and its internal elements, which contributes to an uncluttered and sophisticated look. The composition is largely symmetrical with the central card and its embedded elements acting as the primary focal point, further emphasized by the circular profile image. The stylized leaves at the bottom offer an organic, asymmetrical counterpoint, guiding the eye through the design. The color harmony is excellent, utilizing an earthy palette of various greens, off-whites, and muted grays, creating a natural, soothing, and sophisticated feel that strongly reinforces the gardening theme. The subtle background texture and clean illustrative elements enhance visual integration.",
+ "tags": "Gardening, plant care, ask me anything, Q&A, green, nature, organic, elegant, modern, minimal, earthy tones, botanical, social media post, inquiry, user interface.",
+ "user_intent": "Create an inviting and user-friendly social media post or a section for a website/app aimed at encouraging users to submit gardening-related questions, showcasing an expert, and featuring a clean, nature-inspired design.",
+ "raw": "Description:\nThe image features a vertically oriented digital graphic with a muted sage green background, adorned with subtle, lighter-toned leaf shadow patterns, giving it an organic feel. Centrally positioned in the upper half is a large, off-white rounded rectangular card with a thin, dark green border. Overlapping the top edge of this card is a circular profile picture of a man, seemingly a gardening expert, wearing glasses and a blue hoodie, tending to plants in a greenhouse setting. Below the profile picture, within the card, the word \"ASK\" is prominently displayed in a dark green, sans-serif, uppercase font. Directly beneath it, \"GARDENING QUESTIONS\" is written in a slightly lighter, elegant serif font, also in dark green. At the bottom of the card, there's a rounded rectangular input field placeholder in a lighter olive-beige color, containing the text \"ask question below...\" in a muted gray. The lower section of the overall image features three stylized, dark green monstera-like leaves with visible veins, appearing to grow from the bottom left, adding a botanical accent.\n\nAesthetics:\nThe design style is modern, clean, and organic, exuding a calm and inviting educational aesthetic. The typography employs a clear hierarchy, combining a strong sans-serif for the call to action \"ASK\" and an elegant, decorative serif for the theme \"GARDENING QUESTIONS,\" both center-aligned for balance. Spacing is generous throughout, providing ample breathing room around the central card and its internal elements, which contributes to an uncluttered and sophisticated look. The composition is largely symmetrical with the central card and its embedded elements acting as the primary focal point, further emphasized by the circular profile image. The stylized leaves at the bottom offer an organic, asymmetrical counterpoint, guiding the eye through the design. The color harmony is excellent, utilizing an earthy palette of various greens, off-whites, and muted grays, creating a natural, soothing, and sophisticated feel that strongly reinforces the gardening theme. The subtle background texture and clean illustrative elements enhance visual integration.\n\nTags:\nGardening, plant care, ask me anything, Q&A, green, nature, organic, elegant, modern, minimal, earthy tones, botanical, social media post, inquiry, user interface.\n\nUser Intent:\nCreate an inviting and user-friendly social media post or a section for a website/app aimed at encouraging users to submit gardening-related questions, showcasing an expert, and featuring a clean, nature-inspired design."
+}
\ No newline at end of file
diff --git a/src/gdb/_verify_data/lica-data/annotations/b88fde26-58c2-4d55-8b81-66601086a0d0/58Uk9hjmhLtCzWgSrIbB.json b/src/gdb/_verify_data/lica-data/annotations/b88fde26-58c2-4d55-8b81-66601086a0d0/58Uk9hjmhLtCzWgSrIbB.json
new file mode 100644
index 0000000..bf14bbe
--- /dev/null
+++ b/src/gdb/_verify_data/lica-data/annotations/b88fde26-58c2-4d55-8b81-66601086a0d0/58Uk9hjmhLtCzWgSrIbB.json
@@ -0,0 +1,7 @@
+{
+ "description": "The image displays a contact information slide or section with a cheerful and food-themed design. The background is a solid, light yellow-cream color. On the left side, the bold, large word \"CONTACT\" is prominently displayed in an orange-brown color with a green outline/shadow. Below this, contact details are neatly presented in two columns: \"Email,\" \"Web,\" \"Tel,\" and \"Addr\" as labels, followed by their respective lorem ipsum placeholders. Several vibrant yellow-orange flowers, resembling marigolds, are scattered around the left side, appearing at the top-left, top-center, bottom-left, and partially at the bottom-right. The right side of the layout features illustrative depictions of various food dishes: a purple plate with golden rectangular food items (possibly paneer or tofu) and a small bowl of red sauce is at the top-right. Below this, a light blue plate holds several folded flatbreads and a small bowl of white dip with green specks. Partially visible at the very bottom-right is a yellow plate with a piece of dark meat, white rice, sliced tomatoes, and a green garnish.",
+ "aesthetics": "The design style is playful, inviting, and illustrative, strongly suggesting a connection to food and possibly a cultural theme (e.g., Indian or South Asian, given the marigolds and food types). The typography for \"CONTACT\" is bold and impactful, using a sans-serif font with a distinct outlined effect that creates visual depth. The contact details below use a clean, legible sans-serif font, with good hierarchy between the labels and the actual information. Ample spacing is used throughout, particularly on the left side, providing excellent readability and a clean look for the text. The composition is an asymmetrical balance, with the strong textual element and scattered flowers on the left counteracting the cluster of food illustrations on the right. The color palette is warm and harmonious, dominated by the light yellow background, complemented by vibrant oranges, greens, purples, and earthy browns from the illustrations and text. The colors are bright but not overly saturated, contributing to a cheerful and appealing overall aesthetic.",
+ "tags": "Contact page, food illustration, presentation slide, warm color palette, playful design, Indian cuisine, South Asian food, marigold flowers, sans-serif font, asymmetrical layout, cheerful, illustrative, email contact, phone number, website, address.",
+ "user_intent": "Create an inviting and visually appealing contact information slide or section for a food-related business or event, featuring stylized food illustrations and a clear, readable display of contact details.",
+ "raw": "Description:\nThe image displays a contact information slide or section with a cheerful and food-themed design. The background is a solid, light yellow-cream color. On the left side, the bold, large word \"CONTACT\" is prominently displayed in an orange-brown color with a green outline/shadow. Below this, contact details are neatly presented in two columns: \"Email,\" \"Web,\" \"Tel,\" and \"Addr\" as labels, followed by their respective lorem ipsum placeholders. Several vibrant yellow-orange flowers, resembling marigolds, are scattered around the left side, appearing at the top-left, top-center, bottom-left, and partially at the bottom-right. The right side of the layout features illustrative depictions of various food dishes: a purple plate with golden rectangular food items (possibly paneer or tofu) and a small bowl of red sauce is at the top-right. Below this, a light blue plate holds several folded flatbreads and a small bowl of white dip with green specks. Partially visible at the very bottom-right is a yellow plate with a piece of dark meat, white rice, sliced tomatoes, and a green garnish.\n\nAesthetics:\nThe design style is playful, inviting, and illustrative, strongly suggesting a connection to food and possibly a cultural theme (e.g., Indian or South Asian, given the marigolds and food types). The typography for \"CONTACT\" is bold and impactful, using a sans-serif font with a distinct outlined effect that creates visual depth. The contact details below use a clean, legible sans-serif font, with good hierarchy between the labels and the actual information. Ample spacing is used throughout, particularly on the left side, providing excellent readability and a clean look for the text. The composition is an asymmetrical balance, with the strong textual element and scattered flowers on the left counteracting the cluster of food illustrations on the right. The color palette is warm and harmonious, dominated by the light yellow background, complemented by vibrant oranges, greens, purples, and earthy browns from the illustrations and text. The colors are bright but not overly saturated, contributing to a cheerful and appealing overall aesthetic.\n\nTags:\nContact page, food illustration, presentation slide, warm color palette, playful design, Indian cuisine, South Asian food, marigold flowers, sans-serif font, asymmetrical layout, cheerful, illustrative, email contact, phone number, website, address.\n\nUser Intent:\nCreate an inviting and visually appealing contact information slide or section for a food-related business or event, featuring stylized food illustrations and a clear, readable display of contact details."
+}
\ No newline at end of file
diff --git a/src/gdb/_verify_data/lica-data/annotations/b88fde26-58c2-4d55-8b81-66601086a0d0/iZCigr3Lv9tAwHXDkxch.json b/src/gdb/_verify_data/lica-data/annotations/b88fde26-58c2-4d55-8b81-66601086a0d0/iZCigr3Lv9tAwHXDkxch.json
new file mode 100644
index 0000000..55c2473
--- /dev/null
+++ b/src/gdb/_verify_data/lica-data/annotations/b88fde26-58c2-4d55-8b81-66601086a0d0/iZCigr3Lv9tAwHXDkxch.json
@@ -0,0 +1,7 @@
+{
+ "description": "This image is a promotional graphic, likely for a \"Best Chef\" competition or event, set against a light yellow or cream-colored background. The main title, \"BEST CHEF,\" is prominently displayed in the upper center, rendered in a bold, uppercase sans-serif font with an orange fill and a subtle light green/yellow outline. Below the title, four circular portrait photographs of individuals are arranged horizontally, evenly spaced. Each portrait features a smiling person, presumably a chef or contestant, and is accompanied by their name (TOM, PAT, SASHA, DEW) centered beneath it in an orange sans-serif font. The composition is framed by stylized food illustrations and floral elements: in the top left, a large banana leaf platter holds various traditional dishes; in the top right, a wooden bowl contains a curry with fish pieces; in the bottom left, a dark green bowl features a curry with tofu-like pieces; and scattered across the background are three yellow marigold-like flowers of varying sizes (one large near the top left food, one medium in the bottom right, and one small closer to the bottom right corner).",
+ "aesthetics": "The design conveys a warm, friendly, and inviting aesthetic, combining photographic elements with whimsical illustrations. The color palette is dominated by warm yellows and oranges, creating an energetic and cheerful mood. The typography for \"BEST CHEF\" is bold and impactful, while the names below the portraits use a simpler, clean sans-serif font for readability, establishing a clear hierarchy. The composition is balanced, with the central title and horizontally aligned portraits serving as the main focal points. The asymmetrical placement of the food illustrations and flowers adds a playful, organic touch, preventing the layout from feeling too rigid while still guiding the eye around the content. Spacing between elements is generous, providing ample breathing room and a clean overall appearance. The integration of realistic portraits with flat, illustrative food art works harmoniously due to the consistent warm color scheme and overall cheerful tone, suggesting a casual yet professional event.",
+ "tags": "Culinary event, chef profiles, cooking competition, promotional graphic, food illustration, portrait photography, warm color palette, yellow background, orange text, sans-serif font, friendly design, inviting layout, social media content, event announcement, chefs, diverse food, digital art, graphic design.",
+ "user_intent": "Create a vibrant and engaging promotional graphic to introduce the participants or judges of a culinary competition or event, showcasing their portraits and names alongside decorative food-related illustrations and a clear event title.",
+ "raw": "Description:\nThis image is a promotional graphic, likely for a \"Best Chef\" competition or event, set against a light yellow or cream-colored background. The main title, \"BEST CHEF,\" is prominently displayed in the upper center, rendered in a bold, uppercase sans-serif font with an orange fill and a subtle light green/yellow outline. Below the title, four circular portrait photographs of individuals are arranged horizontally, evenly spaced. Each portrait features a smiling person, presumably a chef or contestant, and is accompanied by their name (TOM, PAT, SASHA, DEW) centered beneath it in an orange sans-serif font. The composition is framed by stylized food illustrations and floral elements: in the top left, a large banana leaf platter holds various traditional dishes; in the top right, a wooden bowl contains a curry with fish pieces; in the bottom left, a dark green bowl features a curry with tofu-like pieces; and scattered across the background are three yellow marigold-like flowers of varying sizes (one large near the top left food, one medium in the bottom right, and one small closer to the bottom right corner).\n\nAesthetics:\nThe design conveys a warm, friendly, and inviting aesthetic, combining photographic elements with whimsical illustrations. The color palette is dominated by warm yellows and oranges, creating an energetic and cheerful mood. The typography for \"BEST CHEF\" is bold and impactful, while the names below the portraits use a simpler, clean sans-serif font for readability, establishing a clear hierarchy. The composition is balanced, with the central title and horizontally aligned portraits serving as the main focal points. The asymmetrical placement of the food illustrations and flowers adds a playful, organic touch, preventing the layout from feeling too rigid while still guiding the eye around the content. Spacing between elements is generous, providing ample breathing room and a clean overall appearance. The integration of realistic portraits with flat, illustrative food art works harmoniously due to the consistent warm color scheme and overall cheerful tone, suggesting a casual yet professional event.\n\nTags:\nCulinary event, chef profiles, cooking competition, promotional graphic, food illustration, portrait photography, warm color palette, yellow background, orange text, sans-serif font, friendly design, inviting layout, social media content, event announcement, chefs, diverse food, digital art, graphic design.\n\nUser Intent:\nCreate a vibrant and engaging promotional graphic to introduce the participants or judges of a culinary competition or event, showcasing their portraits and names alongside decorative food-related illustrations and a clear event title."
+}
\ No newline at end of file
diff --git a/src/gdb/_verify_data/lica-data/images/000a2533-b083-4d03-9602-715b8bf4409d/1KoSdCqVsdnauEhMLoDV.png b/src/gdb/_verify_data/lica-data/images/000a2533-b083-4d03-9602-715b8bf4409d/1KoSdCqVsdnauEhMLoDV.png
new file mode 100644
index 0000000..72188ea
Binary files /dev/null and b/src/gdb/_verify_data/lica-data/images/000a2533-b083-4d03-9602-715b8bf4409d/1KoSdCqVsdnauEhMLoDV.png differ
diff --git a/src/gdb/_verify_data/lica-data/images/000f0ff6-c99c-4376-9a84-a57b99d4a6ba/DUshGixLZ65JSC09f66p.png b/src/gdb/_verify_data/lica-data/images/000f0ff6-c99c-4376-9a84-a57b99d4a6ba/DUshGixLZ65JSC09f66p.png
new file mode 100644
index 0000000..1b8389a
Binary files /dev/null and b/src/gdb/_verify_data/lica-data/images/000f0ff6-c99c-4376-9a84-a57b99d4a6ba/DUshGixLZ65JSC09f66p.png differ
diff --git a/src/gdb/_verify_data/lica-data/images/00863b23-aa9e-4572-b124-01749574e893/gT6aha6BT385mIz8ddgx.png b/src/gdb/_verify_data/lica-data/images/00863b23-aa9e-4572-b124-01749574e893/gT6aha6BT385mIz8ddgx.png
new file mode 100644
index 0000000..ef8f868
Binary files /dev/null and b/src/gdb/_verify_data/lica-data/images/00863b23-aa9e-4572-b124-01749574e893/gT6aha6BT385mIz8ddgx.png differ
diff --git a/src/gdb/_verify_data/lica-data/images/01051363-955b-464e-a725-c433e4e7ee97/qgwGx1HCTDqWSoR1Iksr.png b/src/gdb/_verify_data/lica-data/images/01051363-955b-464e-a725-c433e4e7ee97/qgwGx1HCTDqWSoR1Iksr.png
new file mode 100644
index 0000000..efe9051
Binary files /dev/null and b/src/gdb/_verify_data/lica-data/images/01051363-955b-464e-a725-c433e4e7ee97/qgwGx1HCTDqWSoR1Iksr.png differ
diff --git a/src/gdb/_verify_data/lica-data/images/08060e52-d186-487d-b9e9-0be9d6599a0b/B0zh8N0hAZYAFe3l2vmH.png b/src/gdb/_verify_data/lica-data/images/08060e52-d186-487d-b9e9-0be9d6599a0b/B0zh8N0hAZYAFe3l2vmH.png
new file mode 100644
index 0000000..0c7af90
Binary files /dev/null and b/src/gdb/_verify_data/lica-data/images/08060e52-d186-487d-b9e9-0be9d6599a0b/B0zh8N0hAZYAFe3l2vmH.png differ
diff --git a/src/gdb/_verify_data/lica-data/images/7b5d33fc-979d-4b48-bb70-d438e75c2dd0/ggo0WRiaZyivLfqsl7j6.png b/src/gdb/_verify_data/lica-data/images/7b5d33fc-979d-4b48-bb70-d438e75c2dd0/ggo0WRiaZyivLfqsl7j6.png
new file mode 100644
index 0000000..857cad2
Binary files /dev/null and b/src/gdb/_verify_data/lica-data/images/7b5d33fc-979d-4b48-bb70-d438e75c2dd0/ggo0WRiaZyivLfqsl7j6.png differ
diff --git a/src/gdb/_verify_data/lica-data/images/b88fde26-58c2-4d55-8b81-66601086a0d0/58Uk9hjmhLtCzWgSrIbB.png b/src/gdb/_verify_data/lica-data/images/b88fde26-58c2-4d55-8b81-66601086a0d0/58Uk9hjmhLtCzWgSrIbB.png
new file mode 100644
index 0000000..0147da1
Binary files /dev/null and b/src/gdb/_verify_data/lica-data/images/b88fde26-58c2-4d55-8b81-66601086a0d0/58Uk9hjmhLtCzWgSrIbB.png differ
diff --git a/src/gdb/_verify_data/lica-data/images/b88fde26-58c2-4d55-8b81-66601086a0d0/iZCigr3Lv9tAwHXDkxch.png b/src/gdb/_verify_data/lica-data/images/b88fde26-58c2-4d55-8b81-66601086a0d0/iZCigr3Lv9tAwHXDkxch.png
new file mode 100644
index 0000000..b22767c
Binary files /dev/null and b/src/gdb/_verify_data/lica-data/images/b88fde26-58c2-4d55-8b81-66601086a0d0/iZCigr3Lv9tAwHXDkxch.png differ
diff --git a/src/gdb/_verify_data/lica-data/layouts/01051363-955b-464e-a725-c433e4e7ee97/qgwGx1HCTDqWSoR1Iksr.json b/src/gdb/_verify_data/lica-data/layouts/01051363-955b-464e-a725-c433e4e7ee97/qgwGx1HCTDqWSoR1Iksr.json
new file mode 100644
index 0000000..68e0fcf
--- /dev/null
+++ b/src/gdb/_verify_data/lica-data/layouts/01051363-955b-464e-a725-c433e4e7ee97/qgwGx1HCTDqWSoR1Iksr.json
@@ -0,0 +1,30 @@
+{
+ "components": [
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/c0580e66-acd7-42e2-9e05-c10501489e82.png",
+ "left": "126.646px",
+ "top": "396.242px",
+ "width": "826.709px",
+ "height": "553.895px",
+ "transform": "none",
+ "opacity": 1,
+ "overflow": "hidden"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/485ecf91-cf94-442c-9e48-78247f9d9d0b.png",
+ "left": "231.371px",
+ "top": "129.863px",
+ "width": "617.257px",
+ "height": "227.614px",
+ "transform": "none",
+ "opacity": 1,
+ "overflow": "hidden"
+ }
+ ],
+ "background": "rgb(252, 230, 209)",
+ "width": "1080px",
+ "height": "1080px",
+ "duration": 3
+}
\ No newline at end of file
diff --git a/src/gdb/_verify_data/lica-data/layouts/08060e52-d186-487d-b9e9-0be9d6599a0b/B0zh8N0hAZYAFe3l2vmH.json b/src/gdb/_verify_data/lica-data/layouts/08060e52-d186-487d-b9e9-0be9d6599a0b/B0zh8N0hAZYAFe3l2vmH.json
new file mode 100644
index 0000000..f4dcd34
--- /dev/null
+++ b/src/gdb/_verify_data/lica-data/layouts/08060e52-d186-487d-b9e9-0be9d6599a0b/B0zh8N0hAZYAFe3l2vmH.json
@@ -0,0 +1,127 @@
+{
+ "components": [
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/cbc25992-50c1-4094-8a8e-933bac00f984.jpg",
+ "width": "1636.36px",
+ "height": "1080.0px",
+ "transform": "rotate(0deg)",
+ "opacity": 0.4,
+ "overflow": "visible"
+ },
+ {
+ "type": "GROUP",
+ "left": "121.894px",
+ "top": "216.833px",
+ "width": "836.2107359700001px",
+ "height": "432.37632px",
+ "transform": "none",
+ "background": "rgb(252, 252, 252)",
+ "backgroundColor": "rgb(252, 252, 252)",
+ "clipPath": "path(\"M478.1288073467515,256L16.97169520498455,256C7.61994478591143,256 0,248.38005521408854 0,239.02830479501543L0,16.97169520498455C0,7.61994478591143 7.61994478591143,0 16.97169520498455,0L478.1288073467515,0C487.4805577658247,0 495.10050255173616,7.61994478591143 495.10050255173616,16.97169520498455L495.10050255173616,239.02830479501546C495.10050255173616,248.3800552140886 487.4805577658247,256 478.1288073467515,256Z\")"
+ },
+ {
+ "type": "GROUP",
+ "left": "427.551px",
+ "top": "104.384px",
+ "width": "224.89856px",
+ "height": "224.89856px",
+ "transform": "none",
+ "background": "rgb(252, 252, 252)",
+ "backgroundColor": "rgb(252, 252, 252)",
+ "clipPath": "path(\"M128,0C57.30755202165845,0 0,57.307552021658445 0,128C0,198.69244797834156 57.30755202165843,256 128,256C198.69244797834153,256 256,198.69244797834156 256,128C256,57.30755202165845 198.69244797834156,0 128,0Z\")"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/9984b8fa-aed9-4a45-9b7d-af969a404e8e.jpg",
+ "left": "424.3354px",
+ "top": "88.4631px",
+ "width": "337.903864484px",
+ "height": "225.26937900900003px",
+ "transform": "rotate(0deg)",
+ "background": "rgba(1, 1, 1, 0)",
+ "clipPath": "path(\"M204.03031612126446,102.01597418189672C204.03031612126446,158.3548653724615 158.35568149372597,204.0295 102.01515806063223,204.0295C45.673818506274024,204.0295 0.0,158.35486537246152 0.0,102.01597418189674C0.0,45.67422656690627 45.673818506274024,0.0 102.01515806063223,0.0C158.35649761499047,0.0 204.03031612126446,45.67422656690627 204.03031612126446,102.01597418189672Z\")"
+ },
+ {
+ "type": "GROUP",
+ "left": "164.215px",
+ "top": "509.631px",
+ "width": "751.56921324px",
+ "height": "97.841664px",
+ "transform": "none",
+ "background": "rgb(160, 159, 124)",
+ "backgroundColor": "rgb(160, 159, 124)",
+ "clipPath": "path(\"M1918.215440953273,256L48.24615384615386,256C21.66153846153847,256 0,234.3384615384616 0,207.7538461538462L0,48.24615384615386C0,21.66153846153847 21.66153846153847,0 48.24615384615386,0L1918.215440953273,0C1944.8000563378885,0 1966.461594799427,21.66153846153847 1966.461594799427,48.24615384615386L1966.461594799427,207.7538461538462C1966.461594799427,234.3384615384616 1944.8000563378885,256 1918.215440953273,256Z\")"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/55f08d1c-25dc-41ce-ada1-bdf88acfb638.png",
+ "id": "0-0",
+ "left": "243.404px",
+ "top": "749.132px",
+ "width": "593.191px",
+ "height": "663.736px",
+ "transform": "rotate(0deg)",
+ "opacity": 1,
+ "overflow": "hidden"
+ },
+ {
+ "type": "TEXT",
+ "text": "ask question below...",
+ "left": "187.66px",
+ "top": "534.4615600000001px",
+ "width": "704.68px",
+ "height": "41.0211px",
+ "transform": "none",
+ "color": "rgb(252, 252, 252)",
+ "fontSize": "34px",
+ "fontFamily": "Rufina--400",
+ "fontWeight": "400",
+ "textAlign": "center",
+ "lineHeight": "48.0px",
+ "letterSpacing": "0em",
+ "textTransform": "none",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "TEXT",
+ "text": "ASK",
+ "left": "164.215px",
+ "top": "353.16916px",
+ "width": "751.571px",
+ "height": "66.2437px",
+ "transform": "none",
+ "color": "rgb(3, 78, 49)",
+ "fontSize": "55px",
+ "fontFamily": "Cinzel--400",
+ "fontWeight": "400",
+ "textAlign": "center",
+ "lineHeight": "77.0px",
+ "letterSpacing": "0em",
+ "textTransform": "uppercase",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "TEXT",
+ "text": "Gardening Questions",
+ "left": "164.215px",
+ "top": "425.70526px",
+ "width": "751.571px",
+ "height": "45.6819px",
+ "transform": "none",
+ "color": "rgb(3, 78, 49)",
+ "fontSize": "38px",
+ "fontFamily": "Cinzel--400",
+ "fontWeight": "400",
+ "textAlign": "center",
+ "lineHeight": "53.0px",
+ "letterSpacing": "0em",
+ "textTransform": "none",
+ "fontStyle": "normal"
+ }
+ ],
+ "background": "rgb(167, 184, 155)",
+ "width": "1080px",
+ "height": "1080px",
+ "duration": 3
+}
\ No newline at end of file
diff --git a/src/gdb/_verify_data/lica-data/layouts/b88fde26-58c2-4d55-8b81-66601086a0d0/58Uk9hjmhLtCzWgSrIbB.json b/src/gdb/_verify_data/lica-data/layouts/b88fde26-58c2-4d55-8b81-66601086a0d0/58Uk9hjmhLtCzWgSrIbB.json
new file mode 100644
index 0000000..e3a64cb
--- /dev/null
+++ b/src/gdb/_verify_data/lica-data/layouts/b88fde26-58c2-4d55-8b81-66601086a0d0/58Uk9hjmhLtCzWgSrIbB.json
@@ -0,0 +1,247 @@
+{
+ "components": [
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/68c1e9b1-8362-430e-8282-3171861eabe4.png",
+ "left": "79.5792px",
+ "top": "94.0052px",
+ "width": "322.867px",
+ "height": "322.06px",
+ "transform": "none",
+ "opacity": 1,
+ "overflow": "hidden"
+ },
+ {
+ "type": "TEXT",
+ "text": "CONTACT",
+ "left": "229.598px",
+ "top": "381.2015px",
+ "width": "935.488px",
+ "height": "145.692px",
+ "transform": "none",
+ "color": "rgb(252, 105, 45)",
+ "fontSize": "121px",
+ "fontFamily": "League Spartan--400",
+ "fontWeight": "400",
+ "textAlign": "left",
+ "lineHeight": "103.118568px",
+ "letterSpacing": "0em",
+ "textTransform": "uppercase",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/c7acd7b4-ae7a-43c1-be65-30163e507a4b.png",
+ "left": "-130.09px",
+ "top": "775.098px",
+ "width": "260.181px",
+ "height": "259.53px",
+ "transform": "none",
+ "opacity": 1,
+ "overflow": "hidden"
+ },
+ {
+ "type": "TEXT",
+ "text": "Email",
+ "left": "241.013px",
+ "top": "526.9960000000001px",
+ "width": "192.777px",
+ "height": "33.4px",
+ "transform": "none",
+ "color": "rgb(106, 42, 1)",
+ "fontSize": "28px",
+ "fontFamily": "Arimo--400",
+ "fontWeight": "400",
+ "textAlign": "left",
+ "lineHeight": "39.0px",
+ "letterSpacing": "0em",
+ "textTransform": "none",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "TEXT",
+ "text": "lorem@ipsum.co",
+ "left": "473.162px",
+ "top": "526.9960000000001px",
+ "width": "548.145px",
+ "height": "33.4px",
+ "transform": "none",
+ "color": "rgb(106, 42, 1)",
+ "fontSize": "28px",
+ "fontFamily": "Arimo--400",
+ "fontWeight": "400",
+ "textAlign": "left",
+ "lineHeight": "39.0px",
+ "letterSpacing": "0em",
+ "textTransform": "none",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "TEXT",
+ "text": "Web",
+ "left": "241.013px",
+ "top": "584.8040000000001px",
+ "width": "192.777px",
+ "height": "33.4px",
+ "transform": "none",
+ "color": "rgb(106, 42, 1)",
+ "fontSize": "28px",
+ "fontFamily": "Arimo--400",
+ "fontWeight": "400",
+ "textAlign": "left",
+ "lineHeight": "39.0px",
+ "letterSpacing": "0em",
+ "textTransform": "none",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "TEXT",
+ "text": "www.loremipsum.co",
+ "left": "473.162px",
+ "top": "584.8040000000001px",
+ "width": "548.145px",
+ "height": "33.4px",
+ "transform": "none",
+ "color": "rgb(106, 42, 1)",
+ "fontSize": "28px",
+ "fontFamily": "Arimo--400",
+ "fontWeight": "400",
+ "textAlign": "left",
+ "lineHeight": "39.0px",
+ "letterSpacing": "0em",
+ "textTransform": "none",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "TEXT",
+ "text": "Tel",
+ "left": "241.013px",
+ "top": "642.613px",
+ "width": "192.777px",
+ "height": "33.4px",
+ "transform": "none",
+ "color": "rgb(106, 42, 1)",
+ "fontSize": "28px",
+ "fontFamily": "Arimo--400",
+ "fontWeight": "400",
+ "textAlign": "left",
+ "lineHeight": "39.0px",
+ "letterSpacing": "0em",
+ "textTransform": "none",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "TEXT",
+ "text": "+531-842",
+ "left": "473.162px",
+ "top": "642.613px",
+ "width": "497.851px",
+ "height": "33.4px",
+ "transform": "none",
+ "color": "rgb(106, 42, 1)",
+ "fontSize": "28px",
+ "fontFamily": "Arimo--400",
+ "fontWeight": "400",
+ "textAlign": "left",
+ "lineHeight": "39.0px",
+ "letterSpacing": "0em",
+ "textTransform": "none",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "TEXT",
+ "text": "Addr",
+ "left": "241.013px",
+ "top": "700.581px",
+ "width": "192.777px",
+ "height": "33.4px",
+ "transform": "none",
+ "color": "rgb(106, 42, 1)",
+ "fontSize": "28px",
+ "fontFamily": "Arimo--400",
+ "fontWeight": "400",
+ "textAlign": "left",
+ "lineHeight": "39.0px",
+ "letterSpacing": "0em",
+ "textTransform": "none",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "TEXT",
+ "text": "V. Lorem 42, U. Dolor, IP 98",
+ "left": "473.162px",
+ "top": "700.581px",
+ "width": "542.183px",
+ "height": "33.4px",
+ "transform": "none",
+ "color": "rgb(106, 42, 1)",
+ "fontSize": "28px",
+ "fontFamily": "Arimo--400",
+ "fontWeight": "400",
+ "textAlign": "left",
+ "lineHeight": "39.0px",
+ "letterSpacing": "0em",
+ "textTransform": "none",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/b24b00a7-9b01-4532-b39e-37963be77831.png",
+ "left": "1179.23px",
+ "top": "-89.2567px",
+ "width": "632.772px",
+ "height": "598.76px",
+ "transform": "none",
+ "opacity": 1,
+ "overflow": "hidden"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/7566c848-0659-4774-9163-4b28763d18ea.png",
+ "left": "1060.31px",
+ "top": "529.796px",
+ "width": "600.543px",
+ "height": "598.291px",
+ "transform": "none",
+ "opacity": 1,
+ "overflow": "hidden"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/5af73775-e01d-45e5-ab99-cc08d67b166d.png",
+ "left": "1680.52px",
+ "top": "409.981px",
+ "width": "586.8px",
+ "height": "586.8px",
+ "transform": "none",
+ "opacity": 1,
+ "overflow": "hidden"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/45f83ec5-33b7-455c-bdcc-d7cc038352b7.png",
+ "left": "634.231px",
+ "top": "876.781px",
+ "width": "302.589px",
+ "height": "301.833px",
+ "transform": "none",
+ "opacity": 1,
+ "overflow": "hidden"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/ed4f0731-a7f6-4370-b0ac-f86f2805d575.png",
+ "left": "799.067px",
+ "top": "49.5932px",
+ "width": "160.933px",
+ "height": "160.53px",
+ "transform": "none",
+ "opacity": 1,
+ "overflow": "hidden"
+ }
+ ],
+ "background": "rgb(252, 240, 189)",
+ "width": "1920px",
+ "height": "1080px",
+ "duration": 3
+}
\ No newline at end of file
diff --git a/src/gdb/_verify_data/lica-data/layouts/b88fde26-58c2-4d55-8b81-66601086a0d0/iZCigr3Lv9tAwHXDkxch.json b/src/gdb/_verify_data/lica-data/layouts/b88fde26-58c2-4d55-8b81-66601086a0d0/iZCigr3Lv9tAwHXDkxch.json
new file mode 100644
index 0000000..966c671
--- /dev/null
+++ b/src/gdb/_verify_data/lica-data/layouts/b88fde26-58c2-4d55-8b81-66601086a0d0/iZCigr3Lv9tAwHXDkxch.json
@@ -0,0 +1,208 @@
+{
+ "components": [
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/8e64019e-fd6a-4878-8ab7-e0e39f4be8e2.jpg",
+ "left": "28.918000000000006px",
+ "top": "540.0px",
+ "width": "735.59306776px",
+ "height": "491.00803980000006px",
+ "transform": "rotate(0deg)",
+ "background": "rgba(1, 1, 1, 0)",
+ "clipPath": "path(\"M135.28576,0.0C60.569497882731255,0.0 0.0,60.56949788273124 0.0,135.28576C0.0,210.00202211726875 60.56949788273123,270.57152 135.28576,270.57152C210.00202211726875,270.57152 270.57152,210.0020221172688 270.57152,135.28576C270.57152,60.569497882731284 210.0020221172688,0.0 135.28576,0.0Z\")"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/d3d88030-1c47-4716-b60b-a5d75073d76e.jpg",
+ "left": "547.4390000000001px",
+ "top": "400.39099999999996px",
+ "width": "532.06832488px",
+ "height": "798.60135356px",
+ "transform": "rotate(0deg)",
+ "background": "rgba(1, 1, 1, 0)",
+ "clipPath": "path(\"M135.28576,0.0C60.569497882731255,0.0 0.0,60.56949788273124 0.0,135.28576C0.0,210.00202211726875 60.56949788273123,270.57152 135.28576,270.57152C210.00202211726875,270.57152 270.57152,210.0020221172688 270.57152,135.28576C270.57152,60.569497882731284 210.0020221172688,0.0 135.28576,0.0Z\")"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/34d8b0a2-7352-4d22-b7ef-01b76daf435a.jpg",
+ "left": "971.0437999999999px",
+ "top": "506.0757px",
+ "width": "303.16904664000003px",
+ "height": "455.03788144000004px",
+ "transform": "rotate(0deg)",
+ "background": "rgba(1, 1, 1, 0)",
+ "clipPath": "path(\"M135.28576,0.0C60.569497882731255,0.0 0.0,60.56949788273124 0.0,135.28576C0.0,210.00202211726875 60.56949788273123,270.57152 135.28576,270.57152C210.00202211726875,270.57152 270.57152,210.0020221172688 270.57152,135.28576C270.57152,60.569497882731284 210.0020221172688,0.0 135.28576,0.0Z\")"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/8f9ad9ad-a540-4b69-a4c7-6f98e74537b8.jpg",
+ "left": "1075.8129999999999px",
+ "top": "350.781px",
+ "width": "760.9506924000001px",
+ "height": "1142.1394596000002px",
+ "transform": "rotate(0deg)",
+ "background": "rgba(1, 1, 1, 0)",
+ "clipPath": "path(\"M135.28576,0.0C60.569497882731255,0.0 0.0,60.56949788273124 0.0,135.28576C0.0,210.00202211726875 60.56949788273123,270.57152 135.28576,270.57152C210.00202211726875,270.57152 270.57152,210.0020221172688 270.57152,135.28576C270.57152,60.569497882731284 210.0020221172688,0.0 135.28576,0.0Z\")"
+ },
+ {
+ "type": "TEXT",
+ "text": "BEST CHEF",
+ "left": "475.159px",
+ "top": "213.95003px",
+ "width": "969.682px",
+ "height": "277.38px",
+ "transform": "none",
+ "color": "rgb(252, 105, 45)",
+ "fontSize": "121px",
+ "fontFamily": "League Spartan--400",
+ "fontWeight": "400",
+ "textAlign": "center",
+ "lineHeight": "132.0px",
+ "letterSpacing": "0em",
+ "textTransform": "uppercase",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "TEXT",
+ "text": "PAT",
+ "left": "700.43837px",
+ "top": "827.97397px",
+ "width": "209.126px",
+ "height": "42.5568px",
+ "transform": "none",
+ "color": "rgb(252, 105, 45)",
+ "fontSize": "35px",
+ "fontFamily": "Arimo--700",
+ "fontWeight": "700",
+ "textAlign": "center",
+ "lineHeight": "49.744524px",
+ "letterSpacing": "0.096em",
+ "textTransform": "uppercase",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "TEXT",
+ "text": "SASHA",
+ "left": "1014.6653699999999px",
+ "top": "827.97397px",
+ "width": "209.126px",
+ "height": "42.5568px",
+ "transform": "none",
+ "color": "rgb(252, 105, 45)",
+ "fontSize": "35px",
+ "fontFamily": "Arimo--700",
+ "fontWeight": "700",
+ "textAlign": "center",
+ "lineHeight": "49.744524px",
+ "letterSpacing": "0.096em",
+ "textTransform": "uppercase",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "TEXT",
+ "text": "DEW",
+ "left": "1328.89537px",
+ "top": "827.97397px",
+ "width": "209.126px",
+ "height": "42.5568px",
+ "transform": "none",
+ "color": "rgb(252, 105, 45)",
+ "fontSize": "35px",
+ "fontFamily": "Arimo--700",
+ "fontWeight": "700",
+ "textAlign": "center",
+ "lineHeight": "49.744524px",
+ "letterSpacing": "0.096em",
+ "textTransform": "uppercase",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "TEXT",
+ "text": "TOM",
+ "left": "386.20937px",
+ "top": "827.97397px",
+ "width": "209.126px",
+ "height": "42.5568px",
+ "transform": "none",
+ "color": "rgb(252, 105, 45)",
+ "fontSize": "35px",
+ "fontFamily": "Arimo--700",
+ "fontWeight": "700",
+ "textAlign": "center",
+ "lineHeight": "49.744524px",
+ "letterSpacing": "0.096em",
+ "textTransform": "uppercase",
+ "fontStyle": "normal"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/c3eef128-52a4-4ab4-8b60-2a159989368d.png",
+ "left": "1460.22px",
+ "top": "-148.434px",
+ "width": "635.457px",
+ "height": "633.074px",
+ "transform": "none",
+ "opacity": 1,
+ "overflow": "hidden"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/e3e9b3b2-bb86-48d3-865f-2e3873cfad7d.png",
+ "left": "-272.744px",
+ "top": "610.613px",
+ "width": "552.115px",
+ "height": "528.65px",
+ "transform": "none",
+ "opacity": 1,
+ "overflow": "hidden"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/f7b15b86-f749-47af-bf1c-fa185ff201bd.png",
+ "left": "-160.302px",
+ "top": "-216.745px",
+ "width": "798.004px",
+ "height": "700.249px",
+ "transform": "rotate(-5.1646deg)",
+ "opacity": 1,
+ "overflow": "hidden"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/f808ea73-6260-4e27-8ad5-3852c694364d.png",
+ "left": "1479.75px",
+ "top": "932.385px",
+ "width": "193.691px",
+ "height": "193.207px",
+ "transform": "none",
+ "opacity": 1,
+ "overflow": "hidden"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/7224d504-8226-4ce1-b730-703b3a1c0e99.png",
+ "left": "1622.67px",
+ "top": "621.286px",
+ "width": "365.299px",
+ "height": "364.386px",
+ "transform": "none",
+ "opacity": 1,
+ "overflow": "hidden"
+ },
+ {
+ "type": "IMAGE",
+ "src": "https://storage.googleapis.com/lica-video/48d21d26-2aee-4d4c-95b8-6bc8df08af02.png",
+ "left": "294.966px",
+ "top": "249.346px",
+ "width": "193.691px",
+ "height": "193.207px",
+ "transform": "none",
+ "opacity": 1,
+ "overflow": "hidden"
+ }
+ ],
+ "background": "rgb(252, 240, 189)",
+ "width": "1920px",
+ "height": "1080px",
+ "duration": 3
+}
\ No newline at end of file
diff --git a/src/gdb/cli.py b/src/gdb/cli.py
new file mode 100644
index 0000000..0f4a9aa
--- /dev/null
+++ b/src/gdb/cli.py
@@ -0,0 +1,1134 @@
+"""GDB command-line interface.
+
+Installed as the ``gdb`` console script and also exposed as ``python -m gdb``.
+
+Subcommands
+-----------
+
+Introspection (no API keys, no data downloads):
+
+* ``gdb list`` — list registered benchmarks and pipeline readiness
+* ``gdb info ID`` — show metadata for a single benchmark
+* ``gdb suites`` — list named suites (``v0-all``, ``v0-smoke``, …)
+
+Evaluation:
+
+* ``gdb verify`` — run the stub model against ``v0-smoke``. Zero API keys;
+ confirms the install is functional.
+* ``gdb eval`` — run a real model (online / streaming inference).
+* ``gdb submit`` — submit to a provider batch API (typically ~50% cheaper).
+* ``gdb collect`` — collect results from a previous ``gdb submit``.
+
+Reporting:
+
+* ``gdb score PATH`` — re-score a precomputed CSV of model outputs.
+* ``gdb report PATH`` — render a run-report JSON as markdown.
+
+Every subcommand that runs inference accepts ``--suite NAME`` (preferred) or
+an explicit ``--benchmarks ID [ID …]`` list. Reported results should cite
+the suite name **and** the ``lica-gdb`` package version.
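+
+Example
+-------
+
+A typical end-to-end session (every flag is defined in this module; the
+bucket and manifest names below are placeholders)::
+
+    gdb verify                                        # stub model, no API keys
+    gdb eval --suite v0-smoke --provider gemini --n 5
+    gdb submit --benchmarks svg-1 --provider gemini --bucket my-gcs-bucket
+    gdb collect jobs/job_<timestamp>_gemini.json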
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import logging
+import sys
+import time
+from datetime import datetime
+from pathlib import Path
+from typing import Any, Callable, Dict, List, Optional, Set, Tuple
+
+from .base import BaseBenchmark, TaskType
+from .evaluation.reporting import RunReport
+from .registry import BenchmarkRegistry
+from .runner import BenchmarkRunner
+from .suites import describe_suite, list_suites, resolve_suite
+
+logger = logging.getLogger(__name__)
+
+PROVIDER_TO_REGISTRY: Dict[str, str] = {
+ "gemini": "google",
+ "openai": "openai",
+ "openai_image": "openai_image",
+ "anthropic": "anthropic",
+ "hf": "hf",
+ "vllm": "vllm",
+ "diffusion": "diffusion",
+ "custom": "custom",
+}
+
+DEFAULT_MODEL_IDS: Dict[str, str] = {
+ "gemini": "gemini-2.0-flash",
+ "openai": "gpt-4o",
+ "openai_image": "gpt-image-1.5",
+ "anthropic": "claude-sonnet-4-20250514",
+ "hf": "Qwen/Qwen3-VL-4B-Instruct",
+ "vllm": "Qwen/Qwen3-VL-4B-Instruct",
+ "diffusion": "flux.2-klein-4b",
+ "custom": "custom-entrypoint",
+}
+
+_MODALITY_CHOICES = [
+ "text",
+ "image",
+ "both",
+ "text_and_image",
+ "image_generation",
+ "any",
+]
+
+
+# ----------------------------------------------------------------------------
+# Output roots
+# ----------------------------------------------------------------------------
+
+
+def _default_output_root() -> Path:
+ """Where to write reports / tracker logs when the user doesn't specify.
+
+ When running from the repo checkout, ``./outputs`` already exists and is
+ gitignored; when running from a ``pip install``-ed copy we still resolve
+ to ``./outputs`` under the caller's cwd, which is the conventional
+ "results live next to the command I ran" behaviour.
+ """
+ return Path.cwd() / "outputs"
+
+
+def _default_jobs_root() -> Path:
+ return Path.cwd() / "jobs"
+
+
+# ----------------------------------------------------------------------------
+# Registry helpers
+# ----------------------------------------------------------------------------
+
+
+def _build_registry() -> BenchmarkRegistry:
+ registry = BenchmarkRegistry()
+ registry.discover()
+ return registry
+
+
+def _benchmark_pipeline_ready(bench: BaseBenchmark) -> bool:
+ """True if the task overrides the default stubs in :class:`BaseBenchmark`."""
+ cls = type(bench)
+ if getattr(cls, "pipeline_implemented", True) is False:
+ return False
+ return cls.load_data is not BaseBenchmark.load_data
+
+
+def _resolve_benchmark_ids(
+ args: argparse.Namespace, registry: BenchmarkRegistry
+) -> List[str]:
+ """Resolve ``--suite`` or ``--benchmarks`` to a concrete list of IDs."""
+ suite = getattr(args, "suite", None)
+ explicit = getattr(args, "benchmarks", None)
+ if suite and explicit:
+ raise SystemExit("Specify either --suite or --benchmarks, not both.")
+ if suite:
+ try:
+ return resolve_suite(suite, registry)
+ except KeyError as exc:
+ raise SystemExit(str(exc)) from exc
+ if explicit:
+ return list(explicit)
+ raise SystemExit("One of --suite or --benchmarks is required.")
+
+
+# ----------------------------------------------------------------------------
+# Model construction
+# ----------------------------------------------------------------------------
+
+
+def _make_stub_model() -> Any:
+ from gdb.models.base import BaseModel, Modality, ModelOutput
+
+ class StubModel(BaseModel):
+ name = "stub"
+ modality = Modality.ANY
+
+ def predict(self, inp: Any) -> ModelOutput:
+ return ModelOutput(text="", images=[])
+
+ return StubModel()
+
+
+def _parse_json_dict_arg(value: Any, *, field_name: str) -> Dict[str, Any]:
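+    """Coerce ``value`` into a dict.
+
+    Accepts a dict, an inline JSON-object string, or a path to a JSON file;
+    ``None`` and empty strings yield ``{}``.
+    """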
+ if value is None:
+ return {}
+ if isinstance(value, dict):
+ return value
+ if isinstance(value, str):
+ text = value.strip()
+ if not text:
+ return {}
+ path = Path(text)
+ if path.is_file():
+ text = path.read_text(encoding="utf-8")
+ parsed = json.loads(text)
+ if not isinstance(parsed, dict):
+ raise ValueError(f"{field_name} must be a JSON object/dict.")
+ return parsed
+ raise ValueError(f"{field_name} must be a JSON object/dict or JSON file path.")
+
+
+def _resolve_model_modality(
+ args: argparse.Namespace, *, provider: str
+) -> Optional[str]:
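+    """Return the user-requested modality override, if any.
+
+    The ``custom`` provider falls back to ``"any"``; every other provider
+    returns ``None`` so the model class keeps its own default.
+    """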
+ if provider == "custom":
+ return (
+ getattr(args, "custom_modality", None)
+ or getattr(args, "model_modality", None)
+ or getattr(args, "modality", None)
+ or "any"
+ )
+ return (
+ getattr(args, "model_modality", None)
+ or getattr(args, "modality", None)
+ or None
+ )
+
+
+def _build_model_from_parts(
+ provider: str, model_id: str, args: argparse.Namespace
+) -> Any:
+ from gdb.models import load_model
+
+ if provider == "custom":
+ entrypoint = (
+ getattr(args, "custom_entry", None)
+ or getattr(args, "entrypoint", None)
+ or (model_id if model_id != DEFAULT_MODEL_IDS["custom"] else "")
+ )
+ if not entrypoint:
+ raise ValueError(
+ "custom provider requires an entrypoint. "
+ "Set --custom-entry module.path:attr, "
+ "or use custom:module.path:attr in --multi-models."
+ )
+ init_kwargs = _parse_json_dict_arg(
+ getattr(args, "custom_init_kwargs", None)
+ or getattr(args, "init_kwargs", None),
+ field_name="custom init kwargs",
+ )
+ custom_modality = _resolve_model_modality(args, provider=provider)
+ return load_model(
+ "custom",
+ entrypoint=entrypoint,
+ init_kwargs=init_kwargs,
+ modality=custom_modality,
+ )
+
+ if provider == "diffusion":
+ return load_model(
+ "diffusion",
+ model_id=model_id,
+ resolution=getattr(args, "resolution", 1024),
+ )
+
+ kwargs: Dict[str, Any] = {
+ "model_id": model_id,
+ "temperature": getattr(args, "temperature", 0.0),
+ }
+ if getattr(args, "credentials", None):
+ kwargs["credentials_path"] = args.credentials
+ if getattr(args, "max_tokens", None) is not None:
+ kwargs["max_tokens"] = args.max_tokens
+ if provider == "hf":
+ kwargs["device"] = getattr(args, "device", "auto")
+ if getattr(args, "max_tokens", None) is not None:
+ kwargs["max_new_tokens"] = args.max_tokens
+ kwargs.pop("max_tokens", None)
+ model_modality = _resolve_model_modality(args, provider=provider)
+ if model_modality is not None:
+ kwargs["modality"] = model_modality
+ if provider == "vllm":
+ kwargs["tensor_parallel_size"] = getattr(args, "tensor_parallel_size", 1)
+ kwargs["top_p"] = getattr(args, "top_p", 1.0)
+ kwargs["top_k"] = getattr(args, "top_k", -1)
+ kwargs["repetition_penalty"] = getattr(args, "repetition_penalty", 1.0)
+ if getattr(args, "presence_penalty", None) is not None:
+ kwargs["presence_penalty"] = args.presence_penalty
+ if getattr(args, "limit_mm_per_prompt", None) is not None:
+ kwargs["limit_mm_per_prompt"] = {"image": args.limit_mm_per_prompt}
+ if getattr(args, "max_num_batched_tokens", None) is not None:
+ kwargs["max_num_batched_tokens"] = args.max_num_batched_tokens
+ if getattr(args, "no_thinking", False):
+ kwargs["enable_thinking"] = False
+ model_modality = _resolve_model_modality(args, provider=provider)
+ if model_modality is not None:
+ kwargs["modality"] = model_modality
+
+ return load_model(PROVIDER_TO_REGISTRY[provider], **kwargs)
+
+
+def _build_single_model(args: argparse.Namespace) -> Tuple[str, Any]:
+ provider = args.provider
+ model_id = args.model_id or DEFAULT_MODEL_IDS[provider]
+ if provider == "custom" and getattr(args, "custom_entry", None):
+ name = f"custom:{args.custom_entry}"
+ else:
+ name = model_id
+ return name, _build_model_from_parts(provider, model_id, args)
+
+
+def _parse_model_spec(spec: str) -> Tuple[str, str, str]:
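+    """Split a ``--multi-models`` spec into ``(name, provider, model_id)``.
+
+    ``"openai:gpt-4o"`` yields ``("openai:gpt-4o", "openai", "gpt-4o")``;
+    ``"fast=gemini:gemini-2.0-flash"`` yields ``("fast", "gemini",
+    "gemini-2.0-flash")``.
+    """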
+ alias = ""
+ body = spec.strip()
+ if "=" in body:
+ alias, body = body.split("=", 1)
+ alias = alias.strip()
+ if ":" not in body:
+ raise ValueError(
+ f"Invalid --multi-models spec {spec!r}. "
+ "Use provider:model_id or alias=provider:model_id."
+ )
+ provider, model_id = body.split(":", 1)
+ provider = provider.strip()
+ model_id = model_id.strip()
+ if provider not in PROVIDER_TO_REGISTRY:
+ raise ValueError(
+ f"Unknown provider {provider!r} in spec {spec!r}. "
+ f"Choose from: {', '.join(sorted(PROVIDER_TO_REGISTRY))}"
+ )
+ name = alias or f"{provider}:{model_id}"
+ return name, provider, model_id
+
+
+def _build_models(args: argparse.Namespace) -> Dict[str, Any]:
+ """Build the ``{name -> model}`` dict requested by the user."""
+ if getattr(args, "stub_model", False):
+ return {"stub": _make_stub_model()}
+ if getattr(args, "multi_models", None):
+ models: Dict[str, Any] = {}
+ for spec in args.multi_models:
+ name, provider, model_id = _parse_model_spec(spec)
+ models[name] = _build_model_from_parts(provider, model_id, args)
+ return models
+ if getattr(args, "provider", None):
+ name, model = _build_single_model(args)
+ return {name: model}
+ raise SystemExit("--provider, --multi-models, or --stub-model required")
+
+
+# ----------------------------------------------------------------------------
+# Preflight
+# ----------------------------------------------------------------------------
+
+
+def _collect_preflight_warnings(
+ registry: BenchmarkRegistry,
+ benchmark_ids: List[str],
+ models: Dict[str, Any],
+) -> List[str]:
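+    """Heuristically flag benchmark/model mismatches before inference starts.
+
+    Tokens in each benchmark's ``input_spec``/``output_spec`` (e.g. "mask",
+    "video") are matched against the capabilities each model advertises;
+    duplicate messages are reported once.
+    """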
+ from gdb.models.base import Modality
+
+ def _supports_image_output(model: Any, modality: Any) -> bool:
+ return bool(
+ getattr(
+ model,
+ "supports_image_output",
+ modality in {Modality.IMAGE_GENERATION, Modality.ANY},
+ )
+ )
+
+ def _supports_video_output(model: Any) -> bool:
+ return bool(getattr(model, "supports_video_output", False))
+
+ def _supports_image_input(model: Any, modality: Any) -> bool:
+ return bool(
+ getattr(
+ model,
+ "supports_image_input",
+ modality in {Modality.TEXT_AND_IMAGE, Modality.ANY},
+ )
+ )
+
+ def _supports_mask_editing(model: Any) -> bool:
+ return bool(getattr(model, "supports_mask_editing", False))
+
+ image_tokens = (
+ "input image",
+ "layout image",
+ "source image",
+ "source composite image",
+ "rendered image",
+ "reference image",
+ "component asset",
+ "component assets",
+ "visual component",
+ "visual components",
+ "mask",
+ "masked",
+ )
+ visual_tokens = image_tokens + ("video",)
+
+ warnings: List[str] = []
+ seen: Set[str] = set()
+ for bid in benchmark_ids:
+ bench = registry.get(bid)
+ inp = str(bench.meta.input_spec or "").lower()
+ out = str(bench.meta.output_spec or "").lower()
+ needs_visual_input = any(t in inp for t in visual_tokens)
+ needs_image_output = any(t in out for t in ("image", "png", "jpg", "jpeg"))
+ needs_video_output = any(t in out for t in ("video", "mp4"))
+ needs_image_conditioning = needs_image_output and any(
+ t in inp for t in image_tokens
+ )
+ needs_mask_editing = needs_image_output and any(
+ t in inp for t in ("mask", "masked", "editable")
+ )
+
+ for name, model in models.items():
+ modality = getattr(model, "modality", None)
+
+ def _add(msg: str) -> None:
+ if msg not in seen:
+ seen.add(msg)
+ warnings.append(msg)
+
+ if needs_visual_input and modality == Modality.TEXT:
+ _add(
+ f"{bid} expects visual input ({bench.meta.input_spec}); "
+ f"model '{name}' is text-only."
+ )
+ if needs_image_output and not _supports_image_output(model, modality):
+ _add(
+ f"{bid} expects image output ({bench.meta.output_spec}); "
+ f"model '{name}' may need an image-generation capable wrapper."
+ )
+ if needs_video_output and not _supports_video_output(model):
+ _add(
+ f"{bid} expects video output ({bench.meta.output_spec}); "
+ f"model '{name}' does not advertise video-generation support."
+ )
+ if needs_image_conditioning and not _supports_image_input(model, modality):
+ _add(
+ f"{bid} uses source/reference images ({bench.meta.input_spec}); "
+ f"model '{name}' may ignore those visual inputs."
+ )
+ if needs_mask_editing and not _supports_mask_editing(model):
+ _add(
+ f"{bid} is a masked image-editing task ({bench.meta.input_spec}); "
+ f"model '{name}' does not advertise mask/inpainting support."
+ )
+ return warnings
+
+
+# ----------------------------------------------------------------------------
+# Shared argument groups
+# ----------------------------------------------------------------------------
+
+
+def _add_selection_arguments(p: argparse.ArgumentParser) -> None:
+ group = p.add_mutually_exclusive_group()
+ group.add_argument(
+ "--suite",
+ metavar="NAME",
+ help=f"Named suite. Choices: {', '.join(list_suites())}",
+ )
+ group.add_argument(
+ "--benchmarks",
+ nargs="+",
+ metavar="ID",
+ help="Explicit benchmark IDs (e.g. layout-4 svg-1).",
+ )
+ p.add_argument(
+ "--data",
+ default=None,
+ help="Override data directory for the task(s). Rarely needed when using --dataset-root.",
+ )
+ p.add_argument(
+ "--dataset-root",
+ default=None,
+ help="Local Lica bundle root (lica-data/ + benchmarks/). "
+ "When omitted, data is loaded from the HuggingFace Hub.",
+ )
+ p.add_argument(
+ "--n",
+ type=int,
+ default=None,
+ help="Limit number of samples per task (default: all).",
+ )
+
+
+def _add_model_arguments(p: argparse.ArgumentParser) -> None:
+ p.add_argument(
+ "--stub-model",
+ action="store_true",
+ help="Use the built-in stub model (no API keys). Primarily for smoke tests.",
+ )
+ p.add_argument("--provider", choices=list(PROVIDER_TO_REGISTRY.keys()))
+ p.add_argument("--model-id", default=None)
+ p.add_argument(
+ "--multi-models",
+ nargs="+",
+ metavar="SPEC",
+ default=None,
+ help="Run multiple models in one pass. Format: provider:model_id or alias=provider:model_id",
+ )
+ p.add_argument("--credentials", default=None)
+ p.add_argument(
+ "--custom-entry",
+ default=None,
+ help="Importable Python entrypoint for --provider custom: module.path:attr",
+ )
+ p.add_argument("--custom-init-kwargs", default=None)
+ p.add_argument(
+ "--custom-modality",
+ choices=_MODALITY_CHOICES,
+ default="any",
+ )
+ p.add_argument(
+ "--model-modality",
+ choices=_MODALITY_CHOICES,
+ default=None,
+ help="Override modality declaration for local providers (hf/vllm).",
+ )
+ p.add_argument("--temperature", type=float, default=0.0)
+ p.add_argument("--max-tokens", type=int, default=None)
+ p.add_argument("--top-p", type=float, default=1.0)
+ p.add_argument("--top-k", type=int, default=-1)
+ p.add_argument("--repetition-penalty", type=float, default=1.0)
+ p.add_argument("--presence-penalty", type=float, default=0.0)
+ p.add_argument("--device", default="auto", help="HF device (auto/cpu/cuda/mps)")
+ p.add_argument("--tensor-parallel-size", type=int, default=1)
+ p.add_argument("--resolution", type=int, default=1024)
+ p.add_argument("--limit-mm-per-prompt", type=int, default=None)
+ p.add_argument("--max-num-batched-tokens", type=int, default=None)
+ p.add_argument("--no-thinking", action="store_true")
+ p.add_argument("--batch-size", type=int, default=None)
+ p.add_argument(
+ "--input-modality",
+ choices=["text", "image", "both"],
+ default=None,
+ help="Override template-task input modality (text/image/both).",
+ )
+
+
+# ----------------------------------------------------------------------------
+# Command handlers
+# ----------------------------------------------------------------------------
+
+
+def cmd_list(args: argparse.Namespace) -> int:
+ registry = _build_registry()
+ task_type = TaskType(args.task_type) if args.task_type else None
+ benches = registry.list(domain=args.domain, task_type=task_type)
+ if not benches:
+ print("No benchmarks matched the given filters.")
+ return 0
+
+ runnable = {b.meta.id for b in registry.list() if _benchmark_pipeline_ready(b)}
+ print(f"{'ID':<18} {'Type':<14} {'Domain':<14} {'Pipeline':<9} Name")
+ print("-" * 90)
+ for b in sorted(benches, key=lambda x: x.meta.id):
+ ready = "ready" if b.meta.id in runnable else "-"
+ print(
+ f"{b.meta.id:<18} {b.meta.task_type.value:<14} "
+ f"{b.meta.domain:<14} {ready:<9} {b.meta.name}"
+ )
+ total_runnable = sum(1 for b in benches if b.meta.id in runnable)
+ print(
+ f"\n{len(benches)} benchmark(s); "
+ f"{total_runnable} have a runnable inference pipeline."
+ )
+ return 0
+
+
+def cmd_info(args: argparse.Namespace) -> int:
+ registry = _build_registry()
+ try:
+ b = registry.get(args.benchmark_id)
+ except KeyError as exc:
+ print(str(exc), file=sys.stderr)
+ return 1
+ m = b.meta
+ print(f"ID: {m.id}")
+ print(f"Name: {m.name}")
+ print(f"Task type: {m.task_type.value}")
+ print(f"Domain: {m.domain}")
+ print(f"Description: {m.description}")
+ if m.input_spec:
+ print(f"Input: {m.input_spec}")
+ if getattr(m, "output_spec", None):
+ print(f"Output: {m.output_spec}")
+ if m.metrics:
+ print(f"Metrics: {', '.join(m.metrics)}")
+ if m.tags:
+ print(f"Tags: {', '.join(m.tags)}")
+ print(f"Pipeline: {'ready' if _benchmark_pipeline_ready(b) else 'not implemented'}")
+ return 0
+
+
+def cmd_suites(args: argparse.Namespace) -> int:
+ registry = _build_registry()
+ if args.name:
+ info = describe_suite(args.name, registry)
+ print(f"Suite: {info['name']} ({info['kind']}, {info['n_tasks']} tasks)")
+ for tid in info["task_ids"]:
+ print(f" - {tid}")
+ return 0
+
+ print(f"{'Suite':<24} {'Kind':<8} {'Tasks':>5}")
+ print("-" * 40)
+ for name in list_suites():
+ info = describe_suite(name, registry)
+ print(f"{name:<24} {info['kind']:<8} {info['n_tasks']:>5}")
+ return 0
+
+
+def _run_online(
+ registry: BenchmarkRegistry,
+ benchmark_ids: List[str],
+ models: Dict[str, Any],
+ args: argparse.Namespace,
+) -> bool:
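+    """Run streaming inference for each benchmark against every model.
+
+    Benchmarks run independently so one failure does not abort the rest;
+    returns ``True`` only if every benchmark completed.
+    """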
+ input_modality = None
+ if args.input_modality:
+ from gdb.models.base import Modality
+
+ input_modality = {
+ "text": Modality.TEXT,
+ "image": Modality.IMAGE,
+ "both": Modality.TEXT_AND_IMAGE,
+ }[args.input_modality]
+
+ out_root = Path(args.output_dir) if args.output_dir else _default_output_root()
+ save_dir: Optional[Path] = None
+ if args.save_images:
+ save_dir = (
+ Path(args.images_dir) if args.images_dir else out_root / "generated-images"
+ )
+ save_dir.mkdir(parents=True, exist_ok=True)
+
+ runner = BenchmarkRunner(registry)
+ combined = RunReport()
+ all_ok = True
+
+ for bid in benchmark_ids:
+ bench = registry.get(bid)
+ print(f"\n[{bid}] {bench.meta.name}")
+ try:
+ if args.data:
+ data_display = args.data
+ elif args.dataset_root:
+ data_display = str(bench.resolve_data_dir(args.dataset_root))
+ else:
+ data_display = "HuggingFace Hub"
+ except FileNotFoundError as exc:
+ print(f" FAILED: {exc}")
+ all_ok = False
+ continue
+ print(f" data: {data_display}")
+
+ t0 = time.time()
+ try:
+ report = runner.run(
+ benchmark_ids=[bid],
+ models=models,
+ data_dir=args.data,
+ dataset_root=args.dataset_root,
+ n=args.n,
+ batch_size=args.batch_size,
+ prediction_save_dir=save_dir,
+ input_modality=input_modality,
+ )
+ except Exception as exc: # noqa: BLE001 — user-visible runtime failure
+ print(f" FAILED: {exc}")
+ all_ok = False
+ continue
+
+ for name, result in sorted(report.results[bid].items()):
+ scores = ", ".join(
+ f"{k}={v:.4f}" for k, v in sorted(result.scores.items())
+ )
+ print(
+ f" {name}: {scores} "
+ f"(n={result.count}, ok={result.success_count}, "
+ f"fail={result.failure_count}, "
+ f"fail_rate={result.failure_rate:.1%}, "
+ f"{time.time() - t0:.1f}s)"
+ )
+ combined.results[bid] = report.results[bid]
+
+ if combined.results:
+ if args.output:
+ combined.save(args.output)
+ print(f"\nSaved report to {args.output}")
+ else:
+ out_root.mkdir(parents=True, exist_ok=True)
+ for bid in combined.results:
+ RunReport(results={bid: combined.results[bid]}).save(
+ str(out_root / f"{bid}.csv")
+ )
+ print(f"\nSaved per-task CSVs to {out_root}/")
+
+ if not args.no_log and len(runner.tracker) > 0:
+ out_root.mkdir(parents=True, exist_ok=True)
+ log_path = out_root / "tracker.jsonl"
+ runner.tracker.save(str(log_path))
+ print(f"Tracker log: {log_path}")
+
+ if save_dir is not None:
+ print(f"Generated images: {save_dir}")
+
+ return all_ok
+
+
+def cmd_eval(args: argparse.Namespace) -> int:
+ registry = _build_registry()
+ benchmark_ids = _resolve_benchmark_ids(args, registry)
+ try:
+ models = _build_models(args)
+ except ValueError as exc:
+ print(str(exc), file=sys.stderr)
+ return 2
+
+ if not args.dataset_root and not args.data:
+ print("[info] No --dataset-root provided; loading data from HuggingFace Hub.")
+
+ warnings = _collect_preflight_warnings(registry, benchmark_ids, models)
+ if warnings:
+ print("\n[preflight] Potential model/task compatibility issues:")
+ for msg in warnings:
+ print(f" - {msg}")
+ print(" Continue with caution; some tasks may require a different model.\n")
+
+ ok = _run_online(registry, benchmark_ids, models, args)
+ return 0 if ok else 1
+
+
+def _bundled_verify_root() -> Optional[Path]:
+ """Return the path to the bundled ``_verify_data`` fixture, if shipped."""
+ here = Path(__file__).resolve().parent
+ candidate = here / "_verify_data"
+ if candidate.is_dir() and (candidate / "benchmarks").is_dir():
+ return candidate
+ return None
+
+
+def cmd_verify(args: argparse.Namespace) -> int:
+ """Smoke test: run the stub model against the requested suite (default smoke)."""
+ registry = _build_registry()
+ suite_name = args.suite or "v0-smoke"
+ try:
+ benchmark_ids = resolve_suite(suite_name, registry)
+ except KeyError as exc:
+ print(str(exc), file=sys.stderr)
+ return 2
+ if args.benchmarks:
+ benchmark_ids = list(args.benchmarks)
+
+ # Default to the bundled fixture so `gdb verify` needs no downloads.
+ dataset_root = args.dataset_root
+ using_bundled = False
+ if not dataset_root and not args.data:
+ bundled = _bundled_verify_root()
+ if bundled is not None:
+ dataset_root = str(bundled)
+ using_bundled = True
+
+ source_desc = (
+ "bundled fixture"
+ if using_bundled
+ else (f"dataset-root={dataset_root}" if dataset_root else "HuggingFace Hub")
+ )
+ print(
+ f"Verifying install with stub model on {len(benchmark_ids)} task(s) "
+ f"(suite={suite_name}, n={args.n or 'all'}, data={source_desc}).\n"
+ "Scores will be ~0 — this only checks that inference & scoring run end-to-end."
+ )
+
+ verify_args = argparse.Namespace(
+ **{k: v for k, v in vars(args).items() if k != "dataset_root"},
+ dataset_root=dataset_root,
+ provider=None,
+ model_id=None,
+ multi_models=None,
+ stub_model=True,
+ custom_entry=None,
+ custom_init_kwargs=None,
+ custom_modality="any",
+ model_modality=None,
+ credentials=None,
+ temperature=0.0,
+ max_tokens=None,
+ top_p=1.0,
+ top_k=-1,
+ repetition_penalty=1.0,
+ presence_penalty=0.0,
+ device="auto",
+ tensor_parallel_size=1,
+ resolution=1024,
+ limit_mm_per_prompt=None,
+ max_num_batched_tokens=None,
+ no_thinking=False,
+ batch_size=None,
+ input_modality=None,
+ save_images=False,
+ images_dir=None,
+ )
+ models = {"stub": _make_stub_model()}
+ ok = _run_online(registry, benchmark_ids, models, verify_args)
+ if ok:
+ print("\n[verify] OK — install is functional.")
+ else:
+ print("\n[verify] FAILED — see errors above.")
+ return 0 if ok else 1
+
+
+def cmd_submit(args: argparse.Namespace) -> int:
+ from gdb.inference import BATCH_PROVIDERS, make_batch_runner, save_job_manifest
+
+ if not args.provider:
+ print("--provider is required for `gdb submit`.", file=sys.stderr)
+ return 2
+ if args.provider not in BATCH_PROVIDERS:
+ print(
+ f"`gdb submit` requires one of: {', '.join(sorted(BATCH_PROVIDERS))}",
+ file=sys.stderr,
+ )
+ return 2
+
+ registry = _build_registry()
+ benchmark_ids = _resolve_benchmark_ids(args, registry)
+ if len(benchmark_ids) != 1:
+ print(
+ "`gdb submit` currently supports exactly one benchmark per job. "
+ "Loop over tasks in a shell for now.",
+ file=sys.stderr,
+ )
+ return 2
+ bid = benchmark_ids[0]
+ bench = registry.get(bid)
+
+ model_id = args.model_id or DEFAULT_MODEL_IDS[args.provider]
+ runner = BenchmarkRunner(registry)
+
+ batch_kwargs: Dict[str, Any] = {
+ "model_id": model_id,
+ "temperature": args.temperature,
+ "poll_interval": args.poll_interval,
+ "on_status": lambda msg: print(f" {msg}"),
+ }
+ if args.credentials:
+ batch_kwargs["credentials_path"] = args.credentials
+ if args.bucket:
+ batch_kwargs["bucket"] = args.bucket
+ batch_runner = make_batch_runner(args.provider, **batch_kwargs)
+
+ if args.data:
+ data_display = args.data
+ elif args.dataset_root:
+ data_display = str(bench.resolve_data_dir(args.dataset_root))
+ else:
+ data_display = "HuggingFace Hub"
+ print(f"\n[{bid}] {bench.meta.name}")
+ print(f" data: {data_display}")
+ print(f" provider: {args.provider} / {model_id}")
+
+ manifest_data = runner.submit(
+ bid,
+ batch_runner,
+ data_dir=args.data,
+ dataset_root=args.dataset_root,
+ n=args.n,
+ )
+ extra = {"benchmark_id": bid}
+ if args.provider == "gemini" and hasattr(batch_runner, "_last_submit_meta"):
+ extra["job_prefix"] = batch_runner._last_submit_meta["job_prefix"]
+
+ jobs_dir = Path(args.jobs_dir) if args.jobs_dir else _default_jobs_root()
+ ts = datetime.now().strftime("%Y%m%d_%H%M%S")
+ manifest_path = save_job_manifest(
+ jobs_dir / f"job_{ts}_{args.provider}.json",
+ provider=args.provider,
+ batch_id=manifest_data["batch_id"],
+ model_id=model_id,
+ custom_ids=manifest_data["custom_ids"],
+ ground_truths=manifest_data["ground_truths"],
+ extra=extra,
+ )
+ print(f"\n Job submitted: {manifest_data['batch_id']}")
+ print(f" Manifest: {manifest_path}")
+ print(f"\n To collect: gdb collect {manifest_path}")
+ return 0
+
+
+def cmd_collect(args: argparse.Namespace) -> int:
+ from gdb.inference import load_job_manifest, make_batch_runner
+
+ manifest = load_job_manifest(args.manifest)
+ provider = manifest["provider"]
+ model_id = manifest["model_id"]
+ benchmark_id = manifest.get("benchmark_id") or manifest.get("extra", {}).get(
+ "benchmark_id"
+ )
+
+ print(f"Collecting {provider} batch: {manifest['batch_id']}")
+ print(f" model: {model_id}, samples: {len(manifest['custom_ids'])}")
+
+ batch_kwargs: Dict[str, Any] = {
+ "model_id": model_id,
+ "poll_interval": args.poll_interval,
+ "on_status": lambda msg: print(f" {msg}"),
+ }
+ if args.credentials:
+ batch_kwargs["credentials_path"] = args.credentials
+ if args.bucket:
+ batch_kwargs["bucket"] = args.bucket
+ batch_runner = make_batch_runner(provider, **batch_kwargs)
+
+ collect_kwargs: Dict[str, Any] = {}
+    job_prefix = manifest.get("job_prefix") or manifest.get("extra", {}).get(
+        "job_prefix"
+    )
+ if provider == "gemini" and job_prefix:
+ collect_kwargs["job_prefix"] = job_prefix
+
+ if benchmark_id:
+ registry = _build_registry()
+ runner = BenchmarkRunner(registry)
+ report = runner.collect(
+ benchmark_id,
+ batch_runner,
+ batch_id=manifest["batch_id"],
+ custom_ids=manifest["custom_ids"],
+ ground_truths=manifest["ground_truths"],
+ model_id=model_id,
+ **collect_kwargs,
+ )
+ result = report.results[benchmark_id][model_id]
+ scores = ", ".join(f"{k}={v:.4f}" for k, v in sorted(result.scores.items()))
+ print(
+ f"\n [{benchmark_id}] {scores} "
+ f"(n={result.count}, ok={result.success_count}, "
+ f"fail={result.failure_count}, "
+ f"fail_rate={result.failure_rate:.1%})"
+ )
+ if args.output:
+ report.save(args.output)
+ print(f" Saved to {args.output}")
+ else:
+ results = batch_runner.collect(
+ batch_id=manifest["batch_id"],
+ custom_ids=manifest["custom_ids"],
+ **collect_kwargs,
+ )
+ ok = sum(1 for r in results.values() if r.success)
+ print(f"\n {ok}/{len(results)} succeeded")
+ return 0
+
+
+def cmd_score(args: argparse.Namespace) -> int:
+ """Score a precomputed CSV of model outputs (no model inference)."""
+ registry = _build_registry()
+ runner = BenchmarkRunner(registry)
+ report = runner.run_from_csv(args.csv_path)
+ print(report.summary())
+ if args.output:
+ report.save(args.output)
+ print(f"\nResults saved to {args.output}")
+ return 0
+
+
+def _render_markdown_report(report_dict: Dict[str, Any]) -> str:
+ """Minimal markdown renderer for a ``RunReport`` JSON dump."""
+ lines: List[str] = ["# GDB run report", ""]
+ meta = report_dict.get("metadata", {})
+ if meta:
+ lines.append("## Metadata")
+ lines.append("")
+ for k, v in meta.items():
+ lines.append(f"- **{k}**: {v}")
+ lines.append("")
+
+ lines.append("## Results")
+ lines.append("")
+ results = report_dict.get("results", {})
+ if not results:
+ lines.append("_(empty)_")
+ return "\n".join(lines)
+
+ metric_cols: List[str] = []
+ seen_cols: Set[str] = set()
+ for models in results.values():
+ for res in models.values():
+ for m in (res.get("scores") or {}):
+ if m not in seen_cols:
+ seen_cols.add(m)
+ metric_cols.append(m)
+
+ header = ["Benchmark", "Model", "n", "fail_rate", *metric_cols]
+ lines.append("| " + " | ".join(header) + " |")
+ lines.append("|" + "|".join("---" for _ in header) + "|")
+ for bid, models in sorted(results.items()):
+ for model_name, res in sorted(models.items()):
+ row = [
+ bid,
+ model_name,
+ str(res.get("count", "")),
+ f"{res.get('failure_rate', 0):.1%}",
+ ]
+ for col in metric_cols:
+ val = (res.get("scores") or {}).get(col)
+ row.append(f"{val:.4f}" if isinstance(val, (int, float)) else "—")
+ lines.append("| " + " | ".join(row) + " |")
+ return "\n".join(lines)
+
+
+def cmd_report(args: argparse.Namespace) -> int:
+ path = Path(args.report_path)
+ if not path.is_file():
+ print(f"No such file: {path}", file=sys.stderr)
+ return 1
+ with open(path, "r", encoding="utf-8") as f:
+ data = json.load(f)
+ md = _render_markdown_report(data)
+ if args.output:
+ Path(args.output).write_text(md, encoding="utf-8")
+ print(f"Wrote {args.output}")
+ else:
+ print(md)
+ return 0
+
+
+# ----------------------------------------------------------------------------
+# Argument parser
+# ----------------------------------------------------------------------------
+
+
+def build_parser() -> argparse.ArgumentParser:
+ parser = argparse.ArgumentParser(
+ prog="gdb",
+ description="GDB — GraphicDesignBench CLI",
+ formatter_class=argparse.RawDescriptionHelpFormatter,
+ )
+ parser.add_argument(
+ "--version",
+ action="store_true",
+ help="Print the installed lica-gdb version and exit.",
+ )
+ parser.add_argument(
+ "-v", "--verbose", action="store_true", help="Enable debug logging."
+ )
+ sub = parser.add_subparsers(dest="command", metavar="COMMAND")
+
+ p_list = sub.add_parser("list", help="List registered benchmarks.")
+ p_list.add_argument("--domain", help="Filter by domain (e.g. svg, layout).")
+ p_list.add_argument(
+ "--task-type",
+ choices=["understanding", "generation"],
+ help="Filter by task type.",
+ )
+
+ p_info = sub.add_parser("info", help="Show details for a single benchmark.")
+ p_info.add_argument("benchmark_id", help="Benchmark ID (e.g. svg-1).")
+
+ p_suites = sub.add_parser("suites", help="List named suites (or expand one).")
+ p_suites.add_argument(
+ "name", nargs="?", help="If given, print the task IDs in this suite."
+ )
+
+ p_eval = sub.add_parser(
+ "eval",
+ help="Run online inference against one or more benchmarks.",
+ formatter_class=argparse.RawDescriptionHelpFormatter,
+ )
+ _add_selection_arguments(p_eval)
+ _add_model_arguments(p_eval)
+ p_eval.add_argument("--output", "-o", default=None, help="Save report (.json or .csv).")
+ p_eval.add_argument(
+ "--output-dir",
+ default=None,
+ help="Directory for per-task CSVs / tracker log (default: ./outputs).",
+ )
+ p_eval.add_argument("--save-images", action="store_true")
+ p_eval.add_argument("--images-dir", default=None)
+ p_eval.add_argument("--no-log", action="store_true")
+
+ p_verify = sub.add_parser(
+ "verify",
+ help="Smoke-test the install with the stub model (no API keys).",
+ )
+ p_verify.add_argument("--suite", default=None, help="Defaults to v0-smoke.")
+ p_verify.add_argument("--benchmarks", nargs="+", metavar="ID")
+ p_verify.add_argument("--dataset-root", default=None)
+ p_verify.add_argument("--data", default=None)
+ p_verify.add_argument("--n", type=int, default=2)
+ p_verify.add_argument("--output", "-o", default=None)
+ p_verify.add_argument("--output-dir", default=None)
+ p_verify.add_argument("--no-log", action="store_true")
+
+ p_submit = sub.add_parser(
+ "submit", help="Submit a batch-API job for a single benchmark."
+ )
+ _add_selection_arguments(p_submit)
+ _add_model_arguments(p_submit)
+ p_submit.add_argument("--bucket", default=None, help="GCS bucket for Gemini batch.")
+ p_submit.add_argument("--poll-interval", type=int, default=30)
+ p_submit.add_argument(
+ "--jobs-dir", default=None, help="Where to write the job manifest (default: ./jobs)."
+ )
+
+ p_collect = sub.add_parser(
+ "collect", help="Collect results from a previous `gdb submit` manifest."
+ )
+ p_collect.add_argument("manifest", help="Path to job manifest JSON.")
+ p_collect.add_argument("--credentials", default=None)
+ p_collect.add_argument("--bucket", default=None)
+ p_collect.add_argument("--poll-interval", type=int, default=30)
+ p_collect.add_argument("--output", "-o", default=None)
+
+ p_score = sub.add_parser(
+ "score", help="Re-score a precomputed CSV of model outputs."
+ )
+ p_score.add_argument("csv_path", help="CSV with columns: task, expected_output, _output.")
+ p_score.add_argument("--output", "-o", default=None)
+
+ p_report = sub.add_parser(
+ "report", help="Render a run-report JSON as markdown."
+ )
+ p_report.add_argument("report_path", help="Path to a run-report JSON.")
+ p_report.add_argument("--output", "-o", default=None, help="Write markdown to this file.")
+
+ return parser
+
+
+_DISPATCH: Dict[str, Callable[[argparse.Namespace], int]] = {
+ "list": cmd_list,
+ "info": cmd_info,
+ "suites": cmd_suites,
+ "eval": cmd_eval,
+ "verify": cmd_verify,
+ "submit": cmd_submit,
+ "collect": cmd_collect,
+ "score": cmd_score,
+ "report": cmd_report,
+}
+
+
+def main(argv: Optional[List[str]] = None) -> None:
+ parser = build_parser()
+ args = parser.parse_args(argv)
+
+ if getattr(args, "version", False):
+ from . import __version__
+
+ print(f"lica-gdb {__version__}")
+ return
+
+ if args.verbose:
+ logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
+
+ if not args.command:
+ parser.print_help()
+ sys.exit(0)
+
+ handler = _DISPATCH[args.command]
+ sys.exit(handler(args))
+
+
+if __name__ == "__main__":
+ main()
diff --git a/src/gdb/suites.py b/src/gdb/suites.py
new file mode 100644
index 0000000..9f28d60
--- /dev/null
+++ b/src/gdb/suites.py
@@ -0,0 +1,97 @@
+"""Named benchmark suites.
+
+A *suite* is a named list of benchmark IDs. Papers should cite a suite name
+(e.g. ``v0-all``) alongside the ``lica-gdb`` package version, so numbers
+reported in different papers refer to the same evaluation set.
+
+.. note::
+
+ The ``v0-*`` prefix is deliberate: **GDB's evaluation definitions are not
+ yet frozen**. Tasks, prompts, sample selection, and metric wiring may still
+ change between ``lica-gdb`` releases. A ``v1.0-*`` suite family will be
+ introduced once those definitions are pinned with a documented fingerprint;
+ until then, cite the suite name *and* ``lica-gdb`` package version together.
+
+Two kinds of suites:
+
+* **Dynamic suites** are derived from the registry at call time
+ (``v0-all``, ``v0-understanding``, ``v0-generation``). They stay in sync
+ with whatever the installed ``lica-gdb`` version knows how to run, so
+ ``v0-all`` on package 0.2.0 and ``v0-all`` on 0.2.1 may differ.
+
+* **Static suites** are hardcoded lists (``v0-smoke``). The exact set of
+ tasks is fixed in source and does not grow unexpectedly.
+
+The public entry point is :func:`resolve_suite`, which takes a suite name and
+a discovered :class:`~gdb.registry.BenchmarkRegistry` and returns a concrete
+list of benchmark IDs.
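+
+A minimal sketch of programmatic use (mirroring how the CLI discovers
+benchmarks before resolving a suite)::
+
+    from gdb.registry import BenchmarkRegistry
+    from gdb.suites import resolve_suite
+
+    registry = BenchmarkRegistry()
+    registry.discover()
+    ids = resolve_suite("v0-smoke", registry)  # the six fixed smoke-test IDs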
+"""
+
+from __future__ import annotations
+
+from typing import Dict, List
+
+from .base import TaskType
+from .registry import BenchmarkRegistry
+
+_SMOKE_SUITE: List[str] = [
+ "category-1",
+ "layout-4",
+ "layout-5",
+ "typography-1",
+ "svg-1",
+ "template-1",
+]
+
+_STATIC_SUITES: Dict[str, List[str]] = {
+ "v0-smoke": _SMOKE_SUITE,
+}
+
+_DYNAMIC_SUITES = {
+ "v0-all",
+ "v0-understanding",
+ "v0-generation",
+}
+
+
+def list_suites() -> List[str]:
+ """Return all known suite names (static + dynamic)."""
+ return sorted(set(_STATIC_SUITES) | _DYNAMIC_SUITES)
+
+
+def resolve_suite(name: str, registry: BenchmarkRegistry) -> List[str]:
+ """Resolve a suite name to a sorted list of benchmark IDs.
+
+ Raises
+ ------
+ KeyError
+ If ``name`` is not a known suite.
+ """
+ if name in _STATIC_SUITES:
+ return list(_STATIC_SUITES[name])
+
+ if name == "v0-all":
+ return sorted(b.meta.id for b in registry.list())
+ if name == "v0-understanding":
+ return sorted(
+ b.meta.id for b in registry.list(task_type=TaskType.UNDERSTANDING)
+ )
+ if name == "v0-generation":
+ return sorted(
+ b.meta.id for b in registry.list(task_type=TaskType.GENERATION)
+ )
+
+ raise KeyError(
+ f"Unknown suite {name!r}. Known suites: {', '.join(list_suites())}"
+ )
+
+
+def describe_suite(name: str, registry: BenchmarkRegistry) -> Dict[str, object]:
+ """Return a metadata dict describing the suite (name, size, task IDs)."""
+ ids = resolve_suite(name, registry)
+ return {
+ "name": name,
+ "kind": "static" if name in _STATIC_SUITES else "dynamic",
+ "n_tasks": len(ids),
+ "task_ids": ids,
+ }