Contributing Models to agent2go

Overview

The agent2go model registry is an open collection of model configurations for running AI models on GPU pods. Community contributions help expand the model catalog for everyone.

How to Contribute

Option 1: GitHub Issue (Easiest)

1. Run your model on an agent2go pod
2. Export your config: `a2go registry export --format issue`
3. Open a New Model Issue
4. Paste the exported config and test evidence
5. A maintainer will review and merge your contribution

Option 2: Direct Pull Request

1. Fork this repository
2. Create a new JSON file in `registry/models/` (use an existing file as reference)
3. Run validation: `cd site && npm run validate`
4. Submit a Pull Request

CI will automatically validate your JSON and check that the HuggingFace repo exists.

Model Config Reference

Each model is a JSON file in registry/models/ with these fields:

| Field | Required | Description |
| --- | --- | --- |
| `id` | Yes | Unique ID in `provider/name` format (lowercase) |
| `name` | Yes | Human-readable model name |
| `type` | Yes | `llm`, `audio`, or `image` |
| `engine` | Yes | `a2go-llamacpp`, `llamacpp`, `llamacpp-audio`, `image-gen`, `mlx-lm`, `mlx-audio`, `mflux`, or `vllm` |
| `repo` | Yes | HuggingFace repository name |
| `files` | Yes | Array of files to download from the repo |
| `downloadDir` | Yes | Must start with `/workspace/models/` |
| `servedAs` | Yes (LLM) | Model name exposed via API |
| `vram` | Yes | Object with `model` (MB) and `overhead` (MB) fields |
| `kvCacheMbPer1kTokens` | Recommended | KV cache VRAM per 1k tokens (with q8_0) |
| `defaults` | Recommended | Default `contextLength` and `port` |
| `startDefaults` | Optional | Default values like `gpuLayers`, `parallel` |
| `extraStartArgs` | Optional | Additional CLI args for the engine |
| `provider` | Yes (LLM) | Provider config with `name` and `api` |
| `default` | Yes | Whether this is the default for its type (usually `false`) |
| `status` | Yes | `stable`, `experimental`, or `deprecated` |
| `verifiedOn` | Optional | Array of GPU IDs verified on |
| `verifiedContext` | Optional | Context length (tokens) used during TPS benchmarking |
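To show how these fields fit together, the sketch below builds a hypothetical LLM entry as a Python dictionary and prints it as JSON. Every value (the ID, repo, file name, VRAM numbers, port, and the provider `api` string) is a placeholder chosen for illustration, not a real registry model. An existing file in `registry/models/` remains the authoritative reference for value formats.

```python
import json

# Hypothetical example entry. All values are illustrative placeholders,
# not taken from a real model in the registry.
example_model = {
    "id": "example-provider/example-model",          # provider/name, lowercase
    "name": "Example Model",
    "type": "llm",
    "engine": "llamacpp",
    "repo": "example-org/example-model-GGUF",        # placeholder HuggingFace repo
    "files": ["example-model-q4_k_m.gguf"],          # placeholder file name
    "downloadDir": "/workspace/models/example-model",
    "servedAs": "example-model",
    "vram": {"model": 17000, "overhead": 1000},      # both in MB
    "kvCacheMbPer1kTokens": 40,
    "defaults": {"contextLength": 150000, "port": 8000},
    "startDefaults": {"gpuLayers": 99, "parallel": 1},
    "provider": {"name": "example-provider", "api": "example-api"},
    "default": False,
    "status": "experimental",
}

# Serialize in the shape a registry/models/*.json file would take.
print(json.dumps(example_model, indent=2))
```

Optional fields such as `extraStartArgs`, `verifiedOn`, and `verifiedContext` are left out here because their exact value formats are not described above.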

VRAM Estimation

VRAM values should be measured, not guessed:

1. Start the model on a pod
2. Run `nvidia-smi` and note the VRAM usage
3. Set `vram.model` to the approximate VRAM used by the model weights
4. Set `vram.overhead` to what remains of the measured usage after subtracting the model weights and the KV cache
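The arithmetic behind these steps can be sketched as follows. This is a minimal illustration, not part of the agent2go tooling: the helper names are hypothetical, the sample numbers are made up, and it assumes the reading comes from `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`, which prints one value per GPU. `nvidia-smi` reports MiB, which this sketch treats as the MB values the registry expects.

```python
def parse_memory_used(nvidia_smi_output: str) -> list[int]:
    """Parse per-GPU memory usage from the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def split_vram(total_used_mb: int, model_mb: int, kv_cache_mb: int) -> dict:
    """Build the `vram` object: overhead is what remains of the measured
    usage after subtracting model weights and KV cache."""
    overhead_mb = total_used_mb - model_mb - kv_cache_mb
    if overhead_mb < 0:
        raise ValueError("model + KV cache exceed the measured total; re-check the inputs")
    return {"model": model_mb, "overhead": overhead_mb}


# Made-up sample reading for a single GPU.
sample_output = "21500\n"
total_mb = sum(parse_memory_used(sample_output))

# Hypothetical split: 17000 MB of weights and 3000 MB of KV cache.
print(split_vram(total_mb, model_mb=17000, kv_cache_mb=3000))
# {'model': 17000, 'overhead': 1500}
```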

KV Cache Rate

For LLMs, measure `kvCacheMbPer1kTokens`:

1. Run the model with a known context length (e.g., 150k tokens)
2. Note the total VRAM used (MB)
3. Calculate: `(total_vram - model_vram - overhead) / (context_length / 1000)`

This value should reflect q8_0 KV quantization (the entrypoint uses `-ctk q8_0 -ctv q8_0`).
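As a worked example of the formula, the sketch below computes the rate from a single measurement. The function name and the sample numbers are hypothetical. All VRAM values are in MB and the context length is in tokens.

```python
def kv_cache_mb_per_1k_tokens(
    total_vram_mb: float,
    model_vram_mb: float,
    overhead_mb: float,
    context_length: int,
) -> float:
    """Apply (total_vram - model_vram - overhead) / (context_length / 1000)."""
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    kv_cache_mb = total_vram_mb - model_vram_mb - overhead_mb
    return kv_cache_mb / (context_length / 1000)


# Made-up measurement: 24000 MB total at a 150k-token context,
# with 17000 MB of weights and 1000 MB of overhead.
rate = kv_cache_mb_per_1k_tokens(24000, 17000, 1000, 150_000)
print(rate)  # 40.0
```

Here the KV cache accounts for 6000 MB across 150 blocks of 1k tokens, giving 40 MB per 1k tokens.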

Validation

Before submitting, validate your config:

cd site
npm run validate
npm run validate:hf  # Also verify HF repos exist

Security Requirements

- `downloadDir` must start with `/workspace/models/` (path restriction)
- `engine` must be one of the known engines (engine whitelist)
- `extraStartArgs` are passed as CLI args to known binaries only (no code execution)
- All merges require maintainer review
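The first two requirements can be checked locally before opening a pull request. The sketch below is an illustration only and is not the validator that `npm run validate` runs: the function name is hypothetical, and the engine list is copied from the Model Config Reference.

```python
# Engines listed in the Model Config Reference.
KNOWN_ENGINES = {
    "a2go-llamacpp", "llamacpp", "llamacpp-audio", "image-gen",
    "mlx-lm", "mlx-audio", "mflux", "vllm",
}
REQUIRED_PREFIX = "/workspace/models/"


def security_errors(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    errors = []

    download_dir = config.get("downloadDir")
    if not isinstance(download_dir, str) or not download_dir.startswith(REQUIRED_PREFIX):
        errors.append(f"downloadDir must start with {REQUIRED_PREFIX}")

    engine = config.get("engine")
    if not isinstance(engine, str) or engine not in KNOWN_ENGINES:
        errors.append("engine must be one of the known engines")

    return errors


# Hypothetical configs for illustration.
print(security_errors({"downloadDir": "/workspace/models/example", "engine": "vllm"}))
# []
print(security_errors({"downloadDir": "/tmp/example", "engine": "custom"}))
```

A plain prefix check like this does not normalize the path, so a real validator may apply stricter rules.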
