feat: add heygen-translate skill (video translation / dubbing) by eve-builds · Pull Request #82 · heygen-com/skills

eve-builds · 2026-04-27T23:49:45Z

What

Adds a third skill, heygen-translate/, for translating and dubbing existing videos into 175+ languages with voice cloning and lip-sync. Independent, self-contained skill that mirrors the structure of heygen-avatar/ and heygen-video/.

This is a fresh PR against the latest master structure. It replaces #46, which was authored before the #79 / #80 / #77 restructure (independent skills, no root SKILL.md, references inside the skill, MCP+CLI transport instead of raw API calls).

Why this PR over #46

Ken's review of #46 in Slack flagged:

Wrong transport. #46 declared allowed-tools: mcp__heygen__* but every example used raw curl https://api.heygen.com/v3/... calls. This PR uses the heygen video-translate CLI (with MCP fallthrough), following the same API Mode Detection ladder as heygen-video.
No embedded translation expertise. #46 was an API doc with a workflow stapled on. This PR adds speaker-count discipline, source-quality triage, locale-pair gotchas, the lip-sync ceiling, burned-in vs sidecar captions, audio-only as a different deliverable (not a workaround), cost/time math, a failure-mode decoder, and a full proofreads (review-edit-render) workflow.
Stale repo structure. #46 placed the skill at top-level video-translate/ (breaking the heygen- prefix pattern) and was based on the pre-#79 layout (shared root SKILL.md). This PR uses heygen-translate/ and matches the current independent-skill layout.

Changes

New skill

heygen-translate/
├── SKILL.md                                  # 4-phase workflow with MCP+CLI side-by-side, embedded expertise
└── references/
    ├── asset-routing.md                      # URL vs asset_id vs local-upload routing
    ├── language-locale-guide.md              # Regional variants, formality registers, RTL, tonal compression
    ├── proofreads-workflow.md                # Extract SRT → edit → upload → render (high-stakes path)
    └── troubleshooting.md                    # Errors → action map, polling, harness-specific notes
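The URL vs asset_id vs local-upload split covered by asset-routing.md can be pictured as a small classifier. This is a hypothetical Python sketch, not code from the skill: `route_source` is an illustrative name, and treating any string that is neither an http(s) URL nor an existing file as an asset_id is an assumption, since the real asset_id format isn't described here.

```python
import os
from urllib.parse import urlparse


def route_source(src: str) -> tuple[str, str]:
    """Classify a user-supplied source into one of three hypothetical routes.

    Returns (route, value) where route is "url", "upload", or "asset_id".
    """
    parsed = urlparse(src)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # Remote media: pass the URL through (a HEAD check could go here).
        return ("url", src)
    if parsed.scheme and parsed.netloc:
        # Other schemes (ftp://, s3://, ...) are rejected in this sketch.
        raise ValueError(f"unsupported URL scheme: {parsed.scheme}")
    if os.path.isfile(src):
        # A local file has to be uploaded before it can be translated.
        return ("upload", src)
    # Assumption: anything else is treated as an existing asset_id.
    return ("asset_id", src)
```

One design consequence worth noting: with this ordering, a mistyped local path silently becomes an asset_id, so a real implementation would probably confirm with the user instead of guessing.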

SKILL.md highlights

Same API Mode Detection ladder as heygen-video: OpenClaw plugin → CLI (HEYGEN_API_KEY override) → MCP → CLI fallback. Transport choice is silent and never narrated to the user.
Every operation has a CLI command and (when supported) an MCP tool name shown side-by-side. Zero raw curl https://api.heygen.com/... calls.
Five content profiles (talking-head, podcast, music-heavy, multi-speaker, corporate) with the right flags pre-bundled. Default is talking-head with precision mode + speech enhancement + dynamic duration + caption + format preservation.
Phase 1 Discovery rules mirror heygen-video: ask only what's missing, one or two questions per turn, in the user's language, never as a form.
Speaker count is REQUIRED for multi-speaker content and is flagged as the #1 quality killer.
Proofreads path triggers automatically for: long videos, corporate content, languages the user reads natively, RTL languages, high-stakes medical/legal/educational. Otherwise opt-in.
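Read literally, the API Mode Detection ladder is a first-match rule. Here is a minimal Python sketch under that reading. The `Environment` fields, the `pick_transport`/`detect` names, and how each rung is detected are assumptions for illustration, not the skill's actual probe logic. The returned value is for internal routing only, consistent with the rule that transport choice is never narrated.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    openclaw_plugin: bool  # running inside OpenClaw with the HeyGen plugin
    cli_installed: bool    # `heygen` binary found on PATH
    api_key_set: bool      # HEYGEN_API_KEY exported
    mcp_available: bool    # mcp__heygen__* tools exposed by the harness


def pick_transport(env: Environment) -> str:
    """Walk the ladder top to bottom and return the first rung that applies."""
    if env.openclaw_plugin:
        return "openclaw-plugin"
    if env.cli_installed and env.api_key_set:
        # An explicit API key makes the CLI win over MCP.
        return "cli"
    if env.mcp_available:
        return "mcp"
    # Last resort: the CLI, which may still need `heygen auth login`.
    return "cli"


def detect(openclaw_plugin: bool, mcp_available: bool) -> Environment:
    """Fill in the locally detectable rungs; the harness supplies the rest."""
    return Environment(
        openclaw_plugin=openclaw_plugin,
        cli_installed=shutil.which("heygen") is not None,
        api_key_set=bool(os.environ.get("HEYGEN_API_KEY")),
        mcp_available=mcp_available,
    )
```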

Embedded expertise (the gap from #46)

Speaker-count discipline. Wrong count = voice swaps mid-translation. Skill always asks for non-obvious cases and never guesses.
Source-quality triage in Phase 2 before submission. Audio clarity, face visibility, burned-in captions check.
Locale-pair gotchas as a real reference (language-locale-guide.md):
- Tonal compression table (en→zh runs ~30% shorter, en→de runs longer, en→ar/he expand + RTL).
- Formality / register matrix for ja-JP (敬語), ko-KR (honorifics), de-DE (Sie/du), fr-FR (tu/vous), th-TH, hi-IN, vi-VN, id-ID.
- RTL caption collision warnings for ar/he/ur/fa.
- Regional variant policy: always-ask for Portuguese, audience-region default for Spanish, Mandarin Simplified vs Traditional vs Cantonese.
- Lip-sync ceiling per language family.
Captions: burned-in vs sidecar SRT — when to use which and why proofreads is the path for restyleable captions.
Audio-only translation framed correctly — not a quality workaround for bad lip-sync, but a different deliverable.
Cost/time math surfaced in Phase 1: source minutes × language count, plus an honest 10–30 min render-time range.
Failure-mode decoder in SKILL.md, with the full table in references/troubleshooting.md.
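The cost/time math above is simple multiplication, and a sketch makes the Phase 1 message concrete. This Python example is hypothetical: `estimate` and `Estimate` are illustrative names, no pricing is implied, and because the PR doesn't say whether the 10–30 min range applies per language or per batch, the sketch simply reports it alongside the total.

```python
from dataclasses import dataclass

# Render-time range quoted in the PR; its scope (per language or per batch)
# is not specified, so it is reported as-is rather than multiplied.
RENDER_WAIT_MINUTES = (10, 30)


@dataclass(frozen=True)
class Estimate:
    billable_minutes: float
    render_wait_minutes: tuple[int, int]

    def summary(self) -> str:
        low, high = self.render_wait_minutes
        return (f"~{self.billable_minutes:g} translated minutes; "
                f"expect roughly {low}-{high} min of render time")


def estimate(source_minutes: float, language_count: int) -> Estimate:
    if source_minutes <= 0 or language_count < 1:
        raise ValueError("need a positive duration and at least one language")
    # Source minutes x language count, as described for Phase 1.
    return Estimate(source_minutes * language_count, RENDER_WAIT_MINUTES)
```

For example, a 2.5-minute source into 4 languages gives `estimate(2.5, 4).summary()` → `"~10 translated minutes; expect roughly 10-30 min of render time"`.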

Proofreads workflow (the missing high-stakes path)

The CLI exposes heygen video-translate proofreads {create, get, srt get, srt update, generate}. This skill wires that as a first-class workflow with concrete edit playbooks (brand glossary find-replace, register fixes per language, numbers/dates/units, cultural references). High-stakes content runs proofreads by default; short low-stakes content skips it.
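Of the edit playbooks listed above, brand-glossary find-replace is the most mechanical, and it should touch only cue text, never indices or timecodes. Here is a minimal Python sketch, assuming a well-formed UTF-8 SRT file. `apply_glossary` is a hypothetical helper, not part of the CLI or the skill.

```python
import re

# An SRT cue is: index line, "HH:MM:SS,mmm --> HH:MM:SS,mmm" line, text lines, blank line.
TIMECODE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def apply_glossary(srt_text: str, glossary: dict[str, str]) -> str:
    """Replace glossary terms in cue text only, leaving indices and timecodes intact."""
    out = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        # Skip blank separators, numeric cue indices, and timecode lines.
        # (Caveat: a cue whose text is only digits is skipped too.)
        if not stripped or stripped.isdigit() or TIMECODE.match(stripped):
            out.append(line)
            continue
        for wrong, right in glossary.items():
            # Whole-word match; the lambda keeps `right` from being read as a
            # regex replacement template (backslashes, group refs).
            line = re.sub(rf"\b{re.escape(wrong)}\b", lambda _m, r=right: r, line)
        out.append(line)
    return "\n".join(out) + "\n"
```

Whole-word matching avoids rewriting substrings such as `HeygenX`. The trade-off is that `\b` only behaves as expected when a glossary term starts and ends with word characters.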

Plumbing

.claude-plugin/marketplace.json registers /heygen:translate
.claude-plugin/plugin.json description + keywords updated
.codex-plugin/plugin.json description, keywords, longDescription, defaultPrompt
.cursor-plugin/plugin.json adds heygen-translate to skills array, keywords, tags
.github/workflows/validate-skills.yml adds heygen-translate to path filter and runs the same self-contained-bundle install checks (no parent-dir refs, all references resolve, no orphans)
release-please-config.json registers heygen-translate/SKILL.md as an extra-files target so the skill version bumps in lockstep with the others
README.md, INSTALL.md, INSTALL_FOR_AGENTS.md, CLAUDE.md, CONTRIBUTING.md all updated to reference the third skill

Naming decision (per Ken's question)

Used heygen-translate/ not video-translate/. Reasoning:

Matches the heygen-avatar/ / heygen-video/ prefix pattern visually in ls and in plugin manifests.
Plugin invocation is /heygen:translate regardless of folder name, so user-facing UX doesn't depend on this — but consistency matters for contributor velocity.
Cheaper to fix now (one new directory, no migration) than after merge.

If you'd prefer a different name (heygen-video-translate/, heygen-dub/, etc.) say the word and I'll rename in this branch before merge.

Testing

Local validation: grep -nE '\.\./' SKILL.md returns 0 matches
Local validation: every relative reference (references/X.md) exists in the bundle
Local validation: no orphan files in references/ (every file is linked from SKILL.md)
Local validation: no ../../ in references/ (would break inside an installed bundle)
All JSON manifests parse with jq empty
CLI confirmed: heygen video-translate --help exposes create / get / list / delete / update / languages list / proofreads (create / get / generate / srt get / srt update). Schemas captured via --request-schema.
CLI confirmed: heygen video-translate languages list returns 175+ languages including all major regional variants
Live smoke test: translate a short test clip into one language end-to-end (will run before merge if you want)
Eval scenarios for heygen-translate: out of scope for this PR; followup mirroring R17–R23 pattern from heygen-video
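The four local validation checks above can be reproduced in a single script. This Python sketch approximates those checks and is not the validate-skills.yml job. `check_bundle` is a hypothetical name, and the `references/<name>.md` link pattern it scans for is an assumption about how SKILL.md links its references.

```python
import re
from pathlib import Path

# Assumed link shape inside SKILL.md, e.g. "references/troubleshooting.md".
REF = re.compile(r"references/[\w.-]+\.md")


def check_bundle(skill_dir: Path) -> list[str]:
    """Return human-readable problems; an empty list means the bundle passes."""
    problems = []
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    # Check 1: no parent-dir references in SKILL.md (the grep for '\.\./').
    if "../" in text:
        problems.append("SKILL.md contains a parent-dir reference (../)")
    # Check 2: every referenced file exists inside the bundle.
    linked = set(REF.findall(text))
    for ref in sorted(linked):
        if not (skill_dir / ref).is_file():
            problems.append(f"missing reference: {ref}")
    ref_dir = skill_dir / "references"
    if ref_dir.is_dir():
        for f in sorted(ref_dir.glob("*.md")):
            rel = f"references/{f.name}"
            # Check 3: no orphans (every reference file is linked from SKILL.md).
            if rel not in linked:
                problems.append(f"orphan: {rel}")
            # Check 4: no ../../ inside references (breaks in an installed bundle).
            if "../../" in f.read_text(encoding="utf-8"):
                problems.append(f"{rel} contains ../../")
    return problems
```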

Out of scope (followup PRs)

platforms/nanoclaw/heygen-translate/ NanoClaw container variant
Eval framework + scenarios for heygen-translate (R-round mirror of heygen-video evals)
Mark PR #46 as superseded once this lands (close with a comment pointing here)

Breaking changes

None. New skill, additive to the existing two. Existing heygen-avatar / heygen-video flows are untouched.

Refs

Predecessor: #46, "feat: rewrite video-translate skill with v3 API (hardened)" (close once this lands)
Independent-skills restructure: #79, "feat!: eliminate root SKILL.md + root references/ — skills are now independent"
gh skill install path: #77, "feat: make heygen-avatar + heygen-video install cleanly via gh skill"
Codex/Cursor manifest precedent: #65, "feat: add Codex and Cursor plugin manifests"

Adds a third skill, heygen-translate/, for translating and dubbing existing videos into 175+ languages with voice cloning and lip-sync. Built on the same independent-skill structure as heygen-avatar and heygen-video.

What:
- heygen-translate/SKILL.md (4-phase workflow: Discovery → Pre-flight → Submit+Poll → Deliver) with the same API Mode Detection ladder as heygen-video (OpenClaw plugin → CLI w/ HEYGEN_API_KEY → MCP → CLI fallback). All operations shown with MCP and CLI side-by-side, no raw curl.
- heygen-translate/references/troubleshooting.md (errors → action map, polling patterns, harness-specific notes for Claude Code / OpenClaw / Cursor)
- heygen-translate/references/language-locale-guide.md (regional variant defaults, formality registers, RTL caption collisions, tonal compression/expansion table, lip-sync ceiling per language)
- heygen-translate/references/proofreads-workflow.md (the high-stakes review-edit-render path: extract SRT → glossary discipline → register fixes → upload edited SRT → final render)
- heygen-translate/references/asset-routing.md (URL vs asset_id vs local upload routing, HEAD-check pattern, auth-walled URL fallbacks, 32 MB limit handling)

Replaces PR #46 with the new repo structure (independent skills, no root SKILL.md, references inside the skill, validate-skills.yml self-contained checks, MCP+CLI transport not raw API).

Why:
- PR #46's SKILL.md frontmatter declared 'allowed-tools: mcp__heygen__*' but every example used raw curl against api.heygen.com. Mismatch fixed here by using the heygen video-translate CLI (with MCP fallthrough) per the established pattern in heygen-avatar/heygen-video.
- PR #46 was authored against the pre-#79 structure (root SKILL.md + shared references/). Repo restructured 24h ago; each skill now owns its own SKILL.md and references/. This PR matches.
- PR #46 lacked embedded translation expertise. This SKILL.md adds: speaker-count discipline, source-quality triage, locale-pair gotchas (formality registers in ja/ko/de/th/hi, RTL caption collisions, tonal compression for en→zh/ja/ko, regional variants for es/pt/zh), lip-sync ceiling, captions burned-in vs sidecar, audio-only as a different deliverable not a workaround, cost/time math, and a failure-mode decoder.
- PR #46 used 'video-translate/', breaking the heygen-avatar/heygen-video prefix pattern. Renamed to 'heygen-translate/' for consistency in ls output and plugin manifest paths.
- Adds a true proofreads workflow (extract SRT → user/agent edits → upload corrected SRT → render). This is the missing high-stakes path that distinguishes the skill from API docs.

Plumbing:
- .claude-plugin/marketplace.json registers heygen:translate
- .claude-plugin/plugin.json updates description + keywords
- .codex-plugin/plugin.json updates description, keywords, longDescription, defaultPrompt
- .cursor-plugin/plugin.json adds heygen-translate to skills array, plus keywords/tags
- .github/workflows/validate-skills.yml adds heygen-translate to path filter and runs the same self-contained-bundle checks as the other two skills
- release-please-config.json adds heygen-translate/SKILL.md as a release-please extra-files target so the version bumps in lockstep
- README.md, INSTALL.md, INSTALL_FOR_AGENTS.md, CLAUDE.md, CONTRIBUTING.md all updated to reference the third skill

Out of scope (followups):
- platforms/nanoclaw/heygen-translate/ NanoClaw container variant
- Eval scenarios for heygen-translate (mirror of R17-R23 pattern from heygen-video)
- gh skill / agentskills.io spec compliance check (handled by the spec-validate-soft job already in validate-skills.yml)
- Mark PR #46 as superseded once this lands

Refs: PR #46 (predecessor), #79 (independent-skills restructure), #77 (gh skill install path)

kenchung

Approve — mechanics review

Reviewed purely from a conformance/mechanics perspective per Ken's note. All 5 blockers from my PR #46 review are fixed, plus full plumbing.

Mechanical conformance — all verified at `429be2a`

| Item | Status |
| --- | --- |
| Skill name `heygen-translate` (matches `heygen-` prefix convention) | ✅ |
| `version: 3.1.0 # x-release-please-version` frontmatter | ✅ line 2 |
| `argument-hint: "[video_url_or_path] [--to language]"` | ✅ line 29 |
| `allowed-tools: Bash, WebFetch, Read, Write, mcp__heygen__*` | ✅ line 31 |
| `metadata.openclaw.requires.env: HEYGEN_API_KEY` | ✅ line 32+ |
| Description has "Use when" (5 cases), "Returns", "Chain signal", "NOT for" | ✅ all present |
| Self-contained references, no `../` paths | ✅ all sibling-relative |
| `references/asset-routing.md`, `language-locale-guide.md`, `proofreads-workflow.md`, `troubleshooting.md` all present | ✅ 4 files |
| `.cursor-plugin/plugin.json` skills array updated | ✅ adds `./heygen-translate/` |
| `.claude-plugin/marketplace.json` registered as `/heygen:translate` | ✅ |
| `.claude-plugin/plugin.json`, `.codex-plugin/plugin.json` updated | ✅ |
| `release-please-config.json` extra-files includes `heygen-translate/SKILL.md` | ✅ |
| `validate-skills.yml` updated for new skill (path filter + install/verify) | ✅ +49 lines |
| README.md, INSTALL.md, INSTALL_FOR_AGENTS.md, CLAUDE.md, CONTRIBUTING.md all updated | ✅ |
| CI: self-containment install + spec validation | ✅ both SUCCESS |

Minor pattern nits (non-blocking)

Missing canonical H2 sections. Siblings have named ## Files & Paths, ## Language Awareness, ## UX Rules H2s. heygen-translate organized them differently — substance is partly covered by ## API Mode Detection, ## User-Facing Behavior, ## Best Practices, ## Source Quality Disclaimer. Not a regression; just stylistic variance from the post-#79 template. Worth normalizing in a follow-up if you want strict H2 parity across all three skills.
"NOT for" clause is implicit. Embedded as a parenthetical inside the description ("not a new presenter — that's heygen-video"). Both siblings have an explicit NOT for: line. Cosmetic.
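The implicit-vs-explicit "NOT for" nit is the kind of drift a small lint can catch. This is a hypothetical stdlib-only Python sketch, not part of the repo's CI. To avoid a YAML dependency, it searches the whole frontmatter instead of parsing the `description` field. Its match is case-sensitive, which is an assumption, though it is exactly what flags a lowercase parenthetical like "(not a new presenter ...)".

```python
REQUIRED_MARKERS = ("Use when", "Returns", "Chain signal", "NOT for")


def frontmatter(skill_md: str) -> str:
    """Return the text between the opening and closing '---' lines, or ''."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i])
    return ""


def missing_markers(skill_md: str) -> list[str]:
    """List the description markers absent from the frontmatter (case-sensitive)."""
    fm = frontmatter(skill_md)
    return [m for m in REQUIRED_MARKERS if m not in fm]
```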

Worth calling out (good)

## Embedded Expertise section is actually richer than siblings in some places: Speaker count discipline, Source-quality triage, Locale-pair gotchas, Lip-sync ceiling, Caption styling, Audio-only workflow, Cost & time awareness, Failure-mode decoder. Hits the domain audit checklist — but per Ken's call this is for separate iteration, not blocking here.
proofreads-workflow.md (224 lines) wires up the CLI's proofreads subtree as a real workflow — the QA loop my earlier review flagged as missing.
Reference files at sensible sizes (~6-10KB each), no inline bloat.
4 H3s under ## Embedded Expertise directly map to the audit gaps.

Approve. Domain expertise depth is acknowledged as separate scope.

…orms

Per Ken's ask in #tmp-vt-skill: rewrite proofreads-workflow.md (and the Phase 3 proofread snippet in SKILL.md) against verified live behavior of the heygen video-translate proofreads commands, not assumed/inferred behavior.

Verified against the live API + CLI on Apr 27 with two real proofread sessions (b84c8e8d... silent-source failure, 8ce0fba6c... Spanish Sintel-trailer success).

Now documented:
- Five subcommands mapped to real REST endpoints:
  - create: POST /v3/video-translations/proofreads
  - get: GET /v3/video-translations/proofreads/{id}
  - srt get: GET /v3/video-translations/proofreads/{id}/srt
  - srt update: PUT /v3/video-translations/proofreads/{id}/srt
  - generate: POST /v3/video-translations/proofreads/{id}/generate
- What the engine actually does between create and completed (downloads source, runs ASR for original_srt_url, translates to srt_url, no render yet).
- Real response shapes for create / get / srt get / srt update / generate with verified JSON examples and field-by-field meanings.
- Real status enum: processing | completed | failed (NOT pending|running; those belong to the translation-render endpoint, a different state machine the resource graduates into after generate).
- Polling cadence verified empirically: 3-5 min for SRT extraction on a 50-second source. Hard timeout 30 min for stuck sessions.
- SRT format: standard SRT (UTF-8), well-formed timecodes, editable by hand or sed.
- File naming: <title>_proofread.srt and <title>_proofread_original.srt.
- original_srt_url is an auto-populated source-language transcription, not a copy of any user-provided SRT. Useful as ground truth, never re-uploaded as target-language SRT.

Critical correction: heygen asset create does NOT accept SRT files. The CLI exposes both URL and asset_id shapes for srt update, but the asset_id upload path is currently BLOCKED:

{"error":{"code":"invalid_parameter", "message":"Content type not supported application/x-subrip"}}

heygen asset create only accepts png/jpeg/mp4/webm/mp3/wav/pdf. Renaming .srt to .txt or .mp3 does not bypass it (the server sniffs content, not extension). The asset_id route is in the request schema for forward compatibility but cannot currently be exercised through the standard upload path. Use the URL route. The reference now documents practical hosts that work (gist raw URLs, GitHub raw URLs, S3 public-read, presigned URLs >=2h, Vercel/static).

Two new failure_message strings added to troubleshooting.md from real API responses:
- 'Failed to download video from url, please check the url is valid or the video is public' (instant fail on a bad/auth-walled source URL)
- 'Your video's audio is missing or corrupted, please try with another video' (~30s fail when the source has no speech)

Other documented quirks:
- proofreads create returns proofread_ids (plural, one per language) plus a session-level status; per-id status comes from proofreads get.
- After generate, polling shifts from proofreads get to video-translate get because the resource graduates from proofread to translation.
- Captions on generate are independent of the proofread session's SRT: --captions controls whether the FINAL video burns captions in.
- Proofread session TTL ~24h.

Out of scope for this commit (still in followup queue):
- NanoClaw platform variant
- Eval scenarios for heygen-translate
- File issue/PR upstream re: SRT asset upload (worth surfacing to the HeyGen CLI team; the asset_id route in the schema can't be reached today)
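Taken together, the proofread status enum, the 30-minute hard timeout, and the two failure_message strings suggest a polling loop like the one below. This is a hypothetical Python sketch. The 30 s default interval is an assumption (the commit only reports 3-5 min to completion on a 50-second source). The function also assumes `status` and `failure_message` are top-level fields of the parsed `proofreads get` response. `poll_proofread`, `decode_failure`, and `ProofreadFailed` are illustrative names.

```python
import time
from typing import Callable

# failure_message substrings quoted in the commit, mapped to suggested actions.
FAILURE_HINTS = {
    "Failed to download video from url": "source URL is invalid or auth-walled; re-host it publicly",
    "audio is missing or corrupted": "source has no usable speech; try another video",
}


class ProofreadFailed(RuntimeError):
    pass


def decode_failure(message: str) -> str:
    """Prefix a known failure_message with an actionable hint."""
    for needle, hint in FAILURE_HINTS.items():
        if needle in message:
            return f"{hint} ({message})"
    return message or "unknown failure"


def poll_proofread(
    get: Callable[[], dict],
    interval_s: float = 30.0,
    timeout_s: float = 30 * 60,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll one proofread session until it leaves `processing`.

    `get` is assumed to wrap `heygen video-translate proofreads get <id>` and
    return parsed JSON; `sleep`/`now` are injectable for testing.
    """
    deadline = now() + timeout_s
    while True:
        resp = get()
        status = resp.get("status")
        if status == "completed":
            return resp
        if status == "failed":
            raise ProofreadFailed(decode_failure(resp.get("failure_message", "")))
        if status != "processing":
            # pending/running belong to the translation-render state machine,
            # which is polled via `video-translate get` after `generate`.
            raise ValueError(f"unexpected proofread status: {status!r}")
        if now() >= deadline:
            raise TimeoutError("proofread session still processing after the hard timeout")
        sleep(interval_s)
```

Rejecting `pending`/`running` instead of waiting on them catches the mistake of polling the wrong resource after `generate`.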

…age input

Three improvements from dogfooding:
- Add auth verification step before Phase 1: runs `heygen auth status` in CLI mode, asks for API key and persists via `heygen auth login` if missing. One-time setup that survives across sessions.
- Add duration flexibility question to Phase 1 discovery: asks whether output must match source length, explains quality tradeoff, controls `enable_dynamic_duration` flag instead of hardcoding true.
- Make target language question explicitly open-ended: no picker, no pre-assigned choices. User types freely, validation in Phase 2.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

… question

SKILL.md:335 and references/language-locale-guide.md:49 both said "Always enable_dynamic_duration: true", contradicting the new Phase 1 duration flexibility question. Updated both to reference the user's choice and warn about quality degradation on high-compression pairs when fixed-length is chosen.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
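The fix replaces a hard-coded `enable_dynamic_duration: true` with the user's Phase 1 answer, plus a warning on length-sensitive pairs. This is a minimal Python sketch of that mapping. `duration_options` and the language set are hypothetical (the set is drawn loosely from the tonal compression/expansion notes in this PR), and `enable_dynamic_duration` is the only parameter name taken from the source.

```python
# Targets singled out in the tonal compression/expansion notes
# (en→zh/ja/ko run shorter, en→de runs longer, en→ar/he expand). Illustrative only.
LENGTH_SENSITIVE_TARGETS = {"zh", "ja", "ko", "de", "ar", "he"}


def duration_options(target_locale: str, must_match_source_length: bool) -> tuple[dict, list[str]]:
    """Map the Phase 1 duration answer to enable_dynamic_duration plus any warnings."""
    options = {"enable_dynamic_duration": not must_match_source_length}
    warnings = []
    language = target_locale.split("-")[0].lower()
    if must_match_source_length and language in LENGTH_SENSITIVE_TARGETS:
        warnings.append(
            f"Fixed-length output for {target_locale} may degrade quality: "
            "the translation has to be squeezed or stretched to fit."
        )
    return options, warnings
```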

…ovements fix(heygen-translate): auth gate, duration question, open-ended language input

eve-builds requested a review from kenchung as a code owner April 27, 2026 23:49

kenchung approved these changes Apr 27, 2026


eve-builds and others added 4 commits April 27, 2026 17:12

Merge pull request #86 from heygen-com/davidchou/translate-skill-impr…

1e9a4d5


davidchou-heygen approved these changes May 13, 2026


eve-builds merged commit 89d28b9 into master May 13, 2026
3 checks passed

eve-builds deleted the feat/heygen-translate branch May 13, 2026 17:58

github-actions Bot mentioned this pull request May 13, 2026

chore: release 3.2.0 #87

Merged

eve-builds mentioned this pull request May 13, 2026

feat: rewrite video-translate skill with v3 API (hardened) #46

Closed



eve-builds merged 5 commits into master from feat/heygen-translate

3 participants
