feat: add heygen-translate skill (video translation / dubbing) #82
Adds a third skill, heygen-translate/, for translating and dubbing existing videos into 175+ languages with voice cloning and lip-sync. Built on the same independent-skill structure as heygen-avatar and heygen-video.

What:
- heygen-translate/SKILL.md: 4-phase workflow (Discovery → Pre-flight → Submit+Poll → Deliver) with the same API Mode Detection ladder as heygen-video (OpenClaw plugin → CLI w/ HEYGEN_API_KEY → MCP → CLI fallback). All operations are shown with MCP and CLI side-by-side, no raw curl.
- heygen-translate/references/troubleshooting.md (errors → action map, polling patterns, harness-specific notes for Claude Code / OpenClaw / Cursor)
- heygen-translate/references/language-locale-guide.md (regional variant defaults, formality registers, RTL caption collisions, tonal compression/expansion table, lip-sync ceiling per language)
- heygen-translate/references/proofreads-workflow.md (the high-stakes review-edit-render path: extract SRT → glossary discipline → register fixes → upload edited SRT → final render)
- heygen-translate/references/asset-routing.md (URL vs asset_id vs local upload routing, HEAD-check pattern, auth-walled URL fallbacks, 32 MB limit handling)

Replaces PR #46 with the new repo structure (independent skills, no root SKILL.md, references inside the skill, validate-skills.yml self-contained checks, MCP+CLI transport instead of raw API).

Why:
- PR #46's SKILL.md frontmatter declared 'allowed-tools: mcp__heygen__*', but every example used raw curl against api.heygen.com. This PR fixes the mismatch by using the heygen video-translate CLI (with MCP fallthrough), following the established pattern in heygen-avatar/heygen-video.
- PR #46 was authored against the pre-#79 structure (root SKILL.md + shared references/). The repo was restructured 24h ago: each skill now owns its own SKILL.md and references/. This PR matches.
- PR #46 lacked embedded translation expertise. This SKILL.md adds: speaker-count discipline, source-quality triage, locale-pair gotchas (formality registers in ja/ko/de/th/hi, RTL caption collisions, tonal compression for en→zh/ja/ko, regional variants for es/pt/zh), lip-sync ceiling, burned-in vs sidecar captions, audio-only as a distinct deliverable rather than a workaround, cost/time math, and a failure-mode decoder.
- PR #46 used 'video-translate/', breaking the heygen-avatar/heygen-video prefix pattern. Renamed to 'heygen-translate/' for consistency in ls output and plugin manifest paths.
- Adds a true proofreads workflow (extract SRT → user/agent edits → upload corrected SRT → render). This is the missing high-stakes path that distinguishes the skill from API docs.

Plumbing:
- .claude-plugin/marketplace.json registers heygen:translate
- .claude-plugin/plugin.json updates description + keywords
- .codex-plugin/plugin.json updates description, keywords, longDescription, defaultPrompt
- .cursor-plugin/plugin.json adds heygen-translate to the skills array, plus keywords/tags
- .github/workflows/validate-skills.yml adds heygen-translate to the path filter and runs the same self-contained-bundle checks as the other two skills
- release-please-config.json adds heygen-translate/SKILL.md as a release-please extra-files target so the version bumps in lockstep
- README.md, INSTALL.md, INSTALL_FOR_AGENTS.md, CLAUDE.md, CONTRIBUTING.md all updated to reference the third skill

Out of scope (followups):
- platforms/nanoclaw/heygen-translate/ NanoClaw container variant
- Eval scenarios for heygen-translate (mirror of the R17-R23 pattern from heygen-video)
- gh skill / agentskills.io spec compliance check (handled by the spec-validate-soft job already in validate-skills.yml)
- Mark PR #46 as superseded once this lands

Refs: PR #46 (predecessor), #79 (independent-skills restructure), #77 (gh skill install path)
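The URL vs asset_id vs local-upload routing described for asset-routing.md can be sketched as below. This is a minimal illustration, not the skill's actual logic: it assumes an http(s) scheme means a URL (which a real flow would HEAD-check first), an existing local path means a local upload subject to the 32 MB limit (read here as binary megabytes), and anything else is an asset_id. The function name and return labels are hypothetical.

```python
import os
from urllib.parse import urlparse

# Assumption: the 32 MB limit from asset-routing.md caps local uploads (binary MB).
MAX_UPLOAD_BYTES = 32 * 1024 * 1024


def route_source(value: str) -> str:
    """Classify a video source as "url", "local_upload", "local_too_large", or "asset_id".

    The labels are hypothetical names for the routes listed in asset-routing.md.
    """
    if urlparse(value).scheme.lower() in ("http", "https"):
        # A real flow would HEAD-check the URL (reachable, not auth-walled) first.
        return "url"
    if os.path.isfile(value):
        if os.path.getsize(value) > MAX_UPLOAD_BYTES:
            return "local_too_large"
        return "local_upload"
    # Assumption: anything that is neither a URL nor an existing file is an asset_id.
    return "asset_id"
```

Checking the URL case first keeps a remote source from ever being mistaken for a local path, and the size check happens before any upload is attempted.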
kenchung left a comment
Approve — mechanics review
Reviewed purely from a conformance/mechanics perspective per Ken's note. All 5 blockers from my PR #46 review are fixed, plus full plumbing.
Mechanical conformance — all verified at 429be2a
| Item | Status |
|---|---|
| Skill name `heygen-translate` (matches `heygen-` prefix convention) | ✅ |
| `version: 3.1.0 # x-release-please-version` frontmatter | ✅ line 2 |
| `argument-hint: "[video_url_or_path] [--to language]"` | ✅ line 29 |
| `allowed-tools: Bash, WebFetch, Read, Write, mcp__heygen__*` | ✅ line 31 |
| `metadata.openclaw.requires.env: HEYGEN_API_KEY` | ✅ line 32+ |
| Description has "Use when" (5 cases), "Returns", "Chain signal", "NOT for" | ✅ all present |
| Self-contained references (no `../` paths) | ✅ all sibling-relative |
| `references/asset-routing.md`, `language-locale-guide.md`, `proofreads-workflow.md`, `troubleshooting.md` all present | ✅ 4 files |
| `.cursor-plugin/plugin.json` skills array updated | ✅ adds `./heygen-translate/` |
| `.claude-plugin/marketplace.json` registered as `/heygen:translate` | ✅ |
| `.claude-plugin/plugin.json`, `.codex-plugin/plugin.json` updated | ✅ |
| `release-please-config.json` extra-files includes `heygen-translate/SKILL.md` | ✅ |
| `validate-skills.yml` updated for new skill (path filter + install/verify) | ✅ +49 lines |
| README.md, INSTALL.md, INSTALL_FOR_AGENTS.md, CLAUDE.md, CONTRIBUTING.md all updated | ✅ |
| CI: self-containment install + spec validation | ✅ both SUCCESS |
Minor pattern nits (non-blocking)
- Missing canonical H2 sections. Siblings have named `## Files & Paths`, `## Language Awareness`, and `## UX Rules` H2s. heygen-translate organizes this material differently; the substance is partly covered by `## API Mode Detection`, `## User-Facing Behavior`, `## Best Practices`, and `## Source Quality Disclaimer`. Not a regression, just stylistic variance from the post-#79 template. Worth normalizing in a follow-up if you want strict H2 parity across all three skills.
- "NOT for" clause is implicit. It is embedded as a parenthetical inside the description ("not a new presenter — that's heygen-video"). Both siblings have an explicit `NOT for:` line. Cosmetic.
Worth calling out (good)
- The `## Embedded Expertise` section is actually richer than the siblings' in some places: Speaker count discipline, Source-quality triage, Locale-pair gotchas, Lip-sync ceiling, Caption styling, Audio-only workflow, Cost & time awareness, Failure-mode decoder. It hits the domain audit checklist, but per Ken's call that is for separate iteration, not blocking here.
- `proofreads-workflow.md` (224 lines) wires up the CLI's `proofreads` subtree as a real workflow: the QA loop my earlier review flagged as missing.
- 5 reference files at sensible sizes (~6-10KB each), no inline bloat.
- 4 H3s under `## Embedded Expertise` directly map to the audit gaps.
Approve. Domain expertise depth is acknowledged as separate scope.
…orms
Per Ken's ask in #tmp-vt-skill: rewrite proofreads-workflow.md (and the
Phase 3 proofread snippet in SKILL.md) against verified live behavior of
the heygen video-translate proofreads commands, not assumed/inferred
behavior.
Verified against the live API + CLI on Apr 27 with two real proofread
sessions (b84c8e8d... silent-source failure, 8ce0fba6c... Spanish
Sintel-trailer success).
Now documented:
- Five subcommands mapped to real REST endpoints:
create POST /v3/video-translations/proofreads
get GET /v3/video-translations/proofreads/{id}
srt get GET /v3/video-translations/proofreads/{id}/srt
srt update PUT /v3/video-translations/proofreads/{id}/srt
generate POST /v3/video-translations/proofreads/{id}/generate
- What the engine actually does between create and completed (downloads
source, runs ASR for original_srt_url, translates to srt_url, no
render yet).
- Real response shapes for create / get / srt get / srt update / generate
with verified JSON examples and field-by-field meanings.
- Real status enum: processing | completed | failed (NOT pending|running
— that's the translation-render endpoint, which is a different state
machine the resource graduates into after generate).
- Polling cadence verified empirically: 3-5 min for SRT extraction on a
50-second source. Hard timeout 30 min for stuck sessions.
- SRT format: standard SRT (UTF-8), well-formed timecodes, editable by
hand or sed.
- File naming: <title>_proofread.srt and <title>_proofread_original.srt.
- original_srt_url is auto-populated source-language transcription, not
a copy of any user-provided SRT. Useful as ground truth, never
re-uploaded as target-language SRT.
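Because the SRT is plain UTF-8 with standard timecodes, an agent can apply a sed-style glossary edit and then re-check well-formedness before `srt update`. A minimal sketch under those assumptions: the function names are hypothetical, whole-word `\b` matching misbehaves for terms that start or end with punctuation, all-digit text lines are treated as cue indices, and rejecting zero-length cues is this sketch's choice, not a documented engine rule.

```python
import re

TIMECODE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def _ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text: str) -> list[tuple[int, int, int, str]]:
    """Parse SRT into (index, start_ms, end_ms, text) cues; raise ValueError if malformed."""
    # Normalize Windows line endings and drop a UTF-8 BOM if present.
    text = text.replace("\r\n", "\n").lstrip("\ufeff")
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.split("\n")
        match = TIMECODE.match(lines[1].strip()) if len(lines) >= 3 else None
        if match is None or not lines[0].strip().isdigit():
            raise ValueError(f"malformed cue: {block[:60]!r}")
        g = match.groups()
        start, end = _ms(*g[:4]), _ms(*g[4:])
        if end <= start:
            raise ValueError(f"cue {lines[0].strip()} ends before it starts")
        cues.append((int(lines[0]), start, end, "\n".join(lines[2:])))
    return cues


def replace_in_text_lines(text: str, glossary: dict[str, str]) -> str:
    """Whole-word find-replace applied only to cue text, never to indices or timecodes."""
    out = []
    for line in text.split("\n"):
        s = line.strip()
        if s and not s.isdigit() and not TIMECODE.match(s):
            for wrong, right in glossary.items():
                # A function replacement avoids backslash-escape surprises in `right`.
                line = re.sub(rf"\b{re.escape(wrong)}\b", lambda _m, r=right: r, line)
        out.append(line)
    return "\n".join(out)
```

Running `parse_srt` on the edited text, not the original, is the point: it catches an edit that accidentally touched a timecode or merged two cues.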
Critical correction: heygen asset create does NOT accept SRT files.
The CLI exposes both URL and asset_id shapes for srt update, but the
asset_id upload path is currently BLOCKED:
{"error":{"code":"invalid_parameter",
"message":"Content type not supported application/x-subrip"}}
heygen asset create only accepts png/jpeg/mp4/webm/mp3/wav/pdf.
Renaming .srt to .txt or .mp3 does not bypass it (server sniffs content,
not extension). The asset_id route is in the request schema for forward
compatibility but cannot currently be exercised through the standard
upload path.
Use the URL route. The reference now documents practical hosts that
work (gist raw URLs, GitHub raw URLs, S3 public-read, presigned
URLs valid for >=2h, Vercel/static hosting).
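Since SRTs have to go through the URL route, a presigned URL must stay fetchable long enough. A minimal sketch, assuming AWS SigV4 query parameters (`X-Amz-Date`, `X-Amz-Expires`) and reading ">=2h" as remaining validity at submit time; other hosts' signing schemes are not handled, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse

MIN_VALIDITY = timedelta(hours=2)  # ">=2h" guidance for presigned SRT URLs


def srt_url_lifetime_ok(url: str, now: Optional[datetime] = None) -> bool:
    """Check that an AWS SigV4 presigned URL remains valid for at least 2 hours.

    URLs without X-Amz-Expires are treated as not presigned (public-read S3,
    raw gist/GitHub, static hosting) and pass this particular check.
    """
    params = {k.lower(): v[0] for k, v in parse_qs(urlparse(url).query).items()}
    if "x-amz-expires" not in params:
        return True
    lifetime = timedelta(seconds=int(params["x-amz-expires"]))
    if "x-amz-date" not in params:
        # No signing time to anchor on; fall back to the raw lifetime.
        return lifetime >= MIN_VALIDITY
    signed_at = datetime.strptime(params["x-amz-date"], "%Y%m%dT%H%M%SZ").replace(
        tzinfo=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    return signed_at + lifetime - now >= MIN_VALIDITY
```

Measuring from "now" rather than from the signing time means a URL that was generated with a 3h lifetime but has been sitting around for 90 minutes is correctly rejected.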
Two new failure_message strings added to troubleshooting.md from real
API responses:
- 'Failed to download video from url, please check the url is valid or
the video is public' (instant-fail on bad/auth-walled source URL)
- 'Your video's audio is missing or corrupted, please try with another
video' (~30s fail when source has no speech)
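A failure-mode decoder can start as a lookup from the two verbatim `failure_message` strings above to a next step. Sketch only: the dictionary, the function name, and prefix matching (to tolerate trailing punctuation) are assumptions, and the action wording paraphrases the notes beside each message.

```python
# Verbatim failure_message strings from the commit; actions paraphrase its notes.
KNOWN_FAILURES = {
    "Failed to download video from url, please check the url is valid or the video is public": (
        "Check that the source URL is valid and publicly reachable "
        "(bad or auth-walled URLs fail instantly)."
    ),
    "Your video's audio is missing or corrupted, please try with another video": (
        "Source audio is missing, corrupted, or has no speech: try another video."
    ),
}


def explain_failure(failure_message: str) -> str:
    """Map a failure_message to a next step; unknown messages are surfaced verbatim."""
    message = failure_message.strip()
    for known, action in KNOWN_FAILURES.items():
        if message.startswith(known):
            return action
    return f"Unrecognized failure (show it to the user as-is): {message}"
```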
Other documented quirks:
- proofreads create returns proofread_ids (plural, one per language)
plus a session-level status — per-id status comes from proofreads get.
- After generate, polling shifts from proofreads get to
video-translate get because the resource graduates from proofread
to translation.
- Captions on generate are independent of the proofread session's SRT —
--captions controls whether the FINAL video burns captions in.
- Proofread session TTL ~24h.
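The polling hand-off can be sketched as follows, reusing the documented proofread status enum (processing | completed | failed) and the 30-minute hard timeout. `fetch_status` is a hypothetical stand-in for calling `proofreads get` per proofread_id; the 30-second interval is an assumption, and how the ID is passed to each `heygen` subcommand is not shown. After generate, the resource follows the translation-render state machine, whose statuses are out of scope here.

```python
import time
from typing import Callable

HARD_TIMEOUT_S = 30 * 60  # hard timeout for stuck proofread sessions
POLL_INTERVAL_S = 30      # assumption: no interval is specified


def poll_subcommand(generate_called: bool) -> list[str]:
    """CLI words to poll with; how the ID is passed is not shown here."""
    # Before generate the resource is a proofread; afterwards it is a translation.
    if generate_called:
        return ["video-translate", "get"]
    return ["video-translate", "proofreads", "get"]


def wait_for_proofread(
    fetch_status: Callable[[], str],
    interval_s: float = POLL_INTERVAL_S,
    timeout_s: float = HARD_TIMEOUT_S,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll one proofread_id until it leaves 'processing' or the timeout elapses.

    fetch_status is a hypothetical wrapper around `proofreads get` that returns
    the per-id status (processing | completed | failed).
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        if status != "processing":
            # e.g. pending/running belong to the translation-render state machine
            raise ValueError(f"unexpected proofread status: {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"still processing after {timeout_s}s")
        sleep(interval_s)
```

Injecting `clock` and `sleep` keeps the loop testable without waiting real minutes.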
Out of scope for this commit (still in followup queue):
- NanoClaw platform variant
- Eval scenarios for heygen-translate
- File issue/PR upstream re: SRT asset upload (worth surfacing to HeyGen
CLI team — the asset_id route in the schema can't be reached today)
…age input

Three improvements from dogfooding:
- Add auth verification step before Phase 1: runs `heygen auth status` in CLI mode; if no key is found, asks for the API key and persists it via `heygen auth login`. One-time setup that survives across sessions.
- Add duration flexibility question to Phase 1 discovery: asks whether output must match source length, explains the quality tradeoff, and controls the `enable_dynamic_duration` flag instead of hardcoding true.
- Make the target language question explicitly open-ended: no picker, no pre-assigned choices. The user types freely; validation happens in Phase 2.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
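Because the target-language question is free text, Phase 2 has to resolve whatever the user typed. A minimal sketch, assuming the supported names come from `heygen video-translate languages list` as plain strings like `Spanish (Mexico)` (the real output format is not shown here): exact case-insensitive match first, then substring hits such as regional variants, then fuzzy suggestions via Python's `difflib`. The function name is hypothetical.

```python
import difflib
from typing import Optional


def match_language(user_input: str, supported: list[str]) -> tuple[Optional[str], list[str]]:
    """Resolve a free-typed target language against the supported list.

    Returns (exact_match, suggestions); suggestions are only filled when
    there is no exact match, so the agent can ask the user to pick one.
    """
    wanted = user_input.strip().casefold()
    by_key = {name.casefold(): name for name in supported}
    if wanted in by_key:
        return by_key[wanted], []
    # Substring hits first (e.g. "spanish" -> its regional variants).
    hits = [name for key, name in by_key.items() if wanted and wanted in key]
    if not hits:
        # Fall back to typo-tolerant suggestions.
        close = difflib.get_close_matches(wanted, list(by_key), n=3, cutoff=0.6)
        hits = [by_key[k] for k in close]
    return None, hits
```

Returning suggestions instead of silently picking one keeps regional-variant choices (es/pt/zh) with the user.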
… question

SKILL.md:335 and references/language-locale-guide.md:49 both said "Always enable_dynamic_duration: true", contradicting the new Phase 1 duration flexibility question. Updated both to reference the user's choice and to warn about quality degradation on high-compression pairs when fixed-length output is chosen.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
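The Phase 1 duration answer can be turned into the `enable_dynamic_duration` value plus the high-compression warning. A minimal sketch assuming that "must match source length" maps to `enable_dynamic_duration: false`; the locale-code normalization and the function name are hypothetical, and the pair list covers only the en→zh/ja/ko compression cases this PR names.

```python
from typing import Optional

# Pairs this PR calls out for tonal compression (en→zh/ja/ko).
HIGH_COMPRESSION_PAIRS = {("en", "zh"), ("en", "ja"), ("en", "ko")}


def _base(code: str) -> str:
    # "zh-CN" / "zh_CN" -> "zh"
    return code.replace("_", "-").split("-")[0].lower()


def duration_settings(must_match_source: bool, source: str, target: str) -> tuple[bool, Optional[str]]:
    """Map the Phase 1 answer to (enable_dynamic_duration, optional warning)."""
    if not must_match_source:
        # Assumed flag semantics: output length may differ from the source.
        return True, None
    warning = None
    if (_base(source), _base(target)) in HIGH_COMPRESSION_PAIRS:
        warning = (
            f"{source} to {target} is a high-compression pair; "
            "forcing source length may degrade quality."
        )
    return False, warning
```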
…ovements fix(heygen-translate): auth gate, duration question, open-ended language input
What
Adds a third skill, `heygen-translate/`, for translating and dubbing existing videos into 175+ languages with voice cloning and lip-sync. It is an independent, self-contained skill that mirrors the structure of `heygen-avatar/` and `heygen-video/`.

This is a fresh PR against the latest `master` structure, replacing #46, which was authored before the #79 / #80 / #77 restructure (independent skills, no root SKILL.md, references inside the skill, MCP+CLI transport instead of raw API).

Why this PR over #46
Ken's review of #46 in Slack flagged:
- The SKILL.md frontmatter declared `allowed-tools: mcp__heygen__*`, but every example used raw `curl https://api.heygen.com/v3/...`. This PR uses the `heygen video-translate` CLI (with MCP fallthrough) per the same API Mode Detection ladder as `heygen-video`.
- #46 used `video-translate/` (breaking the `heygen-` prefix pattern) and was based on the pre-#79 layout (shared root SKILL.md). This PR uses `heygen-translate/` and matches the current independent-skill layout.

Changes
New skill
SKILL.md highlights
`curl https://api.heygen.com/...` calls.

Embedded expertise (the gap from #46)
`language-locale-guide.md`): `references/troubleshooting.md`.

Proofreads workflow (the missing high-stakes path)
The CLI exposes `heygen video-translate proofreads {create, get, srt get, srt update, generate}`. This skill wires that up as a first-class workflow with concrete edit playbooks (brand glossary find-replace, register fixes per language, numbers/dates/units, cultural references). High-stakes content runs proofreads by default; short low-stakes content skips it.

Plumbing
- `.claude-plugin/marketplace.json` registers `/heygen:translate`
- `.claude-plugin/plugin.json` description + keywords updated
- `.codex-plugin/plugin.json` description, keywords, longDescription, defaultPrompt
- `.cursor-plugin/plugin.json` adds `heygen-translate` to the skills array, keywords, tags
- `.github/workflows/validate-skills.yml` adds heygen-translate to the path filter and runs the same self-contained-bundle install checks (no parent-dir refs, all references resolve, no orphans)
- `release-please-config.json` registers `heygen-translate/SKILL.md` as an extra-files target so the skill version bumps in lockstep with the others
- `README.md`, `INSTALL.md`, `INSTALL_FOR_AGENTS.md`, `CLAUDE.md`, `CONTRIBUTING.md` all updated to reference the third skill

Naming decision (per Ken's question)
Used `heygen-translate/`, not `video-translate/`. Reasoning:
- Matches the `heygen-avatar/` / `heygen-video/` prefix pattern visually in `ls` and in plugin manifests.
- The slash command is `/heygen:translate` regardless of folder name, so user-facing UX doesn't depend on this, but consistency matters for contributor velocity.

If you'd prefer a different name (`heygen-video-translate/`, `heygen-dub/`, etc.), say the word and I'll rename it in this branch before merge.

Testing
- `grep -nE '\.\./' SKILL.md` returns 0 matches
- Every linked reference (`references/X.md`) exists in the bundle
- No orphans in `references/` (every file is linked from SKILL.md)
- No `../../` in references/ (would break inside an installed bundle)
- JSON manifests validate with `jq empty`
- `heygen video-translate --help` exposes create / get / list / delete / update / languages list / proofreads (create / get / generate / srt get / srt update). Schemas captured via `--request-schema`.
- `heygen video-translate languages list` returns 175+ languages, including all major regional variants

Out of scope (followup PRs)
- `platforms/nanoclaw/heygen-translate/` NanoClaw container variant

Breaking changes
None. New skill, additive to the existing two. Existing heygen-avatar / heygen-video flows are untouched.
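The self-containment checks in the Testing list (no `../` in SKILL.md, every linked `references/X.md` exists, no orphans, no `../../` in references/) can be approximated in a few lines. This is an illustrative sketch, not the actual validate-skills.yml job; the link regex only recognizes literal `references/<name>.md` strings, and the function name is hypothetical.

```python
import re
from pathlib import Path

REF_LINK = re.compile(r"references/[\w.-]+\.md")


def check_bundle(skill_dir: str) -> list[str]:
    """Return a list of problems; an empty list means the bundle looks self-contained."""
    root = Path(skill_dir)
    skill_md = (root / "SKILL.md").read_text(encoding="utf-8")
    problems = []
    if "../" in skill_md:
        problems.append("SKILL.md contains a ../ path")
    # Every reference SKILL.md links to must exist inside the bundle.
    linked = set(REF_LINK.findall(skill_md))
    for rel in sorted(linked):
        if not (root / rel).is_file():
            problems.append(f"missing: {rel}")
    # Every file in references/ must be linked, and none may escape the bundle.
    ref_dir = root / "references"
    ref_files = sorted(ref_dir.glob("*.md")) if ref_dir.is_dir() else []
    present = {f"references/{p.name}" for p in ref_files}
    for orphan in sorted(present - linked):
        problems.append(f"orphan: {orphan}")
    for path in ref_files:
        if "../../" in path.read_text(encoding="utf-8"):
            problems.append(f"{path.name} contains ../../")
    return problems
```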
Refs