feat: add heygen-translate skill (video translation / dubbing)#82

Merged
eve-builds merged 5 commits into master from feat/heygen-translate
May 13, 2026

@eve-builds
Collaborator

What

Adds a third skill, heygen-translate/, for translating and dubbing existing videos into 175+ languages with voice cloning and lip-sync. Independent, self-contained skill that mirrors the structure of heygen-avatar/ and heygen-video/.

This is a fresh PR against the latest master structure. It replaces #46, which was authored before the #79 / #80 / #77 restructure (independent skills, no root SKILL.md, references inside the skill, MCP+CLI transport not raw API).

Why this PR over #46

Ken's review of #46 in Slack flagged:

  1. Wrong transport. #46 declared allowed-tools: mcp__heygen__* but every example used raw curl https://api.heygen.com/v3/.... This PR uses the heygen video-translate CLI (with MCP fallthrough) per the same API Mode Detection ladder as heygen-video.
  2. No embedded translation expertise. #46 was an API doc with a workflow stapled on. This PR adds: speaker-count discipline, source-quality triage, locale-pair gotchas, lip-sync ceiling, captions burned-in vs sidecar, audio-only as a different deliverable not a workaround, cost/time math, failure-mode decoder, and a full proofreads (review-edit-render) workflow.
  3. Stale repo structure. #46 placed the skill at top-level video-translate/ (breaking the heygen- prefix pattern) and was based on the pre-#79 layout (shared root SKILL.md). This PR uses heygen-translate/ and matches the current independent-skill layout.

Changes

New skill

heygen-translate/
├── SKILL.md                                  # 4-phase workflow with MCP+CLI side-by-side, embedded expertise
└── references/
    ├── asset-routing.md                      # URL vs asset_id vs local-upload routing
    ├── language-locale-guide.md              # Regional variants, formality registers, RTL, tonal compression
    ├── proofreads-workflow.md                # Extract SRT → edit → upload → render (high-stakes path)
    └── troubleshooting.md                    # Errors → action map, polling, harness-specific notes

SKILL.md highlights

  • Same API Mode Detection ladder as heygen-video: OpenClaw plugin → CLI (HEYGEN_API_KEY override) → MCP → CLI fallback. Transport choice is silent and never narrated to the user.
  • Every operation has a CLI command and (when supported) an MCP tool name shown side-by-side. Zero raw curl https://api.heygen.com/... calls.
  • Five content profiles (talking-head, podcast, music-heavy, multi-speaker, corporate) with the right flags pre-bundled. Default is talking-head with precision mode + speech enhancement + dynamic duration + caption + format preservation.
  • Phase 1 Discovery rules mirror heygen-video: ask only what's missing, one or two questions per turn, in the user's language, never as a form.
  • Speaker count is REQUIRED for multi-speaker content, flagged as the #1 quality killer.
  • Proofreads path triggers automatically for: long videos, corporate content, languages the user reads natively, RTL languages, high-stakes medical/legal/educational. Otherwise opt-in.
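
As an illustration of the ladder in the first bullet, here is a minimal sketch of walking it top to bottom. The function name, its boolean inputs, and the returned labels are hypothetical and not taken from SKILL.md; in particular, treating "HEYGEN_API_KEY is set and the CLI is installed" as the condition for the CLI override is an assumption.

```python
from typing import Optional


def pick_transport(openclaw_plugin: bool, api_key: Optional[str],
                   mcp_available: bool, cli_installed: bool) -> str:
    """Return the first usable transport, checked in ladder order.

    All names and return labels here are illustrative assumptions.
    """
    if openclaw_plugin:
        return "openclaw-plugin"
    if api_key and cli_installed:
        return "cli"   # HEYGEN_API_KEY present: CLI takes precedence over MCP
    if mcp_available:
        return "mcp"
    if cli_installed:
        return "cli"   # last rung: CLI fallback without an explicit key
    raise RuntimeError("no HeyGen transport available")
```

The choice is returned as a value rather than printed, which fits the rule that transport selection is never narrated to the user.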

Embedded expertise (the gap from #46)

  • Speaker-count discipline. Wrong count = voice swaps mid-translation. Skill always asks for non-obvious cases and never guesses.
  • Source-quality triage in Phase 2 before submission. Audio clarity, face visibility, burned-in captions check.
  • Locale-pair gotchas as a real reference (language-locale-guide.md):
    • Tonal compression table (en→zh runs ~30% shorter, en→de runs longer, en→ar/he expand + RTL).
    • Formality / register matrix for ja-JP (敬語), ko-KR (honorifics), de-DE (Sie/du), fr-FR (tu/vous), th-TH, hi-IN, vi-VN, id-ID.
    • RTL caption collision warnings for ar/he/ur/fa.
    • Regional variant policy: always-ask for Portuguese, audience-region default for Spanish, Mandarin Simplified vs Traditional vs Cantonese.
    • Lip-sync ceiling per language family.
  • Captions: burned-in vs sidecar SRT — when to use which and why proofreads is the path for restyleable captions.
  • Audio-only translation framed correctly — not a quality workaround for bad lip-sync, but a different deliverable.
  • Cost/time math surfaced in Phase 1: source minutes × language count, plus honest 10–30 min render-time range.
  • Failure-mode decoder in SKILL.md and full table in references/troubleshooting.md.
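
The cost/time bullet reduces to simple arithmetic. A minimal sketch, assuming the 10–30 minute render range is reported unchanged (the PR does not say whether it applies per language or per job); the function and key names are hypothetical:

```python
def estimate_job(source_minutes: float, language_count: int) -> dict:
    """Estimate translated minutes as source minutes x language count."""
    if source_minutes <= 0 or language_count < 1:
        raise ValueError("need a positive duration and at least one language")
    return {
        "translated_minutes": source_minutes * language_count,
        # Range quoted in the PR description; passed through as-is.
        "render_minutes_range": (10, 30),
    }
```

For example, a 4-minute source translated into 3 languages gives `estimate_job(4, 3)["translated_minutes"] == 12`.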

Proofreads workflow (the missing high-stakes path)

The CLI exposes heygen video-translate proofreads {create, get, srt get, srt update, generate}. This skill wires that as a first-class workflow with concrete edit playbooks (brand glossary find-replace, register fixes per language, numbers/dates/units, cultural references). High-stakes content runs proofreads by default; short low-stakes content skips it.

Plumbing

  • .claude-plugin/marketplace.json registers /heygen:translate
  • .claude-plugin/plugin.json description + keywords updated
  • .codex-plugin/plugin.json description, keywords, longDescription, defaultPrompt
  • .cursor-plugin/plugin.json adds heygen-translate to skills array, keywords, tags
  • .github/workflows/validate-skills.yml adds heygen-translate to path filter and runs the same self-contained-bundle install checks (no parent-dir refs, all references resolve, no orphans)
  • release-please-config.json registers heygen-translate/SKILL.md as an extra-files target so the skill version bumps in lockstep with the others
  • README.md, INSTALL.md, INSTALL_FOR_AGENTS.md, CLAUDE.md, CONTRIBUTING.md all updated to reference the third skill

Naming decision (per Ken's question)

Used heygen-translate/ not video-translate/. Reasoning:

  • Matches the heygen-avatar/ / heygen-video/ prefix pattern visually in ls and in plugin manifests.
  • Plugin invocation is /heygen:translate regardless of folder name, so user-facing UX doesn't depend on this — but consistency matters for contributor velocity.
  • Cheaper to fix now (one new directory, no migration) than after merge.

If you'd prefer a different name (heygen-video-translate/, heygen-dub/, etc.) say the word and I'll rename in this branch before merge.

Testing

  • Local validation: grep -nE '\.\./' SKILL.md returns 0 matches
  • Local validation: every relative reference (references/X.md) exists in the bundle
  • Local validation: no orphan files in references/ (every file is linked from SKILL.md)
  • Local validation: no ../../ in references/ (would break inside an installed bundle)
  • All JSON manifests parse with jq empty
  • CLI confirmed: heygen video-translate --help exposes create / get / list / delete / update / languages list / proofreads (create / get / generate / srt get / srt update). Schemas captured via --request-schema.
  • CLI confirmed: heygen video-translate languages list returns 175+ languages including all major regional variants
  • Live smoke test: translate a short test clip into one language end-to-end (will run before merge if you want)
  • Eval scenarios for heygen-translate: out of scope for this PR; followup mirroring R17–R23 pattern from heygen-video
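
The four local validation checks above can be approximated in a few lines. This is a sketch of the idea, not the logic in validate-skills.yml: the function name, the regex used to find references, and the problem strings are all assumptions.

```python
import re
from pathlib import Path
from typing import List

# Assumed link shape: references/<name>.md, as used in the PR description.
REF_PATTERN = re.compile(r"references/[\w.-]+\.md")


def check_bundle(skill_dir: Path) -> List[str]:
    """Return a list of self-containment problems for one skill directory."""
    problems: List[str] = []
    skill_md = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    if "../" in skill_md:
        problems.append("SKILL.md contains a parent-directory reference")

    linked = set(REF_PATTERN.findall(skill_md))
    for rel in sorted(linked):
        if not (skill_dir / rel).is_file():
            problems.append(f"missing reference: {rel}")

    refs_dir = skill_dir / "references"
    present = {f"references/{p.name}" for p in refs_dir.glob("*.md")}
    for rel in sorted(present - linked):
        problems.append(f"orphan reference: {rel}")
    for rel in sorted(present):
        if "../../" in (skill_dir / rel).read_text(encoding="utf-8"):
            problems.append(f"parent-directory reference in {rel}")
    return problems
```

An empty list means the bundle passes all four checks under these assumptions.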

Out of scope (followup PRs)

Breaking changes

None. New skill, additive to the existing two. Existing heygen-avatar / heygen-video flows are untouched.

Refs

Adds a third skill, heygen-translate/, for translating and dubbing existing
videos into 175+ languages with voice cloning and lip-sync. Built on the
same independent-skill structure as heygen-avatar and heygen-video.

What:
- heygen-translate/SKILL.md (4-phase workflow: Discovery → Pre-flight →
  Submit+Poll → Deliver) with the same API Mode Detection ladder as
  heygen-video (OpenClaw plugin → CLI w/ HEYGEN_API_KEY → MCP → CLI
  fallback). All operations shown with MCP and CLI side-by-side, no raw
  curl.
- heygen-translate/references/troubleshooting.md (errors → action map,
  polling patterns, harness-specific notes for Claude Code / OpenClaw /
  Cursor)
- heygen-translate/references/language-locale-guide.md (regional variant
  defaults, formality registers, RTL caption collisions, tonal
  compression/expansion table, lip-sync ceiling per language)
- heygen-translate/references/proofreads-workflow.md (the high-stakes
  review-edit-render path: extract SRT → glossary discipline → register
  fixes → upload edited SRT → final render)
- heygen-translate/references/asset-routing.md (URL vs asset_id vs local
  upload routing, HEAD-check pattern, auth-walled URL fallbacks, 32 MB
  limit handling)

Replaces PR #46 with the new repo structure (independent skills, no root
SKILL.md, references inside the skill, validate-skills.yml self-contained
checks, MCP+CLI transport not raw API).

Why:
- PR #46's SKILL.md frontmatter declared 'allowed-tools: mcp__heygen__*'
  but every example used raw curl against api.heygen.com. Mismatch fixed
  here by using the heygen video-translate CLI (with MCP fallthrough)
  per the established pattern in heygen-avatar/heygen-video.
- PR #46 was authored against the pre-#79 structure (root SKILL.md +
  shared references/). Repo restructured 24h ago — each skill now owns
  its own SKILL.md and references/. This PR matches.
- PR #46 lacked embedded translation expertise. This SKILL.md adds:
  speaker-count discipline, source-quality triage, locale-pair gotchas
  (formality registers in ja/ko/de/th/hi, RTL caption collisions,
  tonal compression for en→zh/ja/ko, regional variants for es/pt/zh),
  lip-sync ceiling, captions burned-in vs sidecar, audio-only as a
  different deliverable not a workaround, cost/time math, and a
  failure-mode decoder.
- PR #46 used 'video-translate/' breaking the heygen-avatar/heygen-video
  prefix pattern. Renamed to 'heygen-translate/' for consistency in ls
  output and plugin manifest paths.
- Adds a true proofreads workflow (extract SRT → user/agent edits →
  upload corrected SRT → render) — this is the missing high-stakes path
  that distinguishes the skill from API docs.

Plumbing:
- .claude-plugin/marketplace.json registers heygen:translate
- .claude-plugin/plugin.json updates description + keywords
- .codex-plugin/plugin.json updates description, keywords, longDescription,
  defaultPrompt
- .cursor-plugin/plugin.json adds heygen-translate to skills array, plus
  keywords/tags
- .github/workflows/validate-skills.yml adds heygen-translate to path
  filter and runs the same self-contained-bundle checks as the other two
  skills
- release-please-config.json adds heygen-translate/SKILL.md as a
  release-please extra-files target so the version bumps in lockstep
- README.md, INSTALL.md, INSTALL_FOR_AGENTS.md, CLAUDE.md, CONTRIBUTING.md
  all updated to reference the third skill

Out of scope (followups):
- platforms/nanoclaw/heygen-translate/ NanoClaw container variant
- Eval scenarios for heygen-translate (mirror of R17-R23 pattern from
  heygen-video)
- gh skill / agentskills.io spec compliance check (handled by the
  spec-validate-soft job already in validate-skills.yml)
- Mark PR #46 as superseded once this lands

Refs: PR #46 (predecessor), #79 (independent-skills restructure), #77
(gh skill install path)
@eve-builds eve-builds requested a review from kenchung as a code owner April 27, 2026 23:49
Contributor

@kenchung kenchung left a comment

Approve — mechanics review

Reviewed purely from a conformance/mechanics perspective per Ken's note. All 5 blockers from my PR #46 review are fixed, plus full plumbing.

Mechanical conformance — all verified at 429be2a

| Item | Status |
| --- | --- |
| Skill name | heygen-translate (matches heygen- prefix convention) |
| version: 3.1.0 # x-release-please-version frontmatter | ✅ line 2 |
| argument-hint: "[video_url_or_path] [--to language]" | ✅ line 29 |
| allowed-tools: Bash, WebFetch, Read, Write, mcp__heygen__* | ✅ line 31 |
| metadata.openclaw.requires.env: HEYGEN_API_KEY | ✅ line 32+ |
| Description has "Use when" (5 cases), "Returns", "Chain signal", "NOT for" | ✅ all present |
| Self-contained references — no ../ paths | ✅ all sibling-relative |
| references/asset-routing.md, language-locale-guide.md, proofreads-workflow.md, troubleshooting.md all present | ✅ 4 files |
| .cursor-plugin/plugin.json skills array updated | ✅ adds ./heygen-translate/ |
| .claude-plugin/marketplace.json | registered as /heygen:translate |
| .claude-plugin/plugin.json, .codex-plugin/plugin.json | updated |
| release-please-config.json extra-files | includes heygen-translate/SKILL.md |
| validate-skills.yml updated for new skill (path filter + install/verify) | ✅ +49 lines |
| README.md, INSTALL.md, INSTALL_FOR_AGENTS.md, CLAUDE.md, CONTRIBUTING.md | all updated |
| CI: self-containment install + spec validation | ✅ both SUCCESS |

Minor pattern nits (non-blocking)

  1. Missing canonical H2 sections. Siblings have named ## Files & Paths, ## Language Awareness, ## UX Rules H2s. heygen-translate organized them differently — substance is partly covered by ## API Mode Detection, ## User-Facing Behavior, ## Best Practices, ## Source Quality Disclaimer. Not a regression; just stylistic variance from the post-#79 template. Worth normalizing in a follow-up if you want strict H2 parity across all three skills.

  2. "NOT for" clause is implicit. Embedded as a parenthetical inside the description ("not a new presenter — that's heygen-video"). Both siblings have an explicit NOT for: line. Cosmetic.

Worth calling out (good)

  • ## Embedded Expertise section is actually richer than siblings in some places: Speaker count discipline, Source-quality triage, Locale-pair gotchas, Lip-sync ceiling, Caption styling, Audio-only workflow, Cost & time awareness, Failure-mode decoder. Hits the domain audit checklist — but per Ken's call this is for separate iteration, not blocking here.
  • proofreads-workflow.md (224 lines) wires up the CLI's proofreads subtree as a real workflow — the QA loop my earlier review flagged as missing.
  • 4 reference files at sensible sizes (~6-10KB each), no inline bloat.
  • 4 H3s under ## Embedded Expertise directly map to the audit gaps.

Approve. Domain expertise depth is acknowledged as separate scope.

eve-builds and others added 4 commits April 27, 2026 17:12
…orms

Per Ken's ask in #tmp-vt-skill: rewrite proofreads-workflow.md (and the
Phase 3 proofread snippet in SKILL.md) against verified live behavior of
the heygen video-translate proofreads commands, not assumed/inferred
behavior.

Verified against the live API + CLI on Apr 27 with two real proofread
sessions (b84c8e8d... silent-source failure, 8ce0fba6c... Spanish
Sintel-trailer success).

Now documented:

- Five subcommands mapped to real REST endpoints:
    create     POST /v3/video-translations/proofreads
    get        GET  /v3/video-translations/proofreads/{id}
    srt get    GET  /v3/video-translations/proofreads/{id}/srt
    srt update PUT  /v3/video-translations/proofreads/{id}/srt
    generate   POST /v3/video-translations/proofreads/{id}/generate
- What the engine actually does between create and completed (downloads
  source, runs ASR for original_srt_url, translates to srt_url, no
  render yet).
- Real response shapes for create / get / srt get / srt update / generate
  with verified JSON examples and field-by-field meanings.
- Real status enum: processing | completed | failed (NOT pending|running
  — that's the translation-render endpoint, which is a different state
  machine the resource graduates into after generate).
- Polling cadence verified empirically: 3-5 min for SRT extraction on a
  50-second source. Hard timeout 30 min for stuck sessions.
- SRT format: standard SRT (UTF-8), well-formed timecodes, editable by
  hand or sed.
- File naming: <title>_proofread.srt and <title>_proofread_original.srt.
- original_srt_url is auto-populated source-language transcription, not
  a copy of any user-provided SRT. Useful as ground truth, never
  re-uploaded as target-language SRT.
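
A polling loop that respects the status enum and the 30-minute hard timeout listed above might look like the sketch below. The `fetch_status` callable is a hypothetical stand-in for whatever wraps `proofreads get`, and the 30-second interval is an assumption; only the three status values and the timeout come from this commit message.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed"}


def poll_proofread(fetch_status: Callable[[], str],
                   interval_s: float = 30.0,
                   timeout_s: float = 30 * 60,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the proofread is completed or failed, or the timeout hits."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if status != "processing":
            # pending/running belong to the translation-render state machine.
            raise ValueError(f"unexpected proofread status: {status!r}")
        if clock() >= deadline:
            raise TimeoutError("proofread still processing after hard timeout")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time.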

Critical correction: heygen asset create does NOT accept SRT files.

The CLI exposes both URL and asset_id shapes for srt update, but the
asset_id upload path is currently BLOCKED:

    {"error":{"code":"invalid_parameter",
              "message":"Content type not supported application/x-subrip"}}

heygen asset create only accepts png/jpeg/mp4/webm/mp3/wav/pdf.
Renaming .srt to .txt or .mp3 does not bypass it (server sniffs content,
not extension). The asset_id route is in the request schema for forward
compatibility but cannot currently be exercised through the standard
upload path.

Use the URL route. The reference now documents practical hosts that
work (gist raw URLs, GitHub raw URLs, S3 public-read, presigned
URLs >=2h, Vercel/static).
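
The routing consequence can be sketched as a small pre-flight decision. The function name and return labels are hypothetical; the only facts taken from this commit are that the URL route works, that local SRT files cannot go through asset upload today, and that presigned URLs should live at least 2 hours.

```python
from typing import Optional

MIN_PRESIGNED_TTL_HOURS = 2.0  # from the note above: presigned URLs >= 2h


def plan_srt_delivery(source: str,
                      presigned_ttl_hours: Optional[float] = None) -> str:
    """Decide how an edited SRT should reach `srt update`."""
    is_url = source.startswith(("http://", "https://"))
    if not is_url:
        # Local file: asset upload rejects SRT, so it must be hosted first.
        return "host-first"
    if (presigned_ttl_hours is not None
            and presigned_ttl_hours < MIN_PRESIGNED_TTL_HOURS):
        raise ValueError("presigned URL expires too soon for srt update")
    return "url"
```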

Two new failure_message strings added to troubleshooting.md from real
API responses:
- 'Failed to download video from url, please check the url is valid or
   the video is public' (instant-fail on bad/auth-walled source URL)
- 'Your video's audio is missing or corrupted, please try with another
   video' (~30s fail when source has no speech)
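
A decoder for these two messages could be as simple as substring matching. The cause labels and function name below are hypothetical; the matched fragments and their causes are the ones quoted above.

```python
from typing import Optional

# Fragment of failure_message (lowercased) -> illustrative cause label.
KNOWN_FAILURES = {
    "failed to download video from url": "source_url_unreachable",
    "audio is missing or corrupted": "source_has_no_speech",
}


def decode_failure(failure_message: str) -> Optional[str]:
    """Map a failure_message to a known cause, or None if unrecognized."""
    text = failure_message.lower()
    for fragment, cause in KNOWN_FAILURES.items():
        if fragment in text:
            return cause
    return None
```

Matching on a fragment rather than the full string tolerates small wording changes in the rest of the message.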

Other documented quirks:
- proofreads create returns proofread_ids (plural, one per language)
  plus a session-level status — per-id status comes from proofreads get.
- After generate, polling shifts from proofreads get to
  video-translate get because the resource graduates from proofread
  to translation.
- Captions on generate are independent of the proofread session's SRT —
  --captions controls whether the FINAL video burns captions in.
- Proofread session TTL ~24h.

Out of scope for this commit (still in followup queue):
- NanoClaw platform variant
- Eval scenarios for heygen-translate
- File issue/PR upstream re: SRT asset upload (worth surfacing to HeyGen
  CLI team — the asset_id route in the schema can't be reached today)
…age input

Three improvements from dogfooding:

- Add auth verification step before Phase 1: runs `heygen auth status` in CLI
  mode, asks for API key and persists via `heygen auth login` if missing.
  One-time setup that survives across sessions.
- Add duration flexibility question to Phase 1 discovery: asks whether output
  must match source length, explains quality tradeoff, controls
  `enable_dynamic_duration` flag instead of hardcoding true.
- Make target language question explicitly open-ended: no picker, no
  pre-assigned choices. User types freely, validation in Phase 2.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
… question

SKILL.md:335 and references/language-locale-guide.md:49 both said "Always
enable_dynamic_duration: true", contradicting the new Phase 1 duration
flexibility question. Updated both to reference the user's choice and warn
about quality degradation on high-compression pairs when fixed-length is chosen.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ovements

fix(heygen-translate): auth gate, duration question, open-ended language input
@eve-builds eve-builds merged commit 89d28b9 into master May 13, 2026
3 checks passed
@eve-builds eve-builds deleted the feat/heygen-translate branch May 13, 2026 17:58
@github-actions github-actions Bot mentioned this pull request May 13, 2026