feat: rewrite video-translate skill with v3 API (hardened) by davidchou-heygen · Pull Request #46 · heygen-com/skills

davidchou-heygen · 2026-04-16T19:07:11Z

Supersedes #43 (with @kenchung's blessing).

Contains @kenchung's full v3 rewrite (3 commits) plus one follow-up commit from end-to-end testing. Bundling them so the entire video-translate v3 ships as one reviewable change against master.

What @kenchung's commits ship (`382b03c`, `f14e0a0`, `3199516`)

New video-translate skill built on the v3 API (POST /v3/video-translations)
Content-type routing: maps talking-head, podcast, music-video, and corporate content to optimal flag combinations instead of exposing raw boolean toggles
Precision mode default-on, first-time user detection, source quality guardrails, speaker count validation
Languages fetched from API (no hardcoded table)
Discriminated-union video field for source routing (URL / asset upload / existing asset ID)
URL sanity check (HEAD request) before submission, with local-upload fallback

What the hardening commit adds (`badc217`)

Eight fixes found by running the skill end-to-end against real translations:

Install path

Move skills/video-translate/ → video-translate/ to match heygen-avatar/ and heygen-video/ layout
Register video-translate in setup's SKILLS array

Skill content

Interactive intake — ask one question at a time, extract from first message, propose sensible defaults
API key resolution from ~/.heygen/config (mirrors parent SKILL.md's 3-source order)
Soak-then-tight polling — 5-min silent soak, then 30s polls until terminal
Backgrounded bash polling — replaces broken Agent subagent path (subagents can't inherit Bash approvals in Claude Code)
Upfront approval-count warning before launching N background polls
Narrate-before-each-Bash rule — one sentence about why before every shell call
Markdown-link-wrapped delivery — flag emoji header + view-online link + [Download .mp4] + [SRT] · [VTT]
Shell hygiene section — env vars don't persist across Bash calls (inline auth load every time), zsh "no matches found" = unquoted glob, don't burn approvals on diagnostics
Fix \$status zsh read-only collision in poll script template
Correct languages endpoint response shape: real API returns { languages: string[] }, not { data: [{ code, name }] }
Correct dashboard URL pattern: /videos/{id} not /video-translate/{id}

Test plan

Run ./setup from a fresh checkout — verify video-translate symlinks into ~/.claude/skills
Submit a translation end-to-end — verify one approval per language, no foreground polling, deliverable renders with clickable links
Verify the language list query returns successfully on first call (no jq path error)
Verify View online link opens the correct dashboard page

After merge

@kenchung to close feat: rewrite video-translate skill to opinionated v3 API #43
I'll close fix: harden video-translate from end-to-end testing #45 (the interim PR I had stacked on Ken's branch) when this lands

Replace the thin v2 API wrapper with an expert v3 skill that encodes translation best practices. Key changes: - Upgrade from v2 to v3 API (POST /v3/video-translations) - Content-type routing: map talking head, podcast, music video, and corporate content to optimal flag combinations instead of exposing raw boolean toggles - Precision mode always on (substantial quality gain, negligible cost) - First-time user detection and onboarding flow - Source quality guardrails and speaker count validation - Languages fetched from API instead of hardcoded table - Phase 2 features documented (brand voice, custom SRT, partial translation, multi-language batch) Based on planning session with David Chou (VT team lead). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- video field is a discriminated union object, not flat video_url/video_id - output_languages is an array, not singular output_language - Response returns video_translation_ids[] array - Add Step 0: input source routing (URL, local file upload via POST /v3/assets, existing asset) - Fix status response to include all fields (audio_url, caption URLs, etc.) - Fix failure response field name (failure_message, not message) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

HEAD request to verify URL is publicly accessible: check status code, content type, and file size. If inaccessible, offer the local file upload path as fallback instead of just failing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Found and fixed eight issues by running the skill end-to-end against real translations. All changes are tightly coupled — each fix came from a real failure or friction point in the same debugging session. Install path: - Move skills/video-translate/ to repo root to match heygen-avatar/ and heygen-video/ layout (setup looks for skills at root) - Register video-translate in setup's SKILLS array so it gets symlinked alongside the other two skills Skill content: - Interactive intake: ask one question at a time, extract from first message before asking, propose sensible defaults instead of a 4-question form - API key resolution from ~/.heygen/config (mirrors parent SKILL.md's 3-source order — env > config file > prompt user) - Soak-then-tight polling cadence: 5-min silent soak, then 30s polls until terminal; never "check back in 20 min" (optimizes for cache, not user wait) - Backgrounded bash polling (replaces broken Agent subagent path — subagents can't inherit Bash approvals in Claude Code so every curl in them dies on permission denial) - Upfront approval-count warning before launching N background polls - Narrate-before-each-Bash rule: one sentence about why before every shell call, especially for backgrounded scripts the user can't see - Markdown-link-wrapped delivery: flag emoji header + view-online link + [Download .mp4] + [SRT] · [VTT]; stops dumping 200-char signed URLs into the terminal - Shell hygiene section: env vars don't persist across Bash calls (inline auth load every time), zsh "no matches found" = unquoted glob, don't burn approvals on diagnostics - Fix \$status zsh read-only collision in poll script template (rename to \$st with explanatory comment) - Correct languages endpoint response shape: real API returns { languages: string[] }, not { data: [{ code, name }] } - Correct dashboard URL pattern: /videos/{id} not /video-translate/{id} Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kenchung

Review

Domain content is genuinely strong — content-type profiles, soak-then-tight polling, narrate-before-bash, ${status}-vs-st zsh fix, signed-URL markdown wrapping. Most operationally savvy of the three skills.

But this PR was authored against an older repo shape, and master changed materially in the last 24 hours (PR #77 self-containment + PR #79 feat!: eliminate root SKILL.md + PR #80 Codex .app.json). A handful of mechanical fixes are required before merge. ~30 min of work.

Must-fix

1. Skill name should be heygen-translate, not video-translate.
For consistency with siblings heygen-avatar and heygen-video. PR description even says David moved the dir to repo root "to match heygen-avatar/ and heygen-video/ layout" — same logic applies to the name itself. Rename: directory, frontmatter name:, setup SKILLS array entry, any manifest paths.

2. allowed-tools will break the skill at runtime. video-translate/SKILL.md:5 lists only mcp__heygen__*. The skill is heavily Bash-driven (curl, polling loops, jq, asset uploads). Claude Code will deny every shell call. Match siblings: Bash, WebFetch, Read, Write, mcp__heygen__*.

3. Missing version: frontmatter line with # x-release-please-version marker.
Both siblings have version: 3.1.0 # x-release-please-version at line 2. Without it, release-please will never bump this skill's version.

4. .cursor-plugin/plugin.json not updated. The skills array is still ["./heygen-avatar/", "./heygen-video/"]. Cursor users won't see the new skill until it's added: ["./heygen-avatar/", "./heygen-video/", "./heygen-translate/"].

5. release-please-config.json not updated. Add an extra-files entry for heygen-translate/SKILL.md so release-please bumps the version on conventional commits touching the skill.

Suggestions

6. Description missing "NOT for" clause + chain signals. Both siblings have a "NOT for" list (e.g., heygen-video says "NOT for: cinematic b-roll, video translation, TTS-only"). And add chain signals to heygen-avatar / heygen-video where overlap exists — translated talking-head output naturally chains to heygen-video for re-render, or heygen-avatar for voice-clone identity.

7. Missing canonical H2 sections. Both siblings have named ## Files & Paths, ## Language Awareness, ## UX Rules H2s (with the meta-content migrated in via PR #79). The substance is covered by the skill's intake/narrate/Bash-hygiene rules — just needs the canonical H2 labels + a Files & Paths table for ~/.heygen/config, signed-URL temp dirs, asset uploads.

8. Missing argument-hint: in frontmatter. Siblings have it (e.g., [topic_or_script] [--avatar avatar_id]). Add the equivalent for translate inputs.

What's good

Domain rules (soak-then-tight polling, content-type profiles, narrate-before-bash, upfront approval-count warning, backgrounded-bash for discovery) are real operational learnings.
No ../ references, no calls to deleted root files — already conforms to the post-#79 self-contained pattern in spirit.
Single 635-line SKILL.md is tight; no need to split into references/ yet.
Auth + shell hygiene sections are strong.

Verdict

Request changes. Items 1-5 are blocking (rename, runtime break, version anchor, plugin manifest, release-please config). 6-8 are nice-to-have for consistency with the post-#79 pattern. Once 1-5 land, this is a strong third skill in the stack.

* feat: add heygen-translate skill (video translation / dubbing) Adds a third skill, heygen-translate/, for translating and dubbing existing videos into 175+ languages with voice cloning and lip-sync. Built on the same independent-skill structure as heygen-avatar and heygen-video. What: - heygen-translate/SKILL.md (4-phase workflow: Discovery → Pre-flight → Submit+Poll → Deliver) with the same API Mode Detection ladder as heygen-video (OpenClaw plugin → CLI w/ HEYGEN_API_KEY → MCP → CLI fallback). All operations shown with MCP and CLI side-by-side, no raw curl. - heygen-translate/references/troubleshooting.md (errors → action map, polling patterns, harness-specific notes for Claude Code / OpenClaw / Cursor) - heygen-translate/references/language-locale-guide.md (regional variant defaults, formality registers, RTL caption collisions, tonal compression/expansion table, lip-sync ceiling per language) - heygen-translate/references/proofreads-workflow.md (the high-stakes review-edit-render path: extract SRT → glossary discipline → register fixes → upload edited SRT → final render) - heygen-translate/references/asset-routing.md (URL vs asset_id vs local upload routing, HEAD-check pattern, auth-walled URL fallbacks, 32 MB limit handling) Replaces PR #46 with the new repo structure (independent skills, no root SKILL.md, references inside the skill, validate-skills.yml self-contained checks, MCP+CLI transport not raw API). Why: - PR #46's SKILL.md frontmatter declared 'allowed-tools: mcp__heygen__*' but every example used raw curl against api.heygen.com. Mismatch fixed here by using the heygen video-translate CLI (with MCP fallthrough) per the established pattern in heygen-avatar/heygen-video. - PR #46 was authored against the pre-#79 structure (root SKILL.md + shared references/). Repo restructured 24h ago — each skill now owns its own SKILL.md and references/. This PR matches. - PR #46 lacked embedded translation expertise. This SKILL.md adds: speaker-count discipline, source-quality triage, locale-pair gotchas (formality registers in ja/ko/de/th/hi, RTL caption collisions, tonal compression for en→zh/ja/ko, regional variants for es/pt/zh), lip-sync ceiling, captions burned-in vs sidecar, audio-only as a different deliverable not a workaround, cost/time math, and a failure-mode decoder. - PR #46 used 'video-translate/' breaking the heygen-avatar/heygen-video prefix pattern. Renamed to 'heygen-translate/' for consistency in ls output and plugin manifest paths. - Adds a true proofreads workflow (extract SRT → user/agent edits → upload corrected SRT → render) — this is the missing high-stakes path that distinguishes the skill from API docs. Plumbing: - .claude-plugin/marketplace.json registers heygen:translate - .claude-plugin/plugin.json updates description + keywords - .codex-plugin/plugin.json updates description, keywords, longDescription, defaultPrompt - .cursor-plugin/plugin.json adds heygen-translate to skills array, plus keywords/tags - .github/workflows/validate-skills.yml adds heygen-translate to path filter and runs the same self-contained-bundle checks as the other two skills - release-please-config.json adds heygen-translate/SKILL.md as a release-please extra-files target so the version bumps in lockstep - README.md, INSTALL.md, INSTALL_FOR_AGENTS.md, CLAUDE.md, CONTRIBUTING.md all updated to reference the third skill Out of scope (followups): - platforms/nanoclaw/heygen-translate/ NanoClaw container variant - Eval scenarios for heygen-translate (mirror of R17-R23 pattern from heygen-video) - gh skill / agentskills.io spec compliance check (handled by the spec-validate-soft job already in validate-skills.yml) - Mark PR #46 as superseded once this lands Refs: PR #46 (predecessor), #79 (independent-skills restructure), #77 (gh skill install path) * docs(heygen-translate): document what the proofread CLI actually performs Per Ken's ask in #tmp-vt-skill: rewrite proofreads-workflow.md (and the Phase 3 proofread snippet in SKILL.md) against verified live behavior of the heygen video-translate proofreads commands, not assumed/inferred behavior. Verified against the live API + CLI on Apr 27 with two real proofread sessions (b84c8e8d... silent-source failure, 8ce0fba6c... Spanish Sintel-trailer success). Now documented: - Five subcommands mapped to real REST endpoints: create POST /v3/video-translations/proofreads get GET /v3/video-translations/proofreads/{id} srt get GET /v3/video-translations/proofreads/{id}/srt srt update PUT /v3/video-translations/proofreads/{id}/srt generate POST /v3/video-translations/proofreads/{id}/generate - What the engine actually does between create and completed (downloads source, runs ASR for original_srt_url, translates to srt_url, no render yet). - Real response shapes for create / get / srt get / srt update / generate with verified JSON examples and field-by-field meanings. - Real status enum: processing | completed | failed (NOT pending|running — that's the translation-render endpoint, which is a different state machine the resource graduates into after generate). - Polling cadence verified empirically: 3-5 min for SRT extraction on a 50-second source. Hard timeout 30 min for stuck sessions. - SRT format: standard SRT (UTF-8), well-formed timecodes, editable by hand or sed. - File naming: <title>_proofread.srt and <title>_proofread_original.srt. - original_srt_url is auto-populated source-language transcription, not a copy of any user-provided SRT. Useful as ground truth, never re-uploaded as target-language SRT. Critical correction: heygen asset create does NOT accept SRT files. The CLI exposes both URL and asset_id shapes for srt update, but the asset_id upload path is currently BLOCKED: {"error":{"code":"invalid_parameter", "message":"Content type not supported application/x-subrip"}} heygen asset create only accepts png/jpeg/mp4/webm/mp3/wav/pdf. Renaming .srt to .txt or .mp3 does not bypass it (server sniffs content, not extension). The asset_id route is in the request schema for forward compatibility but cannot currently be exercised through the standard upload path. Use the URL route. The reference now documents practical hosts that work (gist raw URLs, GitHub raw URLs, S3 public-read, presigned URLs >=2h, Vercel/static). Two new failure_message strings added to troubleshooting.md from real API responses: - 'Failed to download video from url, please check the url is valid or the video is public' (instant-fail on bad/auth-walled source URL) - 'Your video's audio is missing or corrupted, please try with another video' (~30s fail when source has no speech) Other documented quirks: - proofreads create returns proofread_ids (plural, one per language) plus a session-level status — per-id status comes from proofreads get. - After generate, polling shifts from proofreads get to video-translate get because the resource graduates from proofread to translation. - Captions on generate are independent of the proofread session's SRT — --captions controls whether the FINAL video burns captions in. - Proofread session TTL ~24h. Out of scope for this commit (still in followup queue): - NanoClaw platform variant - Eval scenarios for heygen-translate - File issue/PR upstream re: SRT asset upload (worth surfacing to HeyGen CLI team — the asset_id route in the schema can't be reached today) * fix(heygen-translate): auth gate, duration question, open-ended language input Three improvements from dogfooding: - Add auth verification step before Phase 1: runs `heygen auth status` in CLI mode, asks for API key and persists via `heygen auth login` if missing. One-time setup that survives across sessions. - Add duration flexibility question to Phase 1 discovery: asks whether output must match source length, explains quality tradeoff, controls `enable_dynamic_duration` flag instead of hardcoding true. - Make target language question explicitly open-ended: no picker, no pre-assigned choices. User types freely, validation in Phase 2. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(heygen-translate): align dynamic_duration references with Phase 1 question SKILL.md:335 and references/language-locale-guide.md:49 both said "Always enable_dynamic_duration: true", contradicting the new Phase 1 duration flexibility question. Updated both to reference the user's choice and warn about quality degradation on high-compression pairs when fixed-length is chosen. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: David Chou <david.chou@heygen.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

eve-builds · 2026-05-13T18:00:52Z

Closing as superseded by #82 (merged at 89d28b99 on 2026-05-13). The heygen-translate skill is now on master under the post-#79 independent-skill structure, with verified-against-live-API proofreads docs, full MCP+CLI side-by-side coverage, and David's auth-gate / duration-question / open-ended-language improvements from #86 incorporated. Thanks for the original draft — it kept the skill on the queue.

Ken and others added 4 commits April 14, 2026 20:26

davidchou-heygen requested review from eve-builds and kenchung as code owners April 16, 2026 19:07

davidchou-heygen mentioned this pull request Apr 16, 2026

fix: harden video-translate from end-to-end testing #45

Closed

3 tasks

kenchung requested changes Apr 27, 2026

View reviewed changes

eve-builds mentioned this pull request Apr 27, 2026

feat: add heygen-translate skill (video translation / dubbing) #82

Merged

9 tasks

eve-builds closed this May 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: rewrite video-translate skill with v3 API (hardened)#46

feat: rewrite video-translate skill with v3 API (hardened)#46
davidchou-heygen wants to merge 4 commits into
heygen-com:masterfrom
davidchou-heygen:rewrite/video-translate-v3

davidchou-heygen commented Apr 16, 2026

Uh oh!

kenchung left a comment

Uh oh!

eve-builds commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

davidchou-heygen commented Apr 16, 2026

What @kenchung's commits ship (382b03c, f14e0a0, 3199516)

What the hardening commit adds (badc217)

Test plan

After merge

Uh oh!

kenchung left a comment

Choose a reason for hiding this comment

Review

Must-fix

Suggestions

What's good

Verdict

Uh oh!

eve-builds commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

What @kenchung's commits ship (`382b03c`, `f14e0a0`, `3199516`)

What the hardening commit adds (`badc217`)