feat: rewrite video-translate skill with v3 API (hardened) #46
davidchou-heygen wants to merge 4 commits into
Replace the thin v2 API wrapper with an expert v3 skill that encodes translation best practices. Key changes:
- Upgrade from v2 to v3 API (POST /v3/video-translations)
- Content-type routing: map talking head, podcast, music video, and corporate content to optimal flag combinations instead of exposing raw boolean toggles
- Precision mode always on (substantial quality gain, negligible cost)
- First-time user detection and onboarding flow
- Source quality guardrails and speaker count validation
- Languages fetched from API instead of hardcoded table
- Phase 2 features documented (brand voice, custom SRT, partial translation, multi-language batch)

Based on planning session with David Chou (VT team lead).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- video field is a discriminated union object, not flat video_url/video_id
- output_languages is an array, not singular output_language
- Response returns video_translation_ids[] array
- Add Step 0: input source routing (URL, local file upload via POST /v3/assets, existing asset)
- Fix status response to include all fields (audio_url, caption URLs, etc.)
- Fix failure response field name (failure_message, not message)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
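A minimal Python sketch of the request shape these fixes describe: Step 0 source routing, a discriminated-union `video` field, an `output_languages` array, and reading `video_translation_ids[]` and `failure_message` back. The variant keys (`type`, `url`, `asset_id`) and the optional `data` envelope are assumptions for illustration; the commit does not spell out the exact union shape.

```python
from __future__ import annotations

import os
from typing import Any, Optional


def route_source(source: str) -> str:
    """Step 0: 'url', 'upload' (local file, goes through POST /v3/assets first), or 'asset'."""
    if source.startswith(("http://", "https://")):
        return "url"
    if os.path.isfile(source):
        return "upload"
    return "asset"  # treat anything else as an existing asset ID


def build_translation_request(source: str, languages: list[str]) -> dict[str, Any]:
    """Body for POST /v3/video-translations with a discriminated-union `video` field."""
    if not languages:
        raise ValueError("output_languages needs at least one language")
    kind = route_source(source)
    if kind == "upload":
        raise ValueError("upload the local file via POST /v3/assets first, then pass its asset ID")
    # Hypothetical variant shapes: the PR only says `video` is a discriminated union.
    if kind == "url":
        video = {"type": "url", "url": source}
    else:
        video = {"type": "asset_id", "asset_id": source}
    # output_languages is an array, not a singular output_language string.
    return {"video": video, "output_languages": list(languages)}


def translation_ids(response: dict[str, Any]) -> list[str]:
    """One ID per target language, read from video_translation_ids[]."""
    payload = response.get("data", response)  # assumption: may sit under a "data" envelope
    return list(payload["video_translation_ids"])


def failure_reason(status_payload: dict[str, Any]) -> Optional[str]:
    """Failed translations carry `failure_message`, not `message`."""
    return status_payload.get("failure_message")
```

Because one request can fan out to several languages, the caller would track every ID in the returned array rather than a single `video_translation_id`.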
Add a HEAD request to verify the URL is publicly accessible: check status code, content type, and file size. If inaccessible, offer the local file upload path as a fallback instead of just failing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
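The HEAD check can be sketched with the Python standard library. Only the status code, Content-Type, and Content-Length are inspected, mirroring the commit; treating `text/html` as a viewer/share page and the optional `max_bytes` cap are illustrative choices, not values taken from the skill.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from typing import NamedTuple, Optional


class HeadCheck(NamedTuple):
    ok: bool
    reason: str


def assess_head(status: int, content_type: Optional[str], content_length: Optional[int],
                max_bytes: Optional[int] = None) -> HeadCheck:
    """Classify a HEAD response; any failure means 'offer local upload instead'."""
    if status in (401, 403):
        return HeadCheck(False, f"HTTP {status}: auth-walled, offer local file upload")
    if not 200 <= status < 300:
        return HeadCheck(False, f"HTTP {status}: not reachable, offer local file upload")
    mime = (content_type or "").split(";")[0].strip().lower()
    if mime == "text/html":
        return HeadCheck(False, "HTML page, probably a viewer/share page rather than the media file")
    if max_bytes is not None and content_length is not None and content_length > max_bytes:
        return HeadCheck(False, f"{content_length} bytes exceeds the {max_bytes}-byte limit")
    return HeadCheck(True, "publicly accessible")


def head_check(url: str, max_bytes: Optional[int] = None, timeout: float = 10.0) -> HeadCheck:
    """Issue the HEAD request and classify the result."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            length = resp.headers.get("Content-Length")
            size = int(length) if length and length.isdigit() else None
            return assess_head(resp.status, resp.headers.get("Content-Type"), size, max_bytes)
    except urllib.error.HTTPError as err:  # 4xx/5xx arrive as exceptions
        return assess_head(err.code, None, None, max_bytes)
    except urllib.error.URLError as err:  # DNS failure, refused connection, timeout
        return HeadCheck(False, f"unreachable ({err.reason}): offer local file upload")
```

Keeping the classification in a pure function separates the policy (what counts as "inaccessible") from the network call, so the fallback rules can be checked without a live URL.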
Found and fixed eight issues by running the skill end-to-end against
real translations. All changes are tightly coupled — each fix came
from a real failure or friction point in the same debugging session.
Install path:
- Move skills/video-translate/ to repo root to match heygen-avatar/
and heygen-video/ layout (setup looks for skills at root)
- Register video-translate in setup's SKILLS array so it gets
symlinked alongside the other two skills
Skill content:
- Interactive intake: ask one question at a time, extract from first
message before asking, propose sensible defaults instead of a
4-question form
- API key resolution from ~/.heygen/config (mirrors parent SKILL.md's
3-source order — env > config file > prompt user)
- Soak-then-tight polling cadence: 5-min silent soak, then 30s polls
until terminal; never "check back in 20 min" (optimizes for cache,
not user wait)
- Backgrounded bash polling (replaces broken Agent subagent path —
subagents can't inherit Bash approvals in Claude Code so every
curl in them dies on permission denial)
- Upfront approval-count warning before launching N background polls
- Narrate-before-each-Bash rule: one sentence about why before every
shell call, especially for backgrounded scripts the user can't see
- Markdown-link-wrapped delivery: flag emoji header + view-online
link + [Download .mp4] + [SRT] · [VTT]; stops dumping 200-char
signed URLs into the terminal
- Shell hygiene section: env vars don't persist across Bash calls
(inline auth load every time), zsh "no matches found" = unquoted
glob, don't burn approvals on diagnostics
- Fix $status zsh read-only collision in poll script template
(rename to $st with explanatory comment)
- Correct languages endpoint response shape: real API returns
{ languages: string[] }, not { data: [{ code, name }] }
- Correct dashboard URL pattern: /videos/{id} not /video-translate/{id}
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
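Several of the skill-content fixes above are small enough to sketch together in Python: the env > config file > prompt key order, the soak-then-tight polling cadence, the corrected languages response shape, and link-wrapped delivery. The config-file format (`HEYGEN_API_KEY=...` lines), the terminal status set, the one-hour timeout, and the dashboard base URL supplied by the caller are assumptions; the skill itself does this in backgrounded Bash.

```python
from __future__ import annotations

import time
from pathlib import Path
from typing import Callable, Mapping, Optional

# Hypothetical names and values for illustration; not taken from the skill itself.
CONFIG_PATH = Path.home() / ".heygen" / "config"
TERMINAL_STATUSES = {"completed", "failed"}  # assumed terminal states
SOAK_SECONDS = 5 * 60  # silent soak before the first status check
POLL_INTERVAL_SECONDS = 30  # tight polling after the soak


def resolve_api_key(env: Mapping[str, str], config_path: Path = CONFIG_PATH) -> Optional[str]:
    """env > config file > None (the caller then prompts the user)."""
    key = env.get("HEYGEN_API_KEY", "").strip()
    if key:
        return key
    if config_path.is_file():
        # Assumed config format: KEY=VALUE lines, optionally prefixed with "export ".
        for line in config_path.read_text(encoding="utf-8").splitlines():
            name, sep, value = line.partition("=")
            name = name.strip()
            if name.startswith("export "):
                name = name[len("export "):].strip()
            value = value.strip().strip("\"'")
            if sep and name == "HEYGEN_API_KEY" and value:
                return value
    return None


def poll_until_terminal(get_status: Callable[[], str],
                        sleep: Callable[[float], None] = time.sleep,
                        timeout_seconds: float = 60 * 60) -> str:
    """Soak silently, then poll every 30 s until a terminal status or timeout."""
    sleep(SOAK_SECONDS)
    waited = SOAK_SECONDS
    while True:
        # Named `st`, mirroring the shell template: `status` is read-only in zsh.
        st = get_status()
        if st in TERMINAL_STATUSES:
            return st
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {st!r} after {waited} s")
        sleep(POLL_INTERVAL_SECONDS)
        waited += POLL_INTERVAL_SECONDS


def parse_languages(body: dict) -> list[str]:
    """The languages endpoint returns {"languages": [str, ...]}, not {"data": [...]}."""
    langs = body.get("languages")
    if not isinstance(langs, list) or not all(isinstance(x, str) for x in langs):
        raise ValueError("expected {'languages': [str, ...]}")
    return langs


def delivery_markdown(flag: str, language: str, translation_id: str, video_url: str,
                      srt_url: str, vtt_url: str, dashboard_base: str) -> str:
    """Wrap long signed URLs in short markdown links instead of printing them raw."""
    return "\n".join([
        f"### {flag} {language}",
        f"[View online]({dashboard_base}/videos/{translation_id})",
        f"[Download .mp4]({video_url}) · [SRT]({srt_url}) · [VTT]({vtt_url})",
    ])
```

Calling `resolve_api_key(os.environ)` and prompting only when it returns `None` reproduces the three-source order; injecting `sleep` keeps the cadence testable without actually waiting five minutes.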
kenchung
left a comment
Review
Domain content is genuinely strong — content-type profiles, soak-then-tight polling, narrate-before-bash, ${status}-vs-st zsh fix, signed-URL markdown wrapping. Most operationally savvy of the three skills.
But this PR was authored against an older repo shape, and master changed materially in the last 24 hours (PR #77 self-containment + PR #79 feat!: eliminate root SKILL.md + PR #80 Codex .app.json). A handful of mechanical fixes are required before merge. ~30 min of work.
Must-fix
1. Skill name should be heygen-translate, not video-translate.
For consistency with siblings heygen-avatar and heygen-video. PR description even says David moved the dir to repo root "to match heygen-avatar/ and heygen-video/ layout" — same logic applies to the name itself. Rename: directory, frontmatter name:, setup SKILLS array entry, any manifest paths.
2. allowed-tools will break the skill at runtime. video-translate/SKILL.md:5 lists only mcp__heygen__*. The skill is heavily Bash-driven (curl, polling loops, jq, asset uploads). Claude Code will deny every shell call. Match siblings: Bash, WebFetch, Read, Write, mcp__heygen__*.
3. Missing version: frontmatter line with # x-release-please-version marker.
Both siblings have version: 3.1.0 # x-release-please-version at line 2. Without it, release-please will never bump this skill's version.
4. .cursor-plugin/plugin.json not updated. The skills array is still ["./heygen-avatar/", "./heygen-video/"]. Cursor users won't see the new skill until it's added: ["./heygen-avatar/", "./heygen-video/", "./heygen-translate/"].
5. release-please-config.json not updated. Add an extra-files entry for heygen-translate/SKILL.md so release-please bumps the version on conventional commits touching the skill.
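A rough pre-merge check for must-fix items 1 through 4 might look like this Python sketch. It assumes single-line `key: value` frontmatter and a comma-separated `allowed-tools` value; it is a hypothetical helper, not part of the repo's validate-skills.yml workflow.

```python
from __future__ import annotations

import json

# Tool list the review asks for, matching the sibling skills.
REQUIRED_TOOLS = ("Bash", "WebFetch", "Read", "Write", "mcp__heygen__*")


def frontmatter_fields(skill_md: str) -> dict[str, str]:
    """Parse single-line `key: value` pairs between the leading '---' fences."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is not closed with '---'")


def must_fix_problems(skill_md: str, cursor_plugin_json: str) -> list[str]:
    """Return human-readable problems for review items 1-4 (empty list means clean)."""
    fm = frontmatter_fields(skill_md)
    problems = []
    if fm.get("name") != "heygen-translate":
        problems.append("1: frontmatter name should be heygen-translate")
    raw_tools = fm.get("allowed-tools", "").strip("[]")
    declared = {t.strip() for t in raw_tools.split(",") if t.strip()}
    missing = [t for t in REQUIRED_TOOLS if t not in declared]
    if missing:
        problems.append("2: allowed-tools is missing " + ", ".join(missing))
    if "# x-release-please-version" not in fm.get("version", ""):
        problems.append("3: version line lacks the # x-release-please-version marker")
    skills = json.loads(cursor_plugin_json).get("skills", [])
    if "./heygen-translate/" not in skills:
        problems.append("4: .cursor-plugin/plugin.json skills array lacks ./heygen-translate/")
    return problems
```

Item 5 (release-please-config.json extra-files) is left out because the exact layout of that file in this repo is not shown here.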
Suggestions
6. Description missing "NOT for" clause + chain signals. Both siblings have a "NOT for" list (e.g., heygen-video says "NOT for: cinematic b-roll, video translation, TTS-only"). And add chain signals to heygen-avatar / heygen-video where overlap exists — translated talking-head output naturally chains to heygen-video for re-render, or heygen-avatar for voice-clone identity.
7. Missing canonical H2 sections. Both siblings have named ## Files & Paths, ## Language Awareness, ## UX Rules H2s (with the meta-content migrated in via PR #79). The substance is covered by the skill's intake/narrate/Bash-hygiene rules — just needs the canonical H2 labels + a Files & Paths table for ~/.heygen/config, signed-URL temp dirs, asset uploads.
8. Missing argument-hint: in frontmatter. Siblings have it (e.g., [topic_or_script] [--avatar avatar_id]). Add the equivalent for translate inputs.
What's good
- Domain rules (soak-then-tight polling, content-type profiles, narrate-before-bash, upfront approval-count warning, backgrounded-bash for discovery) are real operational learnings.
- No ../references, no calls to deleted root files: already conforms to the post-#79 self-contained pattern in spirit.
- Single 635-line SKILL.md is tight; no need to split into references/ yet.
- Auth + shell hygiene sections are strong.
Verdict
Request changes. Items 1-5 are blocking (rename, runtime break, version anchor, plugin manifest, release-please config). 6-8 are nice-to-have for consistency with the post-#79 pattern. Once 1-5 land, this is a strong third skill in the stack.
* feat: add heygen-translate skill (video translation / dubbing)

Adds a third skill, heygen-translate/, for translating and dubbing existing videos into 175+ languages with voice cloning and lip-sync. Built on the same independent-skill structure as heygen-avatar and heygen-video.

What:
- heygen-translate/SKILL.md (4-phase workflow: Discovery → Pre-flight → Submit+Poll → Deliver) with the same API Mode Detection ladder as heygen-video (OpenClaw plugin → CLI w/ HEYGEN_API_KEY → MCP → CLI fallback). All operations are shown with MCP and CLI side by side, no raw curl.
- heygen-translate/references/troubleshooting.md (errors → action map, polling patterns, harness-specific notes for Claude Code / OpenClaw / Cursor)
- heygen-translate/references/language-locale-guide.md (regional variant defaults, formality registers, RTL caption collisions, tonal compression/expansion table, lip-sync ceiling per language)
- heygen-translate/references/proofreads-workflow.md (the high-stakes review-edit-render path: extract SRT → glossary discipline → register fixes → upload edited SRT → final render)
- heygen-translate/references/asset-routing.md (URL vs asset_id vs local upload routing, HEAD-check pattern, auth-walled URL fallbacks, 32 MB limit handling)

Replaces PR #46 with the new repo structure (independent skills, no root SKILL.md, references inside the skill, validate-skills.yml self-contained checks, MCP+CLI transport instead of raw API).

Why:
- PR #46's SKILL.md frontmatter declared 'allowed-tools: mcp__heygen__*', but every example used raw curl against api.heygen.com. This PR fixes the mismatch by using the heygen video-translate CLI (with MCP fallthrough), per the established pattern in heygen-avatar/heygen-video.
- PR #46 was authored against the pre-#79 structure (root SKILL.md + shared references/). The repo was restructured 24h ago: each skill now owns its own SKILL.md and references/. This PR matches.
- PR #46 lacked embedded translation expertise. This SKILL.md adds: speaker-count discipline, source-quality triage, locale-pair gotchas (formality registers in ja/ko/de/th/hi, RTL caption collisions, tonal compression for en→zh/ja/ko, regional variants for es/pt/zh), lip-sync ceiling, captions burned-in vs sidecar, audio-only as a different deliverable rather than a workaround, cost/time math, and a failure-mode decoder.
- PR #46 used 'video-translate/', breaking the heygen-avatar/heygen-video prefix pattern. Renamed to 'heygen-translate/' for consistency in ls output and plugin manifest paths.
- Adds a true proofreads workflow (extract SRT → user/agent edits → upload corrected SRT → render). This is the missing high-stakes path that distinguishes the skill from API docs.

Plumbing:
- .claude-plugin/marketplace.json registers heygen:translate
- .claude-plugin/plugin.json updates description + keywords
- .codex-plugin/plugin.json updates description, keywords, longDescription, defaultPrompt
- .cursor-plugin/plugin.json adds heygen-translate to the skills array, plus keywords/tags
- .github/workflows/validate-skills.yml adds heygen-translate to the path filter and runs the same self-contained-bundle checks as the other two skills
- release-please-config.json adds heygen-translate/SKILL.md as a release-please extra-files target so the version bumps in lockstep
- README.md, INSTALL.md, INSTALL_FOR_AGENTS.md, CLAUDE.md, CONTRIBUTING.md all updated to reference the third skill

Out of scope (followups):
- platforms/nanoclaw/heygen-translate/ NanoClaw container variant
- Eval scenarios for heygen-translate (mirror of the R17-R23 pattern from heygen-video)
- gh skill / agentskills.io spec compliance check (already handled by the spec-validate-soft job in validate-skills.yml)
- Mark PR #46 as superseded once this lands

Refs: PR #46 (predecessor), #79 (independent-skills restructure), #77 (gh skill install path)

* docs(heygen-translate): document what the proofread CLI actually performs

Per Ken's ask in #tmp-vt-skill: rewrite proofreads-workflow.md (and the Phase 3 proofread snippet in SKILL.md) against verified live behavior of the heygen video-translate proofreads commands, not assumed or inferred behavior. Verified against the live API + CLI on Apr 27 with two real proofread sessions (b84c8e8d... silent-source failure, 8ce0fba6c... Spanish Sintel-trailer success).

Now documented:
- Five subcommands mapped to real REST endpoints:
  - create: POST /v3/video-translations/proofreads
  - get: GET /v3/video-translations/proofreads/{id}
  - srt get: GET /v3/video-translations/proofreads/{id}/srt
  - srt update: PUT /v3/video-translations/proofreads/{id}/srt
  - generate: POST /v3/video-translations/proofreads/{id}/generate
- What the engine actually does between create and completed (downloads the source, runs ASR for original_srt_url, translates to srt_url; no render yet).
- Real response shapes for create / get / srt get / srt update / generate, with verified JSON examples and field-by-field meanings.
- Real status enum: processing | completed | failed (NOT pending | running; those belong to the translation-render endpoint, a different state machine the resource graduates into after generate).
- Polling cadence verified empirically: 3-5 min for SRT extraction on a 50-second source. Hard timeout of 30 min for stuck sessions.
- SRT format: standard SRT (UTF-8), well-formed timecodes, editable by hand or with sed.
- File naming: <title>_proofread.srt and <title>_proofread_original.srt.
- original_srt_url is an auto-populated source-language transcription, not a copy of any user-provided SRT. Useful as ground truth; never re-upload it as the target-language SRT.

Critical correction: heygen asset create does NOT accept SRT files. The CLI exposes both URL and asset_id shapes for srt update, but the asset_id upload path is currently BLOCKED:

{"error":{"code":"invalid_parameter", "message":"Content type not supported application/x-subrip"}}

heygen asset create only accepts png/jpeg/mp4/webm/mp3/wav/pdf. Renaming .srt to .txt or .mp3 does not bypass it (the server sniffs content, not the extension). The asset_id route is in the request schema for forward compatibility but cannot currently be exercised through the standard upload path. Use the URL route. The reference now documents practical hosts that work (gist raw URLs, GitHub raw URLs, S3 public-read, presigned URLs >=2h, Vercel/static).

Two new failure_message strings added to troubleshooting.md from real API responses:
- 'Failed to download video from url, please check the url is valid or the video is public' (instant failure on a bad or auth-walled source URL)
- 'Your video's audio is missing or corrupted, please try with another video' (~30s failure when the source has no speech)

Other documented quirks:
- proofreads create returns proofread_ids (plural, one per language) plus a session-level status; per-id status comes from proofreads get.
- After generate, polling shifts from proofreads get to video-translate get because the resource graduates from proofread to translation.
- Captions on generate are independent of the proofread session's SRT: --captions controls whether the FINAL video burns captions in.
- Proofread session TTL ~24h.

Out of scope for this commit (still in followup queue):
- NanoClaw platform variant
- Eval scenarios for heygen-translate
- File issue/PR upstream re: SRT asset upload (worth surfacing to the HeyGen CLI team; the asset_id route in the schema can't be reached today)

* fix(heygen-translate): auth gate, duration question, open-ended language input

Three improvements from dogfooding:
- Add an auth verification step before Phase 1: runs `heygen auth status` in CLI mode; if no key is found, asks for an API key and persists it via `heygen auth login`. One-time setup that survives across sessions.
- Add a duration flexibility question to Phase 1 discovery: asks whether the output must match the source length, explains the quality tradeoff, and controls the `enable_dynamic_duration` flag instead of hardcoding true.
- Make the target language question explicitly open-ended: no picker, no pre-assigned choices. The user types freely; validation happens in Phase 2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(heygen-translate): align dynamic_duration references with Phase 1 question

SKILL.md:335 and references/language-locale-guide.md:49 both said "Always enable_dynamic_duration: true", contradicting the new Phase 1 duration flexibility question. Updated both to reference the user's choice and warn about quality degradation on high-compression pairs when fixed length is chosen.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: David Chou <david.chou@heygen.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
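The proofread surface documented in that commit reduces to a small lookup plus two checks. This Python sketch encodes the five verified REST paths, the processing | completed | failed enum with the 30-minute stuck timeout, and a basic SRT timing-line check; it makes no HTTP calls, and the helper names are hypothetical.

```python
from __future__ import annotations

import re
from typing import Optional

# The five proofread subcommands and the REST endpoints they map to.
PROOFREAD_ENDPOINTS = {
    "create": ("POST", "/v3/video-translations/proofreads"),
    "get": ("GET", "/v3/video-translations/proofreads/{id}"),
    "srt get": ("GET", "/v3/video-translations/proofreads/{id}/srt"),
    "srt update": ("PUT", "/v3/video-translations/proofreads/{id}/srt"),
    "generate": ("POST", "/v3/video-translations/proofreads/{id}/generate"),
}
PROOFREAD_STATUSES = {"processing", "completed", "failed"}
STUCK_AFTER_SECONDS = 30 * 60  # hard timeout for sessions stuck in "processing"

# Standard SRT cue timing line: HH:MM:SS,mmm --> HH:MM:SS,mmm
SRT_TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")


def endpoint(subcommand: str, proofread_id: Optional[str] = None) -> tuple[str, str]:
    """Return (method, path) for a proofread subcommand, filling in {id}."""
    method, path = PROOFREAD_ENDPOINTS[subcommand]
    if "{id}" in path:
        if not proofread_id:
            raise ValueError(f"{subcommand!r} needs a proofread id")
        path = path.replace("{id}", proofread_id)
    return method, path


def proofread_is_stuck(status: str, elapsed_seconds: float) -> bool:
    """True once a session has sat in 'processing' past the hard timeout."""
    if status not in PROOFREAD_STATUSES:
        # pending/running belong to the translation-render state machine, not proofreads.
        raise ValueError(f"not a proofread status: {status!r}")
    return status == "processing" and elapsed_seconds >= STUCK_AFTER_SECONDS


def srt_timings_ok(srt_text: str) -> bool:
    """Rough sanity check before `srt update`: every timing line is well formed."""
    timings = [line.strip() for line in srt_text.splitlines() if "-->" in line]
    return bool(timings) and all(SRT_TIMING.match(t) for t in timings)
```

Because asset create rejects SRT content, an edited file that passes this check would still be hosted at a public URL and sent to `srt update` through the URL route.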
Closing as superseded by #82 (merged at
Supersedes #43 (with @kenchung's blessing).
Contains @kenchung's full v3 rewrite (3 commits) plus one follow-up commit from end-to-end testing. Bundling them so the entire video-translate v3 ships as one reviewable change against master.
What @kenchung's commits ship (382b03c, f14e0a0, 3199516)
- video-translate skill built on the v3 API (POST /v3/video-translations)
- video field for source routing (URL / asset upload / existing asset ID)

What the hardening commit adds (badc217)
Eight fixes found by running the skill end-to-end against real translations:
Install path
- skills/video-translate/ → video-translate/ to match heygen-avatar/ and heygen-video/ layout
- Register video-translate in setup's SKILLS array

Skill content
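- API key resolution from ~/.heygen/config (mirrors parent SKILL.md's 3-source order)
- Markdown-link-wrapped delivery: [Download .mp4] + [SRT] · [VTT]
- Fix $status zsh read-only collision in poll script template
- Correct languages endpoint response shape: { languages: string[] }, not { data: [{ code, name }] }
- Correct dashboard URL pattern: /videos/{id} not /video-translate/{id}

Test plan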
- ./setup from a fresh checkout: verify video-translate symlinks into ~/.claude/skills
- View online link opens the correct dashboard page

After merge