feat: rewrite video-translate skill with v3 API (hardened)#46

Closed
davidchou-heygen wants to merge 4 commits into heygen-com:master from davidchou-heygen:rewrite/video-translate-v3

Conversation

@davidchou-heygen
Collaborator

Supersedes #43 (with @kenchung's blessing).

Contains @kenchung's full v3 rewrite (3 commits) plus one follow-up commit from end-to-end testing. Bundling them so the entire video-translate v3 ships as one reviewable change against master.

What @kenchung's commits ship (382b03c, f14e0a0, 3199516)

  • New video-translate skill built on the v3 API (POST /v3/video-translations)
  • Content-type routing: maps talking-head, podcast, music-video, and corporate content to optimal flag combinations instead of exposing raw boolean toggles
  • Precision mode default-on, first-time user detection, source quality guardrails, speaker count validation
  • Languages fetched from API (no hardcoded table)
  • Discriminated-union video field for source routing (URL / asset upload / existing asset ID)
  • URL sanity check (HEAD request) before submission, with local-upload fallback
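
A minimal sketch of the URL sanity check described above, assuming it inspects the status code, content type, and file size of a HEAD response. The function names (`evaluate_head`, `head_check`) and the exact acceptance rules (2xx status, `video/*` content type, non-zero `Content-Length`) are illustrative assumptions, not the skill's actual logic.

```python
import urllib.error
import urllib.request
from typing import List, Mapping


def evaluate_head(status: int, headers: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the URL looks fetchable."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    problems = []
    if not 200 <= status < 300:
        problems.append(f"unexpected status {status}")
    ctype = h.get("content-type", "").split(";")[0].strip().lower()
    if not ctype.startswith("video/"):  # assumed rule: only video/* passes
        problems.append(f"content type {ctype or 'missing'} is not video/*")
    length = h.get("content-length", "")
    if not length.isdigit() or int(length) == 0:
        problems.append("missing or zero Content-Length")
    return problems


def head_check(url: str, timeout: float = 10.0) -> List[str]:
    """Issue a HEAD request and evaluate the response."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return evaluate_head(resp.status, resp.headers)
    except urllib.error.HTTPError as exc:  # 4xx/5xx still carry headers
        return evaluate_head(exc.code, exc.headers)
    except urllib.error.URLError as exc:  # DNS failure, refused connection, ...
        return [f"request failed: {exc.reason}"]
```

A non-empty result would be the point at which the skill offers the local-upload fallback instead of submitting the URL.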

What the hardening commit adds (badc217)

Eight fixes found by running the skill end-to-end against real translations:

Install path

  • Move skills/video-translate/ to video-translate/ at the repo root to match heygen-avatar/ and heygen-video/ layout
  • Register video-translate in setup's SKILLS array

Skill content

  • Interactive intake — ask one question at a time, extract from first message, propose sensible defaults
  • API key resolution from ~/.heygen/config (mirrors parent SKILL.md's 3-source order)
  • Soak-then-tight polling — 5-min silent soak, then 30s polls until terminal
  • Backgrounded bash polling — replaces broken Agent subagent path (subagents can't inherit Bash approvals in Claude Code)
  • Upfront approval-count warning before launching N background polls
  • Narrate-before-each-Bash rule — one sentence about why before every shell call
  • Markdown-link-wrapped delivery — flag emoji header + view-online link + [Download .mp4] + [SRT] · [VTT]
  • Shell hygiene section — env vars don't persist across Bash calls (inline auth load every time), zsh "no matches found" = unquoted glob, don't burn approvals on diagnostics
  • Fix $status zsh read-only collision in poll script template
  • Correct languages endpoint response shape: real API returns { languages: string[] }, not { data: [{ code, name }] }
  • Correct dashboard URL pattern: /videos/{id} not /video-translate/{id}
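
The soak-then-tight cadence from the list above can be sketched roughly as follows. The 5-minute soak and 30-second interval come from the description. The terminal status names, the overall timeout, and the `fetch_status` callable are assumptions made for illustration, and the PR's real poll script is backgrounded bash, not Python.

```python
import time
from typing import Callable

# Assumption: these are the terminal states of a translation job.
TERMINAL_STATUSES = {"completed", "failed"}


def soak_then_poll(
    fetch_status: Callable[[], str],
    soak_s: int = 300,       # 5-minute silent soak
    interval_s: int = 30,    # then tight 30-second polls
    timeout_s: int = 3600,   # hypothetical overall cap, not from the PR
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait silently, then poll until the job reaches a terminal status."""
    sleep(soak_s)
    waited = soak_s
    while True:
        st = fetch_status()  # named `st`, echoing the PR's zsh-safe rename
        if st in TERMINAL_STATUSES:
            return st
        if waited >= timeout_s:
            raise TimeoutError(f"still {st!r} after {waited}s")
        sleep(interval_s)
        waited += interval_s
```

Injecting `sleep` keeps the cadence testable without actually waiting.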

Test plan

  • Run ./setup from a fresh checkout — verify video-translate symlinks into ~/.claude/skills
  • Submit a translation end-to-end — verify one approval per language, no foreground polling, deliverable renders with clickable links
  • Verify the language list query returns successfully on first call (no jq path error)
  • Verify View online link opens the correct dashboard page
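
For the language-list check, a small sketch of parsing the corrected response shape (`{ languages: string[] }`) and rejecting the older assumed shape (`{ data: [{ code, name }] }`). The function name and the sample language strings are illustrative only.

```python
import json
from typing import List


def parse_languages(body: str) -> List[str]:
    """Extract the language list from a languages-endpoint response body."""
    payload = json.loads(body)
    langs = payload.get("languages") if isinstance(payload, dict) else None
    if not isinstance(langs, list) or not all(isinstance(x, str) for x in langs):
        # The previously assumed { data: [{ code, name }] } shape lands here.
        raise ValueError("expected an object with a 'languages' array of strings")
    return langs
```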

After merge

Ken and others added 4 commits April 14, 2026 20:26
Replace the thin v2 API wrapper with an expert v3 skill that encodes
translation best practices. Key changes:

- Upgrade from v2 to v3 API (POST /v3/video-translations)
- Content-type routing: map talking head, podcast, music video, and
  corporate content to optimal flag combinations instead of exposing
  raw boolean toggles
- Precision mode always on (substantial quality gain, negligible cost)
- First-time user detection and onboarding flow
- Source quality guardrails and speaker count validation
- Languages fetched from API instead of hardcoded table
- Phase 2 features documented (brand voice, custom SRT, partial
  translation, multi-language batch)

Based on planning session with David Chou (VT team lead).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- video field is a discriminated union object, not flat video_url/video_id
- output_languages is an array, not singular output_language
- Response returns video_translation_ids[] array
- Add Step 0: input source routing (URL, local file upload via POST /v3/assets, existing asset)
- Fix status response to include all fields (audio_url, caption URLs, etc.)
- Fix failure response field name (failure_message, not message)
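
A hedged sketch of the request shape these fixes describe: a `video` object acting as a discriminated union plus an `output_languages` array. The discriminator key (`type`) and the variant field names are assumptions, since the commit message does not spell them out. Only `video`, `output_languages`, and `video_translation_ids` are named in the source.

```python
from typing import Any, Dict, List


def build_translation_request(
    source_kind: str, value: str, output_languages: List[str]
) -> Dict[str, Any]:
    """Build a request body for POST /v3/video-translations (illustrative)."""
    if source_kind == "url":
        video = {"type": "url", "url": value}            # hypothetical variant
    elif source_kind == "asset_id":
        video = {"type": "asset_id", "asset_id": value}  # hypothetical variant
    else:
        raise ValueError(f"unknown source kind: {source_kind!r}")
    if not output_languages:
        raise ValueError("output_languages must contain at least one language")
    # output_languages is an array, not a singular output_language.
    return {"video": video, "output_languages": list(output_languages)}
```

Per the commit message, the response then carries a `video_translation_ids` array, so a caller would expect one ID per requested language.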

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HEAD request to verify URL is publicly accessible: check status code,
content type, and file size. If inaccessible, offer the local file
upload path as fallback instead of just failing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Found and fixed eight issues by running the skill end-to-end against
real translations. All changes are tightly coupled — each fix came
from a real failure or friction point in the same debugging session.

Install path:
- Move skills/video-translate/ to repo root to match heygen-avatar/
  and heygen-video/ layout (setup looks for skills at root)
- Register video-translate in setup's SKILLS array so it gets
  symlinked alongside the other two skills

Skill content:
- Interactive intake: ask one question at a time, extract from first
  message before asking, propose sensible defaults instead of a
  4-question form
- API key resolution from ~/.heygen/config (mirrors parent SKILL.md's
  3-source order — env > config file > prompt user)
- Soak-then-tight polling cadence: 5-min silent soak, then 30s polls
  until terminal; never "check back in 20 min" (optimizes for cache,
  not user wait)
- Backgrounded bash polling (replaces broken Agent subagent path —
  subagents can't inherit Bash approvals in Claude Code so every
  curl in them dies on permission denial)
- Upfront approval-count warning before launching N background polls
- Narrate-before-each-Bash rule: one sentence about why before every
  shell call, especially for backgrounded scripts the user can't see
- Markdown-link-wrapped delivery: flag emoji header + view-online
  link + [Download .mp4] + [SRT] · [VTT]; stops dumping 200-char
  signed URLs into the terminal
- Shell hygiene section: env vars don't persist across Bash calls
  (inline auth load every time), zsh "no matches found" = unquoted
  glob, don't burn approvals on diagnostics
- Fix $status zsh read-only collision in poll script template
  (rename to $st with explanatory comment)
- Correct languages endpoint response shape: real API returns
  { languages: string[] }, not { data: [{ code, name }] }
- Correct dashboard URL pattern: /videos/{id} not /video-translate/{id}

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
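
The 3-source key resolution order mentioned in the hardening commit (env > config file > prompt user) might look roughly like this. The `HEYGEN_API_KEY` variable name appears elsewhere in this thread, but the `KEY=value` layout of `~/.heygen/config` is an assumption, and the function name is hypothetical.

```python
import os
from pathlib import Path
from typing import Callable, Mapping, Optional


def resolve_api_key(
    env: Optional[Mapping[str, str]] = None,
    config_path: Optional[str] = None,
    prompt: Optional[Callable[[str], str]] = None,
) -> str:
    """Resolve the API key: environment, then config file, then user prompt."""
    env = os.environ if env is None else env
    key = env.get("HEYGEN_API_KEY", "").strip()
    if key:
        return key

    path = Path(config_path) if config_path else Path.home() / ".heygen" / "config"
    if path.is_file():
        # Assumed format: one KEY=value pair per line.
        for line in path.read_text(encoding="utf-8").splitlines():
            name, sep, value = line.partition("=")
            cleaned = value.strip().strip("\"'")
            if sep and name.strip() == "HEYGEN_API_KEY" and cleaned:
                return cleaned

    if prompt is not None:
        return prompt("HeyGen API key: ").strip()
    raise RuntimeError("no API key found in env or config, and no prompt available")
```

Because env vars do not persist across Bash calls in the harness, the shell equivalent of this lookup has to be inlined in every call, which is the point of the shell hygiene note.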
Contributor

@kenchung kenchung left a comment

Review

Domain content is genuinely strong — content-type profiles, soak-then-tight polling, narrate-before-bash, ${status}-vs-st zsh fix, signed-URL markdown wrapping. Most operationally savvy of the three skills.

But this PR was authored against an older repo shape, and master changed materially in the last 24 hours (PR #77 self-containment + PR #79 feat!: eliminate root SKILL.md + PR #80 Codex .app.json). A handful of mechanical fixes are required before merge. ~30 min of work.

Must-fix

1. Skill name should be heygen-translate, not video-translate.
For consistency with siblings heygen-avatar and heygen-video. PR description even says David moved the dir to repo root "to match heygen-avatar/ and heygen-video/ layout" — same logic applies to the name itself. Rename: directory, frontmatter name:, setup SKILLS array entry, any manifest paths.

2. allowed-tools will break the skill at runtime. video-translate/SKILL.md:5 lists only mcp__heygen__*. The skill is heavily Bash-driven (curl, polling loops, jq, asset uploads). Claude Code will deny every shell call. Match siblings: Bash, WebFetch, Read, Write, mcp__heygen__*.

3. Missing version: frontmatter line with # x-release-please-version marker.
Both siblings have version: 3.1.0 # x-release-please-version at line 2. Without it, release-please will never bump this skill's version.

4. .cursor-plugin/plugin.json not updated. The skills array is still ["./heygen-avatar/", "./heygen-video/"]. Cursor users won't see the new skill until it's added: ["./heygen-avatar/", "./heygen-video/", "./heygen-translate/"].

5. release-please-config.json not updated. Add an extra-files entry for heygen-translate/SKILL.md so release-please bumps the version on conventional commits touching the skill.
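
Items 1-3 are mechanical enough to check automatically. A rough sketch, assuming SKILL.md opens with a `---`-delimited frontmatter block of simple `key: value` lines; the checker itself is hypothetical and is not the repo's validate-skills workflow.

```python
from typing import List


def check_frontmatter(text: str, expected_name: str = "heygen-translate") -> List[str]:
    """Return blocking problems found in a SKILL.md frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["no frontmatter block"]
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        return ["unterminated frontmatter block"]

    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    problems = []
    if fields.get("name") != expected_name:
        problems.append(f"name should be {expected_name}")
    if "x-release-please-version" not in fields.get("version", ""):
        problems.append("version line lacks the x-release-please-version marker")
    tools = [t.strip() for t in fields.get("allowed-tools", "").split(",")]
    if "Bash" not in tools:
        problems.append("allowed-tools does not include Bash")
    return problems
```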

Suggestions

6. Description missing "NOT for" clause + chain signals. Both siblings have a "NOT for" list (e.g., heygen-video says "NOT for: cinematic b-roll, video translation, TTS-only"). And add chain signals to heygen-avatar / heygen-video where overlap exists — translated talking-head output naturally chains to heygen-video for re-render, or heygen-avatar for voice-clone identity.

7. Missing canonical H2 sections. Both siblings have named ## Files & Paths, ## Language Awareness, ## UX Rules H2s (with the meta-content migrated in via PR #79). The substance is covered by the skill's intake/narrate/Bash-hygiene rules — just needs the canonical H2 labels + a Files & Paths table for ~/.heygen/config, signed-URL temp dirs, asset uploads.

8. Missing argument-hint: in frontmatter. Siblings have it (e.g., [topic_or_script] [--avatar avatar_id]). Add the equivalent for translate inputs.

What's good

  • Domain rules (soak-then-tight polling, content-type profiles, narrate-before-bash, upfront approval-count warning, backgrounded-bash for discovery) are real operational learnings.
  • No ../ references, no calls to deleted root files — already conforms to the post-#79 self-contained pattern in spirit.
  • Single 635-line SKILL.md is tight; no need to split into references/ yet.
  • Auth + shell hygiene sections are strong.

Verdict

Request changes. Items 1-5 are blocking (rename, runtime break, version anchor, plugin manifest, release-please config). 6-8 are nice-to-have for consistency with the post-#79 pattern. Once 1-5 land, this is a strong third skill in the stack.

eve-builds added a commit that referenced this pull request May 13, 2026
* feat: add heygen-translate skill (video translation / dubbing)

Adds a third skill, heygen-translate/, for translating and dubbing existing
videos into 175+ languages with voice cloning and lip-sync. Built on the
same independent-skill structure as heygen-avatar and heygen-video.

What:
- heygen-translate/SKILL.md (4-phase workflow: Discovery → Pre-flight →
  Submit+Poll → Deliver) with the same API Mode Detection ladder as
  heygen-video (OpenClaw plugin → CLI w/ HEYGEN_API_KEY → MCP → CLI
  fallback). All operations shown with MCP and CLI side-by-side, no raw
  curl.
- heygen-translate/references/troubleshooting.md (errors → action map,
  polling patterns, harness-specific notes for Claude Code / OpenClaw /
  Cursor)
- heygen-translate/references/language-locale-guide.md (regional variant
  defaults, formality registers, RTL caption collisions, tonal
  compression/expansion table, lip-sync ceiling per language)
- heygen-translate/references/proofreads-workflow.md (the high-stakes
  review-edit-render path: extract SRT → glossary discipline → register
  fixes → upload edited SRT → final render)
- heygen-translate/references/asset-routing.md (URL vs asset_id vs local
  upload routing, HEAD-check pattern, auth-walled URL fallbacks, 32 MB
  limit handling)

Replaces PR #46 with the new repo structure (independent skills, no root
SKILL.md, references inside the skill, validate-skills.yml self-contained
checks, MCP+CLI transport not raw API).

Why:
- PR #46's SKILL.md frontmatter declared 'allowed-tools: mcp__heygen__*'
  but every example used raw curl against api.heygen.com. Mismatch fixed
  here by using the heygen video-translate CLI (with MCP fallthrough)
  per the established pattern in heygen-avatar/heygen-video.
- PR #46 was authored against the pre-#79 structure (root SKILL.md +
  shared references/). Repo restructured 24h ago — each skill now owns
  its own SKILL.md and references/. This PR matches.
- PR #46 lacked embedded translation expertise. This SKILL.md adds:
  speaker-count discipline, source-quality triage, locale-pair gotchas
  (formality registers in ja/ko/de/th/hi, RTL caption collisions,
  tonal compression for en→zh/ja/ko, regional variants for es/pt/zh),
  lip-sync ceiling, captions burned-in vs sidecar, audio-only as a
  different deliverable not a workaround, cost/time math, and a
  failure-mode decoder.
- PR #46 used 'video-translate/' breaking the heygen-avatar/heygen-video
  prefix pattern. Renamed to 'heygen-translate/' for consistency in ls
  output and plugin manifest paths.
- Adds a true proofreads workflow (extract SRT → user/agent edits →
  upload corrected SRT → render) — this is the missing high-stakes path
  that distinguishes the skill from API docs.

Plumbing:
- .claude-plugin/marketplace.json registers heygen:translate
- .claude-plugin/plugin.json updates description + keywords
- .codex-plugin/plugin.json updates description, keywords, longDescription,
  defaultPrompt
- .cursor-plugin/plugin.json adds heygen-translate to skills array, plus
  keywords/tags
- .github/workflows/validate-skills.yml adds heygen-translate to path
  filter and runs the same self-contained-bundle checks as the other two
  skills
- release-please-config.json adds heygen-translate/SKILL.md as a
  release-please extra-files target so the version bumps in lockstep
- README.md, INSTALL.md, INSTALL_FOR_AGENTS.md, CLAUDE.md, CONTRIBUTING.md
  all updated to reference the third skill

Out of scope (followups):
- platforms/nanoclaw/heygen-translate/ NanoClaw container variant
- Eval scenarios for heygen-translate (mirror of R17-R23 pattern from
  heygen-video)
- gh skill / agentskills.io spec compliance check (handled by the
  spec-validate-soft job already in validate-skills.yml)
- Mark PR #46 as superseded once this lands

Refs: PR #46 (predecessor), #79 (independent-skills restructure), #77
(gh skill install path)

* docs(heygen-translate): document what the proofread CLI actually performs

Per Ken's ask in #tmp-vt-skill: rewrite proofreads-workflow.md (and the
Phase 3 proofread snippet in SKILL.md) against verified live behavior of
the heygen video-translate proofreads commands, not assumed/inferred
behavior.

Verified against the live API + CLI on Apr 27 with two real proofread
sessions (b84c8e8d... silent-source failure, 8ce0fba6c... Spanish
Sintel-trailer success).

Now documented:

- Five subcommands mapped to real REST endpoints:
    create     POST /v3/video-translations/proofreads
    get        GET  /v3/video-translations/proofreads/{id}
    srt get    GET  /v3/video-translations/proofreads/{id}/srt
    srt update PUT  /v3/video-translations/proofreads/{id}/srt
    generate   POST /v3/video-translations/proofreads/{id}/generate
- What the engine actually does between create and completed (downloads
  source, runs ASR for original_srt_url, translates to srt_url, no
  render yet).
- Real response shapes for create / get / srt get / srt update / generate
  with verified JSON examples and field-by-field meanings.
- Real status enum: processing | completed | failed (NOT pending|running
  — that's the translation-render endpoint, which is a different state
  machine the resource graduates into after generate).
- Polling cadence verified empirically: 3-5 min for SRT extraction on a
  50-second source. Hard timeout 30 min for stuck sessions.
- SRT format: standard SRT (UTF-8), well-formed timecodes, editable by
  hand or sed.
- File naming: <title>_proofread.srt and <title>_proofread_original.srt.
- original_srt_url is auto-populated source-language transcription, not
  a copy of any user-provided SRT. Useful as ground truth, never
  re-uploaded as target-language SRT.
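
As an illustration of the mapping documented above, the five subcommands and the proofread status enum can be captured in a small lookup table. The endpoints and statuses are taken from this commit message. The helper names are hypothetical.

```python
from typing import Optional, Tuple

# Subcommand -> (HTTP method, path template), as listed in the commit message.
PROOFREAD_ENDPOINTS = {
    "create": ("POST", "/v3/video-translations/proofreads"),
    "get": ("GET", "/v3/video-translations/proofreads/{id}"),
    "srt get": ("GET", "/v3/video-translations/proofreads/{id}/srt"),
    "srt update": ("PUT", "/v3/video-translations/proofreads/{id}/srt"),
    "generate": ("POST", "/v3/video-translations/proofreads/{id}/generate"),
}

# Proofread sessions use their own state machine (not pending|running).
PROOFREAD_STATUSES = {"processing", "completed", "failed"}


def route(subcommand: str, proofread_id: Optional[str] = None) -> Tuple[str, str]:
    """Resolve a proofreads subcommand to its HTTP method and concrete path."""
    method, template = PROOFREAD_ENDPOINTS[subcommand]
    if "{id}" in template:
        if not proofread_id:
            raise ValueError(f"{subcommand!r} requires a proofread id")
        return method, template.replace("{id}", proofread_id)
    return method, template


def is_terminal(status: str) -> bool:
    """True once a proofread session has stopped processing."""
    if status not in PROOFREAD_STATUSES:
        raise ValueError(f"unknown proofread status: {status!r}")
    return status != "processing"
```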

Critical correction: heygen asset create does NOT accept SRT files.

The CLI exposes both URL and asset_id shapes for srt update, but the
asset_id upload path is currently BLOCKED:

    {"error":{"code":"invalid_parameter",
              "message":"Content type not supported application/x-subrip"}}

heygen asset create only accepts png/jpeg/mp4/webm/mp3/wav/pdf.
Renaming .srt to .txt or .mp3 does not bypass it (server sniffs content,
not extension). The asset_id route is in the request schema for forward
compatibility but cannot currently be exercised through the standard
upload path.

Use the URL route. The reference now documents practical hosts that
work (gist raw URLs, GitHub raw URLs, S3 public-read, presigned
URLs >=2h, Vercel/static).

Two new failure_message strings added to troubleshooting.md from real
API responses:
- 'Failed to download video from url, please check the url is valid or
   the video is public' (instant-fail on bad/auth-walled source URL)
- 'Your video's audio is missing or corrupted, please try with another
   video' (~30s fail when source has no speech)
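
A sketch of how an errors-to-action map could decode these two strings. The matching fragments come from the messages quoted above. The suggested actions and the default are illustrative assumptions, not the contents of troubleshooting.md.

```python
# (substring of failure_message, suggested next step); actions are illustrative.
FAILURE_ACTIONS = [
    (
        "failed to download video from url",
        "Source URL is invalid or not public: offer the local file upload path.",
    ),
    (
        "audio is missing or corrupted",
        "Source has no usable speech: ask for a different video.",
    ),
]

DEFAULT_ACTION = "Unrecognized failure: show the raw failure_message to the user."


def decode_failure(failure_message: str) -> str:
    """Map a failure_message from the API to a suggested next step."""
    lowered = failure_message.lower()
    for needle, action in FAILURE_ACTIONS:
        if needle in lowered:
            return action
    return DEFAULT_ACTION
```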

Other documented quirks:
- proofreads create returns proofread_ids (plural, one per language)
  plus a session-level status — per-id status comes from proofreads get.
- After generate, polling shifts from proofreads get to
  video-translate get because the resource graduates from proofread
  to translation.
- Captions on generate are independent of the proofread session's SRT —
  --captions controls whether the FINAL video burns captions in.
- Proofread session TTL ~24h.

Out of scope for this commit (still in followup queue):
- NanoClaw platform variant
- Eval scenarios for heygen-translate
- File issue/PR upstream re: SRT asset upload (worth surfacing to HeyGen
  CLI team — the asset_id route in the schema can't be reached today)

* fix(heygen-translate): auth gate, duration question, open-ended language input

Three improvements from dogfooding:

- Add auth verification step before Phase 1: runs `heygen auth status` in CLI
  mode, asks for API key and persists via `heygen auth login` if missing.
  One-time setup that survives across sessions.
- Add duration flexibility question to Phase 1 discovery: asks whether output
  must match source length, explains quality tradeoff, controls
  `enable_dynamic_duration` flag instead of hardcoding true.
- Make target language question explicitly open-ended: no picker, no
  pre-assigned choices. User types freely, validation in Phase 2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(heygen-translate): align dynamic_duration references with Phase 1 question

SKILL.md:335 and references/language-locale-guide.md:49 both said "Always
enable_dynamic_duration: true", contradicting the new Phase 1 duration
flexibility question. Updated both to reference the user's choice and warn
about quality degradation on high-compression pairs when fixed-length is chosen.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: David Chou <david.chou@heygen.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@eve-builds
Collaborator

Closing as superseded by #82 (merged at 89d28b99 on 2026-05-13). The heygen-translate skill is now on master under the post-#79 independent-skill structure, with verified-against-live-API proofreads docs, full MCP+CLI side-by-side coverage, and David's auth-gate / duration-question / open-ended-language improvements from #86 incorporated. Thanks for the original draft — it kept the skill on the queue.

@eve-builds eve-builds closed this May 13, 2026
3 participants