kino
If a browser can render it, kino can record it.
Programmatic video generation + AI-powered video editing. Framework-agnostic, agent-native.
kino is two things in one:
1. Programmatic video generation — render any animated HTML page (GSAP, Three.js, CSS, p5.js, Svelte, React — anything) into a deterministic MP4 via headless Chrome + FFmpeg. No framework lock-in, no DSL.
2. AI-powered video editing — take source footage, extract word-level captions via WhisperX, then have an AI agent write overlay decisions in plain JSON. kino renders the overlays as transparent frames and composites them natively via FFmpeg. Any model — Claude, GPT, Gemini, local — can direct the edit.
```bash
npm install @kinohq/core
```

```bash
# Step 1 — Extract transcript with word-level timing
kino extract-transcript clip.mp4 -o transcript.json

# Step 2 — Give transcript.json to any AI agent with EDIT-AGENT-CONTRACT.md
#          The agent writes overlay-decisions.json

# Step 3 — Preview overlays (one frame per overlay, ~30s)
kino preview-overlays clip.mp4 --overlays overlay-decisions.json -o preview.html

# Step 4 — Full render
kino render-overlays clip.mp4 \
  --overlays overlay-decisions.json \
  --srt captions.srt \
  --quality balanced \
  -o output.mp4
```

```bash
npx kino render animation.html --duration 10 --fps 30 -o video.mp4
```

```ts
import { Scene, Text, Shape, fadeIn, stagger } from "@kinohq/sdk";

const scene = new Scene({ duration: 5, background: "#0a0a0a" });
const title = new Text("Hello World", { fontSize: 72, fontWeight: "bold", color: "#fff" });
title.animate("opacity", { 0: 0, 0.5: 0, 1.5: 1 });
scene.add(title);

await scene.render("output.mp4");
```

```python
from kino import Scene, Text

scene = Scene(duration=5, background="#0a0a0a")
scene.add(Text("Hello World", font_size=72, font_weight="bold", color="#ffffff"))
scene.render("output.mp4")
```

| | kino | Remotion | Motion Canvas |
|---|---|---|---|
| Any HTML/CSS/JS | ✅ | ❌ React only | ❌ Custom DSL |
| GSAP, Three.js, p5.js | ✅ Native | ❌ | |
| CSS Animations | ✅ Automatic | ❌ | |
| Python SDK | ✅ | ❌ | ❌ |
| Video editing from footage | ✅ | ❌ | ❌ |
| Word-level captions (WhisperX) | ✅ | ❌ | ❌ |
| AI Agent Integration | ✅ skill.md + CLI Contract | ❌ | ❌ |
| Multi-scene + Transitions | ✅ 23 types | ✅ | ✅ |
| Subtitles (SRT/VTT) | ✅ | ❌ | ❌ |
| Zero config | ✅ | ❌ React setup | ❌ |
- Time Virtualization — Patches `Date.now`, `performance.now`, `requestAnimationFrame`, `setTimeout`, `setInterval`, CSS Animations, `<video>`/`<audio>` — every time API, every animation library works
- Frame-perfect capture — Deterministic via headless Chrome CDP screenshots; rendering is independent of system speed
- FFmpeg pipeline — Frames piped directly to FFmpeg stdin — no temp files, any codec, quality-matched encoding
- `kino edit` — Single command: source footage → edited video with word-level captions + auto-generated motion graphics overlays
- Word-level captions — WhisperX word timing → 2-4 word groups, 5 animation presets (pop-in, karaoke, highlight, minimal, bold-center)
- Quality-matched encoding — ffprobe extracts source bitrate/codec → output is visually identical to input
- Transparent overlay compositing — Overlay frames rendered in the browser as transparent PNGs, composited by FFmpeg natively → source video never touches the browser
- Multi-format output — `--format landscape|vertical|square|source` with automatic FFmpeg scale+pad
- Style presets — neo-brutalist, clean-minimal, corporate, bold-dark, poster-modernist
- Model-agnostic — Any AI (Claude, GPT, Gemini, local) reads `EDIT-AGENT-CONTRACT.md` and writes overlay decisions in plain JSON
- `kino extract-transcript` — Extracts enriched transcript with pause detection, energy curve, stat detection, narrative segmentation, and suggested entry points
- `kino render-overlays` — Takes agent-written JSON + source video → professionally edited output
- `kino preview-overlays` — One frame per overlay, ~30s, catches 90% of issues before committing to an 8-minute full render
- 5 Style Kits — kinetic-orange, minimal-authority, bold-dark-social, clean-professional, cosmic-particles
- 4 Editorial Templates — authority-builder, viral-hook, educational-breakdown, social-proof-stack
- Multi-scene — Stitch multiple HTML pages as scenes via `kino compose`
- 23 Transitions — fade, dissolve, wipe, slide, circle, smooth, pixelize, zoom, and more
- Subtitles — SRT/VTT parser with customizable HTML overlay synced to virtual time
- TypeScript SDK — `Scene`, `Text`, `Shape`, `Image` elements with chainable animations
- Python SDK — Same API surface, generates HTML + calls Node renderer
- Animation Primitives — `fadeIn`, `fadeOut`, `slideIn`, `slideOut`, `scaleIn`, `rotateIn`, `stagger`
- Visual Timeline — Scrub to any frame, see it instantly
- Hot-reload — Edit your HTML, see changes immediately
- Keyboard-first — Space, arrows, Home/End, number keys
- One-click render — Preview → Render MP4
- Agent Skill — Comprehensive `skill.md` + `EDIT-AGENT-CONTRACT.md` for Claude Code and any coding agent
- Agent-friendly errors — Parseable error messages with actionable hints
- Model-agnostic edit pipeline — Any AI writes overlay decisions; kino renders them
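To make the transparent-overlay compositing concrete, here is a hypothetical sketch of building an FFmpeg `overlay` filtergraph for a list of timed overlays. This is not kino's actual code; `buildOverlayFilter` and its input shape are invented for illustration, though `overlay=enable='between(t,a,b)'` is real FFmpeg filter syntax.

```javascript
// Hypothetical sketch (not kino's API): chain one FFmpeg overlay filter per
// transparent-PNG overlay input, each enabled only during its time window.
// Input 0 is the source video; input i+1 is the i-th overlay stream.
function buildOverlayFilter(overlays) {
  let graph = "";
  let last = "[0:v]";
  overlays.forEach((o, i) => {
    const from = (o.startMs / 1000).toFixed(3);
    const to = ((o.startMs + o.durationMs) / 1000).toFixed(3);
    const out = i === overlays.length - 1 ? "[vout]" : `[v${i + 1}]`;
    graph += `${last}[${i + 1}:v]overlay=enable='between(t,${from},${to})'${out};`;
    last = `[v${i + 1}]`;
  });
  return graph.slice(0, -1); // drop trailing ';'
}
```

Because each overlay is a separate labeled input, the source video passes through FFmpeg untouched by the browser, matching the architecture described above.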
```bash
# ── Video Editing ──────────────────────────────────────────────────────────

# Full auto-edit pipeline (probe + transcribe + overlay generation + render)
kino edit <video> [options]
  --word-timings <path>     WhisperX JSON with word-level timing
  --style <preset>          neo-brutalist | clean-minimal | corporate | bold-dark
  --caption-style <preset>  pop-in | karaoke | highlight | minimal | bold-center
  --format <format>         landscape | vertical | square | source
  --quality <level>         fast | balanced | slow | lossless
  --speaker-name <name>     For lower-third identity card
  --output <path>

# Extract enriched transcript (pause detection, stats, energy curve)
kino extract-transcript <video> -o transcript.json

# Render agent overlay decisions onto source video
kino render-overlays <video> --overlays decisions.json [--srt captions.srt] -o output.mp4

# Preview overlays before full render (~30s vs 8-12 min)
kino preview-overlays <video> --overlays decisions.json -o preview.html

# ── Programmatic Video ─────────────────────────────────────────────────────

# Render any HTML page or scene manifest
kino render <input> -o video.mp4
  -d, --duration <seconds>  Duration (required for HTML files)
  --fps <number>            Frames per second (default: 30)
  --width <pixels>          Width (default: 1920)
  --height <pixels>         Height (default: 1080)

# Preview a single frame as PNG
kino preview <input> --frame 75 -o preview.png

# Compose multiple scenes with transitions
kino compose composition.json -o output.mp4

# Launch visual studio
kino-studio scene.json
```

kino is a rendering engine, not an AI. Any AI agent reads `EDIT-AGENT-CONTRACT.md` and produces overlay decisions as JSON. kino renders them.
```json
[
  {
    "startMs": 4200,
    "durationMs": 5000,
    "type": "hook-phrase",
    "rationale": "Post-pause entry after hook setup; kinetic phrase amplifies the thesis",
    "css": ".cls-__ID__-wrap { position: absolute; top: 15%; width: 100%; text-align: center; }",
    "html": "<div class=\"cls-__ID__-wrap\"><span class=\"cls-__ID__-word\">AUTOMATE</span></div>",
    "initJs": "el._tl = gsap.timeline({paused:true}); el._tl.fromTo('.cls-__ID__-word', {scale:0, opacity:0}, {scale:1, opacity:1, duration:0.4, ease:'back.out(1.7)'});"
  }
]
```

The agent writes CSS, HTML, and GSAP timelines with `__ID__` as a namespace. kino replaces `__ID__` with a unique instance ID, renders each overlay as a transparent PNG frame via headless Chrome, and FFmpeg composites them onto the source video.
Any model, any overlay. The contract is the only coupling.
Read `.agents/EDIT-AGENT-CONTRACT.md` for the full 5-phase editing loop: PERCEIVE → ORIENT → COMPOSE → PREVIEW → RENDER.
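The `__ID__` namespacing step can be sketched in a few lines. This is a hypothetical illustration, not kino's implementation; `instantiateOverlay` and the `ov<N>` id scheme are invented here to show the idea that every placeholder in the agent's `css`, `html`, and `initJs` fields gets the same unique per-instance id, so multiple overlays never collide on one page.

```javascript
// Hypothetical sketch (not kino's actual code) of overlay namespacing:
// replace every "__ID__" placeholder with a unique per-instance id.
let nextId = 0;

function instantiateOverlay(decision) {
  const id = `ov${nextId++}`; // invented id scheme for illustration
  const sub = (s) => s.replaceAll("__ID__", id);
  return {
    ...decision,
    css: sub(decision.css),
    html: sub(decision.html),
    initJs: sub(decision.initJs),
  };
}
```

Each call yields a fresh id, so two overlays using the same class names in their decision JSON end up with disjoint selectors once instantiated.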
kino injects a script that patches all browser time APIs before your page loads:
| API | What Happens |
|---|---|
| `Date.now()` | Returns virtual time |
| `performance.now()` | Returns virtual time (ms) |
| `requestAnimationFrame()` | Queued, flushed per virtual frame |
| `setTimeout()` / `setInterval()` | Fire at virtual time thresholds |
| CSS Animations | `document.getAnimations()` synced to virtual time |
| `<video>` / `<audio>` | Seeked to virtual time, `play()` is a no-op |
This means any animation library that uses standard browser APIs works automatically — GSAP, anime.js, Three.js, Lottie, D3, p5.js, Framer Motion, you name it. Rendering is deterministic regardless of system speed.
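A stripped-down sketch of the idea behind the table above, assuming nothing about kino's real patch script (which covers far more APIs): the clock only moves when the renderer advances a frame, and queued `requestAnimationFrame` callbacks flush once per virtual frame.

```javascript
// Minimal sketch of time virtualization (illustrative, not kino's code):
// Date.now and requestAnimationFrame are redirected to a virtual clock.
let virtualMs = 0;
const rafQueue = [];

Date.now = () => virtualMs;
globalThis.requestAnimationFrame = (cb) => rafQueue.push(cb);

// Called by the renderer once per captured frame.
function advanceFrame(fps) {
  virtualMs += 1000 / fps;
  for (const cb of rafQueue.splice(0)) cb(virtualMs);
}
```

Because the clock is stepped in exact `1000 / fps` increments, two renders of the same page always see identical timestamps on every frame, which is what makes the output deterministic.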
| Package | Description |
|---|---|
| `@kinohq/core` | Render engine, CLI, time virtualization, FFmpeg pipeline, video editing |
| `@kinohq/sdk` | TypeScript SDK — Scene, elements, animations, codegen |
| `@kinohq/studio` | Visual preview UI with timeline and hot-reload |
| `kino` (PyPI) | Python SDK — same API, generates HTML, calls Node renderer |
- Node.js 20+
- FFmpeg in PATH (install)
- Chrome/Chromium (auto-downloaded by Puppeteer)
- WhisperX (optional — for word-level caption extraction)
| Example | Use Case | Tech |
|---|---|---|
| hello-world | Basic render | Vanilla JS |
| css-animations | CSS @keyframes | CSS |
| gsap | GSAP timeline | GSAP 3.12 CDN |
| sdk-ts | TypeScript SDK | SDK |
| animation-primitives | SDK presets + stagger | SDK |
| multi-scene | Composition + transitions | 3 scenes |
| brand-video | Full composition | 13.4s |
| video-edit/overlay-kinetic-white-v4-clip01.json | 24-overlay AI edit | Edit Agent |
```
┌─────────────────────────────────────────────────────┐
│                   AUTHORING LAYER                   │
│       HTML · TS SDK · Python SDK · AI Agent JSON    │
└──────────────────────────┬──────────────────────────┘
                           │ Scene Manifest / Overlay Decisions
┌──────────────────────────▼──────────────────────────┐
│                    CORE RENDERER                    │
│  Time Virtualization · Puppeteer · FFmpeg Pipeline  │
│  (All time APIs patched → deterministic per frame)  │
└──────────────────────────┬──────────────────────────┘
                           │ MP4
┌──────────────────────────▼──────────────────────────┐
│                    OUTPUT VIDEO                     │
└─────────────────────────────────────────────────────┘
```
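The core renderer's loop can be sketched as follows. This is an illustrative shape, not kino's implementation: `renderLoop`, `captureFrame`, and `sink` are hypothetical names standing in for the Puppeteer CDP screenshot call and FFmpeg's stdin stream.

```javascript
// Hypothetical sketch (not kino's actual code) of the render loop:
// step virtual time, capture a frame, and pipe the bytes straight into
// an FFmpeg-stdin-like sink — no temporary frame files on disk.
async function renderLoop({ durationSec, fps, captureFrame, sink }) {
  const totalFrames = Math.round(durationSec * fps);
  for (let frame = 0; frame < totalFrames; frame++) {
    const virtualMs = (frame / fps) * 1000; // deterministic per-frame time
    const png = await captureFrame(virtualMs); // e.g. a CDP screenshot
    sink.write(png); // e.g. ffmpeg process stdin
  }
  sink.end();
  return totalFrames;
}
```

Separating capture and encoding behind a writable sink is what lets the same loop feed any codec FFmpeg supports.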
```bash
git clone https://github.com/Trejon-888/frameforge.git
cd frameforge
pnpm install
pnpm build
pnpm test   # 238 tests
```

MIT — Free and open source.
FF
Code your video. Frame by frame.