diff --git a/skills/youtube-kit/SKILL.md b/skills/youtube-kit/SKILL.md
new file mode 100644
index 0000000..f013c90
--- /dev/null
+++ b/skills/youtube-kit/SKILL.md
@@ -0,0 +1,224 @@
---
name: youtube-kit
description: Full YouTube launch package — thumbnails, titles, descriptions — for Every's channel. Extracts video frames, composites text via PIL (never AI-regenerated faces), pulls recent channel descriptions via YouTube API to match Every's voice. Use when publishing a YouTube video, creating thumbnails, or writing video copy.
---

# YouTube Kit

Full launch package for Every's YouTube channel: thumbnail + title + description grounded in real channel data.

## When to Use

- "I need a thumbnail for this video"
- "Help me with YouTube copy for this video"
- "We're publishing a video, need the full package"
- Any video going on the Every YouTube channel

## Quick Start

User provides: a video file (or transcript), optional reference screenshots, and direction on the hook/angle.

```
1. Pull channel style → YouTube API recent descriptions + titles
2. Read transcript → extract the core hook and structure
3. Extract candidate frames → ffmpeg, show user 6-8 options
4. User picks a frame → composite text with PIL
5. Generate title + description → matched to channel voice
6. Deliver → files to ~/Downloads, description to clipboard
```

## Setup

Requires these env vars in `.env.local`:
- `YOUTUBE_API_KEY` — YouTube Data API v3
- `GEMINI_API_KEY` — Only for background extension (never faces)

Every's channel ID: `UCjIMtrzxYc0lblGhmOgC_CA`

Fonts (must be installed locally):
- **Headline:** `TT Interphases Pro Trial Black.ttf` (or Condensed variant)
- **Serif subtext:** `Lora-VariableFont_wght.ttf`
- **Eyebrow/label:** `Switzer Variable.ttf`
- **Fallbacks:** Arial Black, Impact, Futura (system fonts)

Font directory: `~/Library/Fonts/`

## Step 1: Pull Channel Style

Fetch the 5 most recent long-form videos (> 3 min) to study description patterns:

```bash
source .env.local
# Recent videos
curl -s "https://www.googleapis.com/youtube/v3/search?part=snippet&channelId=UCjIMtrzxYc0lblGhmOgC_CA&order=date&maxResults=20&type=video&key=${YOUTUBE_API_KEY}" -o /tmp/yt_recent.json

# Get video IDs, then fetch full details (snippet + contentDetails for duration filtering)
# Filter to videos > 3 min (not Shorts)
curl -s "https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails,statistics&id=ID1,ID2,...&key=${YOUTUBE_API_KEY}"
```

**What to extract:**
- Description structure (opening hook → argument → CTA → timestamps)
- Title patterns (length, keywords, voice)
- View counts (what's working recently)

**Every's description templates (from channel analysis):**

**Dan solo monologue (< 10 min):**
- Declarative opening echoing the video's first line
- Third-person Dan framing ("Every CEO Dan Shipper...")
- Core argument as a numbered list
- "Plus:" teaser for additional value
- Single Subscribe CTA: "Every is the only subscription you need to stay at the edge of AI. Subscribe today: https://every.to/subscribe"
- Product mention if relevant (Proof, Spiral, etc.)
- Narrative chapter titles (not topic labels)

**Long interview / panel (> 20 min):**
- Multi-paragraph summary with all speakers named
- "If you found this episode interesting, please like, subscribe, comment, and share!"
- "To hear more from Dan Shipper:" block with Subscribe + X links
- Detailed timestamps with full sentences

## Step 2: Read Transcript

If the user provides a transcript file, read it. If the video is already uploaded to YouTube (even unlisted), fetch the transcript by ID or URL:

```bash
# If youtube-transcript-api is available
python montaigne/fetch_transcript.py VIDEO_ID_OR_URL

# Otherwise ask the user to provide a transcript or paste key points
```

From the transcript, identify:
- The core hook (first 30 seconds)
- The main framework/argument
- Key quotable lines (for thumbnail text)
- Chapter markers (timestamp + topic shifts)

## Step 3: Extract Candidate Frames

```bash
VIDEO="/path/to/video.mp4"
mkdir -p /tmp/yt-thumb-frames

# Grab ~16 frames spread across the video (starts at 8s to skip the cold open;
# timestamps past the end of the video simply produce no file, so extend or
# trim the list to match the video's length)
for t in 8 12 17 25 40 60 90 120 150 180 210 240 270 300 330 360; do
  ffmpeg -y -ss $t -i "$VIDEO" -frames:v 1 -update 1 -q:v 1 \
    "/tmp/yt-thumb-frames/frame_${t}s.jpg" 2>/dev/null
done
```

Show the user 6-8 of the best candidates. **Pick frames where:**
- Eyes are on camera (direct eye contact > looking away)
- Mouth is natural (not mid-word with weird shapes)
- Expression matches the hook (confident, surprised, engaged)
- Hands are visible if gesturing (adds energy)
- Lighting is flattering

**CRITICAL: Skip frames that are screen recordings, slides, or b-roll.** Only show frames with the speaker's face.

## Step 4: Composite Thumbnail

### The Golden Rule

**NEVER let Gemini (or any AI image model) regenerate a person's face.** AI models subtly distort facial features — warped smiles, uncanny eyes, wrong proportions. The result always looks "off" even if you can't pinpoint why.
**Use PIL/Pillow for ALL text compositing.** The video frame stays pixel-identical to the source.

### When to Use Gemini

ONLY for extending/outpainting backgrounds where no face is present:
- Extending a wall or backdrop to create more text space
- Filling in a blank region with matched background texture

Never pass a region containing a face to Gemini for editing.

### Thumbnail Composition

```python
from PIL import Image, ImageDraw, ImageFont
import os

FONTS = os.path.expanduser("~/Library/Fonts")

# Load and upscale 2x so text renders crisply; downscale again at the end
hero = Image.open("/tmp/yt-thumb-frames/frame_17s.jpg").convert("RGB")
hero = hero.resize((hero.width * 2, hero.height * 2), Image.LANCZOS)
W, H = hero.size

# Load fonts
f_headline = ImageFont.truetype(os.path.join(FONTS, "TT Interphases Pro Trial Black.ttf"), int(H * 0.18))
f_serif = ImageFont.truetype(os.path.join(FONTS, "Lora-VariableFont_wght.ttf"), int(H * 0.04))

# Draw text (colors per Text Styling below: near-black headline, charcoal serif)
draw = ImageDraw.Draw(hero)
draw.text((int(W * 0.03), int(H * 0.15)), "HEADLINE", font=f_headline, fill=(17, 17, 17))
draw.text((int(W * 0.03), int(H * 0.36)), "Serif subtext line", font=f_serif, fill=(51, 51, 51))

# Save at 1920x1080
hero.resize((1920, 1080), Image.LANCZOS).save("thumb.jpg", quality=94)
```

### Layout Options

**Text-over-wall:** If the speaker is center-right and there's a light wall on the left, place text directly over the wall. Works when background is light and uniform.

**Parchment/sticker label:** Render text on a cream rounded rectangle, rotate slightly (-3° to -5°), composite over the frame. Good for busy backgrounds (B-roll, themed graphics).

**Bottom band:** Solid cream band across the bottom 20% of the frame. Speaker stays fully visible above. Safe fallback.

**Background extension:** If the speaker is too centered, use Gemini to outpaint more background to the LEFT only. Then composite text on the extended area with PIL.
### Text Styling

- **Headlines:** ALL CAPS, heavy black sans-serif, near-black fill (#111)
- **Subtext:** Sentence case, thin serif (Lora), dark charcoal (#333)
- **Eyebrow:** ALL CAPS, letter-spaced, muted red (#B9361F) or charcoal
- **NO drop shadows, NO outlines, NO gradients on text** (editorial, not MrBeast)
- Exception: white text with heavy black stroke works on very busy/dark backgrounds

### Iteration

Generate 2-3 layout variants per round. Show the user. Expect 1-3 rounds to nail it. Common feedback:
- "Text too small" → bump font size 20-30%
- "Covers his face" → shift text or use background extension
- "Looks generic" → add tilt, eyebrow label, or themed element
- "Expression is weird" → swap to a different frame, don't try to fix with AI

## Step 5: Title + Description

### Titles

Generate 5 options. Principles:
- Under 60 characters (mobile display)
- Lead with the outcome or transformation, not the topic name
- Create a curiosity gap — promise something the viewer can't guess
- **Thumbnail = hook, Title = payoff** (don't repeat the thumbnail text)
- Avoid jargon only insiders know
- Every's audience: AI-curious professionals, not pure developers

### Description

Match the template from Step 1 based on video format. Always include:
- Opening that echoes the video's first spoken line
- Third-person framing for Dan (or speaker)
- Subscribe CTA: `Every is the only subscription you need to stay at the edge of AI. Subscribe today: https://every.to/subscribe`
- Narrative chapter timestamps (derived from transcript)
- Product links if mentioned in video

Copy the final description to clipboard with `pbcopy`.

## Step 6: Deliver

Save all thumbnail options to `~/Downloads/{video-slug}-thumb-options/`.
Copy description to clipboard.
Present title recommendations with rationale.
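The narrative chapter timestamps from Step 5 can be assembled with a small helper. A minimal sketch; the `(seconds, title)` input shape is an assumption (derive the pairs from the transcript's chapter markers however you like):

```python
def format_chapters(chapters):
    """chapters: iterable of (seconds, title) pairs, in order.

    Returns YouTube-ready timestamp lines. YouTube only renders
    chapters if the first entry is 0:00 and there are at least three.
    """
    lines = []
    for secs, title in chapters:
        m, s = divmod(int(secs), 60)
        h, m = divmod(m, 60)
        stamp = f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"
        lines.append(f"{stamp} {title}")
    return "\n".join(lines)

print(format_chapters([(0, "The confession"), (75, "Why it matters"), (210, "What to do instead")]))
# 0:00 The confession
# 1:15 Why it matters
# 3:30 What to do instead
```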
## Guidelines

- Always read the transcript before generating anything
- Always pull recent channel data — styles evolve
- Never fabricate quotes or stats not in the transcript
- Default to "I'M A/AN [CONCEPT]" for Dan confession-style videos
- Expect 2-4 thumbnail iterations — that's normal, not failure
- For themed backgrounds (pirate flag, etc.), use the parchment-label approach — it reads cleanly over busy visuals
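One detail from Step 1 worth pinning down: filtering out Shorts by duration. The API returns `contentDetails.duration` as an ISO 8601 string (e.g. `PT12M34S`). A minimal sketch of the "> 3 min" filter; day-length durations (`P1DT...`, rare for normal uploads) are deliberately not handled:

```python
import re

def is_longform(iso_duration, min_seconds=180):
    """True if a YouTube contentDetails.duration exceeds min_seconds."""
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", iso_duration)
    if not m:
        return False  # e.g. day-length live archives; handle separately
    h, mins, s = (int(g) if g else 0 for g in m.groups())
    return h * 3600 + mins * 60 + s > min_seconds

print(is_longform("PT12M34S"))  # True
print(is_longform("PT58S"))     # False, a Short
```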