---
name: youtube-kit
description: Full YouTube launch package — thumbnails, titles, descriptions — for Every's channel. Extracts video frames, composites text via PIL (never AI-regenerated faces), pulls recent channel descriptions via YouTube API to match Every's voice. Use when publishing a YouTube video, creating thumbnails, or writing video copy.
---

# YouTube Kit

Full launch package for Every's YouTube channel: thumbnail + title + description grounded in real channel data.

## When to Use

- "I need a thumbnail for this video"
- "Help me with YouTube copy for this video"
- "We're publishing a video, need the full package"
- Any video going on the Every YouTube channel

## Quick Start

User provides: a video file (or transcript), optional reference screenshots, and direction on the hook/angle.

```
1. Pull channel style → YouTube API recent descriptions + titles
2. Read transcript → extract the core hook and structure
3. Extract candidate frames → ffmpeg, show user 6-8 options
4. User picks a frame → composite text with PIL
5. Generate title + description → matched to channel voice
6. Deliver → files to ~/Downloads, description to clipboard
```

## Setup

Requires these env vars in `.env.local`:
- `YOUTUBE_API_KEY` — YouTube Data API v3
- `GEMINI_API_KEY` — Only for background extension (never faces)

Every's channel ID: `UCjIMtrzxYc0lblGhmOgC_CA`

Fonts (must be installed locally):
- **Headline:** `TT Interphases Pro Trial Black.ttf` (or Condensed variant)
- **Serif subtext:** `Lora-VariableFont_wght.ttf`
- **Eyebrow/label:** `Switzer Variable.ttf`
- **Fallbacks:** Arial Black, Impact, Futura (system fonts)

Font directory: `~/Library/Fonts/`
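
Since the trial fonts may not be installed on every machine, a small loader that walks the fallback chain keeps scripts from crashing mid-run. This is a sketch — the function name and candidate list are illustrative, not part of the skill:

```python
import os
from PIL import ImageFont

FONT_DIR = os.path.expanduser("~/Library/Fonts")

# Preference order; names after the first two are illustrative fallbacks
# and may differ on your system.
HEADLINE_CANDIDATES = [
    "TT Interphases Pro Trial Black.ttf",
    "Arial Black.ttf",
    "Impact.ttf",
]

def load_font(candidates, size, font_dir=FONT_DIR):
    """Return the first candidate that loads, else PIL's default bitmap font."""
    for name in candidates:
        path = name if os.path.isabs(name) else os.path.join(font_dir, name)
        try:
            return ImageFont.truetype(path, size)
        except OSError:
            continue
    return ImageFont.load_default()
```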

## Step 1: Pull Channel Style

Fetch the 20 most recent uploads, then keep the 5 most recent long-form videos (> 3 min) to study description patterns:

```bash
source .env.local
# Recent videos
curl -s "https://www.googleapis.com/youtube/v3/search?part=snippet&channelId=UCjIMtrzxYc0lblGhmOgC_CA&order=date&maxResults=20&type=video&key=${YOUTUBE_API_KEY}" -o /tmp/yt_recent.json

# Get video IDs, then fetch full details (snippet + contentDetails for duration filtering)
# Filter to videos > 3 min (not Shorts)
curl -s "https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails,statistics&id=ID1,ID2,...&key=${YOUTUBE_API_KEY}"
```
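
The search endpoint can't filter by duration, so the Shorts filter happens client-side on the `videos.list` response, whose `contentDetails.duration` field is an ISO 8601 string like `PT12M34S`. A minimal sketch (helper names are illustrative):

```python
import re

def iso8601_seconds(duration):
    """Convert a YouTube ISO 8601 duration like 'PT1H2M3S' to seconds."""
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", duration)
    if not m:
        return 0  # e.g. 'P0D' on some live/premiere entries
    h, mins, s = (int(g) if g else 0 for g in m.groups())
    return h * 3600 + mins * 60 + s

def long_form(items, min_seconds=180):
    """Keep videos longer than min_seconds (drops Shorts)."""
    return [v for v in items
            if iso8601_seconds(v["contentDetails"]["duration"]) > min_seconds]
```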

**What to extract:**
- Description structure (opening hook → argument → CTA → timestamps)
- Title patterns (length, keywords, voice)
- View counts (what's working recently)

**Every's description templates (from channel analysis):**

**Dan solo monologue (< 10 min):**
- Declarative opening echoing the video's first line
- Third-person Dan framing ("Every CEO Dan Shipper...")
- Core argument as a numbered list
- "Plus:" teaser for additional value
- Single Subscribe CTA: "Every is the only subscription you need to stay at the edge of AI. Subscribe today: https://every.to/subscribe"
- Product mention if relevant (Proof, Spiral, etc.)
- Narrative chapter titles (not topic labels)

**Long interview / panel (> 20 min):**
- Multi-paragraph summary with all speakers named
- "If you found this episode interesting, please like, subscribe, comment, and share!"
- "To hear more from Dan Shipper:" block with Subscribe + X links
- Detailed timestamps with full sentences

## Step 2: Read Transcript

If the user provides a transcript file, read it. If they provide a YouTube URL or video ID, fetch the transcript:

```bash
# If youtube-transcript-api is available
python montaigne/fetch_transcript.py VIDEO_ID_OR_URL

# Otherwise ask user to provide transcript or paste key points
```
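
Whichever route produces the transcript, it helps to flatten it into timestamped lines before hunting for the hook and chapter markers. A hypothetical helper, assuming entries shaped like youtube-transcript-api's output (`text` plus a `start` offset in seconds):

```python
def format_transcript(entries):
    """Render transcript entries ({'text', 'start', ...}) as timestamped lines.

    `entries` matches the shape youtube-transcript-api returns; the helper
    itself is a convenience sketch, not part of any library.
    """
    lines = []
    for e in entries:
        mins, secs = divmod(int(e["start"]), 60)
        lines.append(f"[{mins}:{secs:02d}] {e['text']}")
    return "\n".join(lines)
```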

From the transcript, identify:
- The core hook (first 30 seconds)
- The main framework/argument
- Key quotable lines (for thumbnail text)
- Chapter markers (timestamp + topic shifts)

## Step 3: Extract Candidate Frames

```bash
VIDEO="/path/to/video.mp4"
mkdir -p /tmp/yt-thumb-frames

# Extract 16 frames spread across the video (skips the first 5s;
# extend or trim the timestamp list so it stops ~30s before the end)
for t in 8 12 17 25 40 60 90 120 150 180 210 240 270 300 330 360; do
ffmpeg -y -ss $t -i "$VIDEO" -frames:v 1 -update 1 -q:v 1 \
"/tmp/yt-thumb-frames/frame_${t}s.jpg" 2>/dev/null
done
```
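
The hardcoded timestamp list above assumes a roughly six-minute video. For other lengths, a small sketch that spreads the same number of frames across the actual duration (read the duration with ffprobe first; the helper name is illustrative):

```python
def frame_timestamps(duration, count=16, skip_head=5, skip_tail=30):
    """Evenly spaced extraction times for a video of `duration` seconds,
    skipping the first skip_head and last skip_tail seconds.

    Get `duration` with, e.g.:
    ffprobe -v quiet -show_entries format=duration -of csv=p=0 "$VIDEO"
    """
    start = skip_head
    end = max(start + 1, duration - skip_tail)
    step = (end - start) / max(1, count - 1)
    return [round(start + i * step) for i in range(count)]
```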

Show the user 6-8 of the best candidates. **Pick frames where:**
- Eyes are on camera (direct eye contact > looking away)
- Mouth is natural (not mid-word with weird shapes)
- Expression matches the hook (confident, surprised, engaged)
- Hands are visible if gesturing (adds energy)
- Lighting is flattering

**CRITICAL: Skip frames that are screen recordings, slides, or b-roll.** Only show frames with the speaker's face.

## Step 4: Composite Thumbnail

### The Golden Rule

**NEVER let Gemini (or any AI image model) regenerate a person's face.** AI models subtly distort facial features — warped smiles, uncanny eyes, wrong proportions. The result always looks "off" even if you can't pinpoint why.

**Use PIL/Pillow for ALL text compositing.** The video frame stays pixel-identical to the source.

### When to Use Gemini

ONLY for extending/outpainting backgrounds where no face is present:
- Extending a wall or backdrop to create more text space
- Filling in a blank region with matched background texture

Never pass a region containing a face to Gemini for editing.

### Thumbnail Composition

```python
from PIL import Image, ImageDraw, ImageFont
import os

FONTS = os.path.expanduser("~/Library/Fonts")

# Load and upscale 2x for crisp 1920x1080 output
hero = Image.open("/tmp/yt-thumb-frames/frame_17s.jpg").convert("RGB")
hero = hero.resize((hero.width * 2, hero.height * 2), Image.LANCZOS)
W, H = hero.size

# Load fonts
f_headline = ImageFont.truetype(os.path.join(FONTS, "TT Interphases Pro Trial Black.ttf"), int(H * 0.18))
f_serif = ImageFont.truetype(os.path.join(FONTS, "Lora-VariableFont_wght.ttf"), int(H * 0.04))

# Draw text
draw = ImageDraw.Draw(hero)
draw.text((int(W * 0.03), int(H * 0.15)), "HEADLINE", font=f_headline, fill=(17, 17, 17))

# Save at 1920x1080
hero.resize((1920, 1080), Image.LANCZOS).save("thumb.jpg", quality=94)
```

### Layout Options

**Text-over-wall:** If the speaker is center-right and there's a light wall on the left, place text directly over the wall. Works when background is light and uniform.

**Parchment/sticker label:** Render text on a cream rounded rectangle, rotate slightly (-3° to -5°), composite over the frame. Good for busy backgrounds (B-roll, themed graphics).

**Bottom band:** Solid cream band across the bottom 20% of the frame. Speaker stays fully visible above. Safe fallback.

**Background extension:** If the speaker is too centered, use Gemini to outpaint more background to the LEFT only. Then composite text on the extended area with PIL.

### Text Styling

- **Headlines:** ALL CAPS, heavy black sans-serif, near-black fill (#111)
- **Subtext:** Sentence case, thin serif (Lora), dark charcoal (#333)
- **Eyebrow:** ALL CAPS, letter-spaced, muted red (#B9361F) or charcoal
- **NO drop shadows, NO outlines, NO gradients on text** (editorial, not MrBeast)
- Exception: white text with heavy black stroke works on very busy/dark backgrounds

### Iteration

Generate 2-3 layout variants per round. Show the user. Expect 1-3 rounds to nail it. Common feedback:
- "Text too small" → bump font size 20-30%
- "Covers his face" → shift text or use background extension
- "Looks generic" → add tilt, eyebrow label, or themed element
- "Expression is weird" → swap to a different frame, don't try to fix with AI

## Step 5: Title + Description

### Titles

Generate 5 options. Principles:
- Under 60 characters (mobile display)
- Lead with the outcome or transformation, not the topic name
- Create a curiosity gap — promise something the viewer can't guess
- **Thumbnail = hook, Title = payoff** (don't repeat the thumbnail text)
- Avoid jargon only insiders know
- Every's audience: AI-curious professionals, not pure developers
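
A quick heuristic check against the first two mechanical constraints can catch obvious misses before showing options to the user. This sketch is illustrative, not a substitute for judgment:

```python
def check_title(title, thumbnail_text="", max_len=60):
    """Return a list of issues with a candidate title (heuristic only)."""
    issues = []
    if len(title) > max_len:
        issues.append(f"too long ({len(title)} chars; mobile truncates ~{max_len})")
    if thumbnail_text and thumbnail_text.strip().lower() in title.lower():
        issues.append("repeats the thumbnail text (thumbnail = hook, title = payoff)")
    return issues
```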

### Description

Match the template from Step 1 based on video format. Always include:
- Opening that echoes the video's first spoken line
- Third-person framing for Dan (or speaker)
- Subscribe CTA: `Every is the only subscription you need to stay at the edge of AI. Subscribe today: https://every.to/subscribe`
- Narrative chapter timestamps (derived from transcript)
- Product links if mentioned in video

Copy the final description to clipboard with `pbcopy`.

## Step 6: Deliver

Save all thumbnail options to `~/Downloads/{video-slug}-thumb-options/`.
Copy description to clipboard.
Present title recommendations with rationale.

## Guidelines

- Always read the transcript before generating anything
- Always pull recent channel data — styles evolve
- Never fabricate quotes or stats not in the transcript
- Default to "I'M A/AN [CONCEPT]" for Dan confession-style videos
- Expect 2-4 thumbnail iterations — that's normal, not failure
- For themed backgrounds (pirate flag, etc.), use the parchment-label approach — it reads cleanly over busy visuals