---
name: youtube-kit
description: Full YouTube launch package — thumbnails, titles, descriptions — for Every's channel. Extracts video frames, composites text via PIL (never AI-regenerated faces), pulls recent channel descriptions via YouTube API to match Every's voice. Use when publishing a YouTube video, creating thumbnails, or writing video copy.
---

# YouTube Kit

Full launch package for Every's YouTube channel: thumbnail + title + description grounded in real channel data.

## When to Use

- "I need a thumbnail for this video"
- "Help me with YouTube copy for this video"
- "We're publishing a video, need the full package"
- Any video going on the Every YouTube channel

## Quick Start

User provides: a video file (or transcript), optional reference screenshots, and direction on the hook/angle.

```
1. Pull channel style → YouTube API recent descriptions + titles
2. Read transcript → extract the core hook and structure
3. Extract candidate frames → ffmpeg, show user 6-8 options
4. User picks a frame → composite text with PIL
5. Generate title + description → matched to channel voice
6. Deliver → files to ~/Downloads, description to clipboard
```

## Setup

Requires these env vars in `.env.local`:
- `YOUTUBE_API_KEY` — YouTube Data API v3
- `GEMINI_API_KEY` — Only for background extension (never faces)

Every's channel ID: `UCjIMtrzxYc0lblGhmOgC_CA`

Fonts (must be installed locally):
- **Headline:** `TT Interphases Pro Trial Black.ttf` (or Condensed variant)
- **Serif subtext:** `Lora-VariableFont_wght.ttf`
- **Eyebrow/label:** `Switzer Variable.ttf`
- **Fallbacks:** Arial Black, Impact, Futura (system fonts)

Font directory: `~/Library/Fonts/`
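
Since the trial fonts may not be installed on every machine, a small loader that walks the fallback chain keeps scripts from crashing mid-run. This is a sketch — the function name and candidate list are illustrative, not part of the skill:

```python
import os
from PIL import ImageFont

FONT_DIR = os.path.expanduser("~/Library/Fonts")

# Preference order; names after the first two are illustrative fallbacks
# and may differ on your system.
HEADLINE_CANDIDATES = [
    "TT Interphases Pro Trial Black.ttf",
    "Arial Black.ttf",
    "Impact.ttf",
]

def load_font(candidates, size, font_dir=FONT_DIR):
    """Return the first candidate that loads, else PIL's default bitmap font."""
    for name in candidates:
        path = name if os.path.isabs(name) else os.path.join(font_dir, name)
        try:
            return ImageFont.truetype(path, size)
        except OSError:
            continue
    return ImageFont.load_default()
```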

## Step 1: Pull Channel Style

Fetch the 20 most recent uploads, then keep the 5 most recent long-form videos (> 3 min) to study description patterns:

```bash
source .env.local
# Recent videos
curl -s "https://www.googleapis.com/youtube/v3/search?part=snippet&channelId=UCjIMtrzxYc0lblGhmOgC_CA&order=date&maxResults=20&type=video&key=${YOUTUBE_API_KEY}" -o /tmp/yt_recent.json

# Get video IDs, then fetch full details (snippet + contentDetails for duration filtering)
# Filter to videos > 3 min (not Shorts)
curl -s "https://www.googleapis.com/youtube/v3/videos?part=snippet,contentDetails,statistics&id=ID1,ID2,...&key=${YOUTUBE_API_KEY}"
```
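
The search endpoint can't filter by duration, so the Shorts filter happens client-side on the `videos.list` response, whose `contentDetails.duration` field is an ISO 8601 string like `PT12M34S`. A minimal sketch (helper names are illustrative):

```python
import re

def iso8601_seconds(duration):
    """Convert a YouTube ISO 8601 duration like 'PT1H2M3S' to seconds."""
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", duration)
    if not m:
        return 0  # e.g. 'P0D' on some live/premiere entries
    h, mins, s = (int(g) if g else 0 for g in m.groups())
    return h * 3600 + mins * 60 + s

def long_form(items, min_seconds=180):
    """Keep videos longer than min_seconds (drops Shorts)."""
    return [v for v in items
            if iso8601_seconds(v["contentDetails"]["duration"]) > min_seconds]
```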

**What to extract:**
- Description structure (opening hook → argument → CTA → timestamps)
- Title patterns (length, keywords, voice)
- View counts (what's working recently)

**Every's description templates (from channel analysis):**

**Dan solo monologue (< 10 min):**
- Declarative opening echoing the video's first line
- Third-person Dan framing ("Every CEO Dan Shipper...")
- Core argument as a numbered list
- "Plus:" teaser for additional value
- Single Subscribe CTA: "Every is the only subscription you need to stay at the edge of AI. Subscribe today: https://every.to/subscribe"
- Product mention if relevant (Proof, Spiral, etc.)
- Narrative chapter titles (not topic labels)

**Long interview / panel (> 20 min):**
- Multi-paragraph summary with all speakers named
- "If you found this episode interesting, please like, subscribe, comment, and share!"
- "To hear more from Dan Shipper:" block with Subscribe + X links
- Detailed timestamps with full sentences

## Step 2: Read Transcript

If the user provides a transcript file, read it. If they provide a YouTube URL or video ID, fetch the transcript:

```bash
# If youtube-transcript-api is available
python montaigne/fetch_transcript.py VIDEO_ID_OR_URL

# Otherwise ask user to provide transcript or paste key points
```
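
Whichever route produces the transcript, it helps to flatten it into timestamped lines before hunting for the hook and chapter markers. A hypothetical helper, assuming entries shaped like youtube-transcript-api's output (`text` plus a `start` offset in seconds):

```python
def format_transcript(entries):
    """Render transcript entries ({'text', 'start', ...}) as timestamped lines.

    `entries` matches the shape youtube-transcript-api returns; the helper
    itself is a convenience sketch, not part of any library.
    """
    lines = []
    for e in entries:
        mins, secs = divmod(int(e["start"]), 60)
        lines.append(f"[{mins}:{secs:02d}] {e['text']}")
    return "\n".join(lines)
```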

From the transcript, identify:
- The core hook (first 30 seconds)
- The main framework/argument
- Key quotable lines (for thumbnail text)
- Chapter markers (timestamp + topic shifts)

## Step 3: Extract Candidate Frames

```bash
VIDEO="/path/to/video.mp4"
mkdir -p /tmp/yt-thumb-frames

# Extract 16 frames spread across the video (skips the first 5s;
# extend or trim the timestamp list so it stops ~30s before the end)
for t in 8 12 17 25 40 60 90 120 150 180 210 240 270 300 330 360; do
ffmpeg -y -ss $t -i "$VIDEO" -frames:v 1 -update 1 -q:v 1 \
"/tmp/yt-thumb-frames/frame_${t}s.jpg" 2>/dev/null
done
```
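
The hardcoded timestamp list above assumes a roughly six-minute video. For other lengths, a small sketch that spreads the same number of frames across the actual duration (read the duration with ffprobe first; the helper name is illustrative):

```python
def frame_timestamps(duration, count=16, skip_head=5, skip_tail=30):
    """Evenly spaced extraction times for a video of `duration` seconds,
    skipping the first skip_head and last skip_tail seconds.

    Get `duration` with, e.g.:
    ffprobe -v quiet -show_entries format=duration -of csv=p=0 "$VIDEO"
    """
    start = skip_head
    end = max(start + 1, duration - skip_tail)
    step = (end - start) / max(1, count - 1)
    return [round(start + i * step) for i in range(count)]
```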

Show the user 6-8 of the best candidates. **Pick frames where:**
- Eyes are on camera (direct eye contact > looking away)
- Mouth is natural (not mid-word with weird shapes)
- Expression matches the hook (confident, surprised, engaged)
- Hands are visible if gesturing (adds energy)
- Lighting is flattering

**CRITICAL: Skip frames that are screen recordings, slides, or b-roll.** Only show frames with the speaker's face.

## Step 4: Composite Thumbnail

### The Golden Rule

**NEVER let Gemini (or any AI image model) regenerate a person's face.** AI models subtly distort facial features — warped smiles, uncanny eyes, wrong proportions. The result always looks "off" even if you can't pinpoint why.

**Use PIL/Pillow for ALL text compositing.** The video frame stays pixel-identical to the source.

### When to Use Gemini

ONLY for extending/outpainting backgrounds where no face is present:
- Extending a wall or backdrop to create more text space
- Filling in a blank region with matched background texture

Never pass a region containing a face to Gemini for editing.

### Thumbnail Composition

```python
from PIL import Image, ImageDraw, ImageFont
import os

FONTS = os.path.expanduser("~/Library/Fonts")

# Load and upscale 2x for crisp 1920x1080 output
hero = Image.open("/tmp/yt-thumb-frames/frame_17s.jpg").convert("RGB")
hero = hero.resize((hero.width * 2, hero.height * 2), Image.LANCZOS)
W, H = hero.size

# Load fonts
f_headline = ImageFont.truetype(os.path.join(FONTS, "TT Interphases Pro Trial Black.ttf"), int(H * 0.18))
f_serif = ImageFont.truetype(os.path.join(FONTS, "Lora-VariableFont_wght.ttf"), int(H * 0.04))

# Draw text
draw = ImageDraw.Draw(hero)
draw.text((int(W * 0.03), int(H * 0.15)), "HEADLINE", font=f_headline, fill=(17, 17, 17))

# Save at 1920x1080
hero.resize((1920, 1080), Image.LANCZOS).save("thumb.jpg", quality=94)
```

### Layout Options

**Text-over-wall:** If the speaker is center-right and there's a light wall on the left, place text directly over the wall. Works when background is light and uniform.

**Parchment/sticker label:** Render text on a cream rounded rectangle, rotate slightly (-3° to -5°), composite over the frame. Good for busy backgrounds (B-roll, themed graphics).

**Bottom band:** Solid cream band across the bottom 20% of the frame. Speaker stays fully visible above. Safe fallback.

**Background extension:** If the speaker is too centered, use Gemini to outpaint more background to the LEFT only. Then composite text on the extended area with PIL.

### Text Styling

- **Headlines:** ALL CAPS, heavy black sans-serif, near-black fill (#111)
- **Subtext:** Sentence case, thin serif (Lora), dark charcoal (#333)
- **Eyebrow:** ALL CAPS, letter-spaced, muted red (#B9361F) or charcoal
- **NO drop shadows, NO outlines, NO gradients on text** (editorial, not MrBeast)
- Exception: white text with heavy black stroke works on very busy/dark backgrounds

### Iteration

Generate 2-3 layout variants per round. Show the user. Expect 1-3 rounds to nail it. Common feedback:
- "Text too small" → bump font size 20-30%
- "Covers his face" → shift text or use background extension
- "Looks generic" → add tilt, eyebrow label, or themed element
- "Expression is weird" → swap to a different frame, don't try to fix with AI

## Step 5: Title + Description

### Titles

Generate 5 options. Principles:
- Under 60 characters (mobile display)
- Lead with the outcome or transformation, not the topic name
- Create a curiosity gap — promise something the viewer can't guess
- **Thumbnail = hook, Title = payoff** (don't repeat the thumbnail text)
- Avoid jargon only insiders know
- Every's audience: AI-curious professionals, not pure developers
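
A quick heuristic check against the first two mechanical constraints can catch obvious misses before showing options to the user. This sketch is illustrative, not a substitute for judgment:

```python
def check_title(title, thumbnail_text="", max_len=60):
    """Return a list of issues with a candidate title (heuristic only)."""
    issues = []
    if len(title) > max_len:
        issues.append(f"too long ({len(title)} chars; mobile truncates ~{max_len})")
    if thumbnail_text and thumbnail_text.strip().lower() in title.lower():
        issues.append("repeats the thumbnail text (thumbnail = hook, title = payoff)")
    return issues
```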

### Description

Match the template from Step 1 based on video format. Always include:
- Opening that echoes the video's first spoken line
- Third-person framing for Dan (or speaker)
- Subscribe CTA: `Every is the only subscription you need to stay at the edge of AI. Subscribe today: https://every.to/subscribe`
- Narrative chapter timestamps (derived from transcript)
- Product links if mentioned in video

Copy the final description to clipboard with `pbcopy`.

## Step 6: Deliver

Save all thumbnail options to `~/Downloads/{video-slug}-thumb-options/`.
Copy description to clipboard.
Present title recommendations with rationale.

## Guidelines

- Always read the transcript before generating anything
- Always pull recent channel data — styles evolve
- Never fabricate quotes or stats not in the transcript
- Default to "I'M A/AN [CONCEPT]" for Dan confession-style videos
- Expect 2-4 thumbnail iterations — that's normal, not failure
- For themed backgrounds (pirate flag, etc.), use the parchment-label approach — it reads cleanly over busy visuals