This is an automated Instagram video content reposting system that downloads videos, modifies them to avoid detection, extracts metadata, rewrites captions using AI, and exports everything to CSV for bulk upload.
1. Download Videos (parallel_download.py)
↓
2. Embed Shortcodes in Audio (embed_audio_id.py) [OPTIONAL - for tracking]
↓
3. Spoof Videos (spoof_videos.py)
↓
4. Extract Shortcodes from Spoofed Videos (extract_audio_id.py) [OPTIONAL - if step 2 was used]
↓
5. Generate Final CSV (generate_csv_from_mapping.py)
Purpose: Downloads Instagram videos from scraped data
Input:
- `a.json` - Scraped Instagram data containing:
  - `shortCode` - Instagram video ID
  - `videoUrl` - Direct video download URL
  - `inputUrl` - Original Instagram profile URL
  - `caption` - Original caption text
Output:
- Videos organized in folders: `{username}/{shortcode}.mp4`
- Example: `2bears.1cave/DPj6ZAliCHh.mp4`
Key Parameters:
- `MAX_WORKERS = 50` - Parallel download threads
How it works:
- Loads `a.json` with video metadata
- Extracts username from `inputUrl` using regex
- Downloads videos using `yt-dlp` into username folders
- Skips already downloaded files
- Shows progress with counters
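The username extraction step can be sketched roughly like this (the regex and function name are illustrative assumptions, not the script's actual code):

```python
import re

def username_from_input_url(input_url: str):
    """Extract the profile name from an Instagram profile URL.

    Hypothetical helper; the real script's regex may differ.
    """
    m = re.search(r"instagram\.com/([^/?#]+)", input_url)
    return m.group(1) if m else None

# e.g. username_from_input_url("https://instagram.com/2bears.1cave/") → "2bears.1cave"
```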
Purpose: Embeds shortcode into video audio using LSB steganography for tracking after spoofing
Input:
- `a.json` - For shortcode/username mapping
- Downloaded videos in `{username}/{shortcode}.mp4`
Output:
- Embedded videos in `embedded/` directory
- Same folder structure preserved
How it works:
- Extracts audio from video as WAV (PCM 16-bit)
- Converts shortcode to binary string
- Embeds binary data into Least Significant Bits (LSB) of audio samples
- Adds 32-bit length header and terminator
- Remuxes video with modified audio (AAC 192k)
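The embedding step above can be illustrated with a minimal sketch, assuming 16-bit PCM samples in a NumPy array (function and variable names are illustrative, not the script's actual code):

```python
import numpy as np

def embed_lsb(samples: np.ndarray, payload: str) -> np.ndarray:
    """Hide payload in sample LSBs: a 32-bit length header, the data bits,
    then an 8-bit terminator (11111111), one bit per sample."""
    data_bits = "".join(f"{b:08b}" for b in payload.encode("ascii"))
    bits = f"{len(data_bits):032b}" + data_bits + "11111111"
    if len(bits) > samples.size:
        raise ValueError("audio too short for payload")
    out = samples.copy()
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | int(bit)  # overwrite the least significant bit
    return out
```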
Key Parameters:
- `MAX_WORKERS = 10` - Parallel processing threads
Note: This is optional - only needed if you want to recover shortcodes from spoofed videos later. Not required if you're directly mapping spoofed videos.
Purpose: Creates modified "spoofed" versions of videos to avoid Instagram's duplicate detection
Input:
- All `.mp4` files in the project directory (excluding the `spoofed/` folder)
- Can process embedded videos or original downloads
Output:
- Spoofed videos in `spoofed/` directory (maintains folder structure)
- Filename format: `{original_name}_spoof_{random_suffix}.mp4`
- `spoofed_mapping.json` - Maps original paths to spoofed paths
- `spoof_params.json` - Logs all transformation parameters
Spoofing Techniques:
- Cropping: Removes 3-7% width, 2-5% height (center crop)
- Duration Change: Either:
- Trim 3-8% from end (tail-only, start untouched)
- OR extend 3-8% by duplicating last frame (tpad)
- Scaling: Upscale 1.1x to 1.8x (0.1 increments) using Lanczos
- Video Encoding: H.264 NVENC
- Preset: p5 (quality-oriented)
- No B-frames (-bf 0)
- GOP: 250
- Bitrate: 3-17 Mbps (randomized)
- Audio: AAC 128-264 kbps (randomized)
- Metadata Randomization:
- Encoder tag: Lavf58.76.100 / Lavf60.3.100 / Lavf62.6.100
- Creation time: Random date within last 2 years
- Camera model: iPhone/Samsung/Pixel (randomized)
- Level: 3.0 or 3.1
Key Parameters:
- `MAX_WORKERS = 24` - Parallel ffmpeg processes
- `PRESET = "p5"` - NVENC preset
Output Example (spoofed_mapping.json):
```json
{
  "C:\\Users\\asus\\Desktop\\projects\\reeld\\2bears.1cave\\DPj6ZAliCHh.mp4":
  "C:\\Users\\asus\\Desktop\\projects\\reeld\\spoofed\\2bears.1cave\\DPj6ZAliCHh_spoof_a7k2m9.mp4"
}
```

Purpose: Test script - generates 5 spoofed variations of a single video
Input:
- Single video file specified in `INPUT_FILE`
- Default: `test_embed\DPj6ZAliCHh_embedded.mp4`
Output:
- 5 variations in `test_embed\our_variations\`
- `our_variations_params.json` - Parameters for each variation
Use Case: Testing spoofing parameters before running full batch
Purpose: Recovers embedded shortcodes from spoofed videos (only needed if you used embed_audio_id.py)
Input:
- Spoofed videos in `spoofed/` directory
Output:
- `spoofed_mapping.json` - Maps spoofed video paths to extracted shortcodes
How it works:
- Extracts audio from spoofed video as WAV
- Reads LSB from audio samples
- Decodes 32-bit length header
- Extracts binary data and converts to text (shortcode)
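A matching decoder sketch for the steps above (again assuming 16-bit PCM samples in a NumPy array; illustrative, not the script's actual code):

```python
import numpy as np

def extract_lsb(samples: np.ndarray) -> str:
    """Recover a payload embedded with the 32-bit-length LSB scheme:
    read the length header, then that many data bits, then decode ASCII."""
    lsb = (samples & 1).astype(int)
    length = int("".join(map(str, lsb[:32])), 2)
    data_bits = "".join(map(str, lsb[32:32 + length]))
    return "".join(chr(int(data_bits[i:i + 8], 2)) for i in range(0, length, 8))
```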
Key Parameters:
- `MAX_WORKERS = 10`
Note: Only needed if you embedded shortcodes in step 2. Otherwise, spoof_videos.py already creates the mapping file.
Purpose: Final step - generates CSV for bulk Instagram upload with rewritten captions
Input:
- `spoofed_mapping.json` - Maps original videos to spoofed videos
- `a.json` - Original caption data
- `captions_index.json` - Prebuilt index of shortcode→caption (auto-generated from `a.json`)
Output:
- `final_output.csv` - Master CSV with all videos
- `final_output_part_001.csv`, `final_output_part_002.csv`, etc. - Chunked CSVs (~4500 rows each)
CSV Format (matches template.csv):
```
Text,Pinterest Source Url,LinkedIn Group Title,CatalogId(optional),ProductIdsSeparatedByComma(optional),Source,Image/Video link 1
```

Columns:
- `Text` - AI-rewritten caption (3rd person)
- `Pinterest Source Url` - Empty
- `LinkedIn Group Title` - Empty
- `CatalogId(optional)` - "catalogId="
- `ProductIdsSeparatedByComma(optional)` - "productIds="
- `Source` - "Manual"
- `Image/Video link 1` - Full path to spoofed video
How it works:
- Loads `spoofed_mapping.json` (supports both dict and list formats)
- Loads or builds caption index from `a.json`
- For each spoofed video:
  - Extracts shortcode from original filename
  - Looks up caption and username from index
  - Calls Claude API to rewrite caption in 3rd person
  - Generates variant phrasing if same shortcode appears multiple times
- Writes master CSV and chunks
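The cached caption-index step presumably looks something like this sketch (field names follow the `a.json` schema shown later in this document; the function and variable names are assumptions):

```python
import json
import os

def load_caption_index(a_json_path="a.json", index_path="captions_index.json"):
    """Return {shortcode: {"caption": ..., "user": ...}}, cached on disk."""
    if os.path.exists(index_path):
        with open(index_path, encoding="utf-8") as f:
            return json.load(f)          # fast path: reuse the cached index
    with open(a_json_path, encoding="utf-8") as f:
        entries = json.load(f)           # slow path: scan the scraped data
    index = {
        e["shortCode"]: {"caption": e.get("caption", ""),
                         "user": e.get("ownerUsername", "")}
        for e in entries if e.get("shortCode")
    }
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(index, f, indent=2)
    return index
```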
Key Parameters:
- `CHUNK_SIZE_ROWS = 4500` - Rows per chunked CSV
- `USE_CLAUDE = "1"` - Enable/disable AI rewriting (set env var `USE_CLAUDE=0` to disable)
- `API_KEY` - Anthropic Claude API key (hardcoded)
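The master-plus-chunks output can be sketched as follows (a minimal illustration; `rows` stands for already-formatted CSV rows, and the names are not the script's actual code):

```python
import csv

def write_master_and_chunks(rows, header, chunk_size=4500, prefix="final_output"):
    """Write one master CSV plus numbered chunk files of chunk_size rows each."""
    def write_csv(path, body):
        with open(path, "w", newline="", encoding="utf-8") as f:
            w = csv.writer(f)
            w.writerow(header)
            w.writerows(body)
    write_csv(f"{prefix}.csv", rows)  # master file with every row
    for n, start in enumerate(range(0, len(rows), chunk_size), start=1):
        write_csv(f"{prefix}_part_{n:03d}.csv", rows[start:start + chunk_size])
```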
Caption Rewriting: Uses Claude Sonnet 4 to transform 1st person captions to 3rd person for clip accounts:
- Original: "I can't believe this happened! #amazing"
- Rewritten: "@username shares an incredible moment that left them speechless #amazing"
Purpose: Earlier version of CSV generation (before spoofing workflow was added)
Status: Superseded by generate_csv_from_mapping.py
Differences:
- Uses original video paths instead of spoofed paths
- No support for mapping files
- Limited to first 5 entries (test mode)
Scraped Instagram data with complete video metadata:
```json
[{
  "id": "3739088920961884641",
  "shortCode": "DPj6ZAliCHh",
  "caption": "The best gift you could give Tim...",
  "videoUrl": "https://scontent-ord5-3.cdninstagram.com/...",
  "inputUrl": "https://instagram.com/2bears.1cave/",
  "ownerUsername": "2bears.1cave",
  "videoDuration": 20.248,
  ...
}]
```

Fast lookup index (auto-generated from `a.json`):
```json
{
  "DPj6ZAliCHh": {
    "caption": "The best gift you could give Tim...",
    "user": "2bears.1cave"
  }
}
```

Maps original videos to spoofed versions:
```json
{
  "C:\\path\\to\\original.mp4": "C:\\path\\to\\spoofed\\original_spoof_abc123.mp4"
}
```

Or list format:
```json
[{
  "input": "test_embed/DPj6ZAliCHh_embedded.mp4",
  "output": "test_embed/our_variations/DPj6ZAliCHh_embedded_ourvar_1_6rw2vu.mp4"
}]
```

Detailed transformation parameters for each spoofed video:
```json
[{
  "input": "C:\\path\\to\\original.mp4",
  "output": "C:\\path\\to\\spoofed.mp4",
  "crop_w_pct": 5.2,
  "crop_h_pct": 3.8,
  "action": "trim",
  "trim_pct": 0.0645,
  "scale_factor": 1.4,
  "v_bitrate_k": 8500,
  "a_bitrate_k": 192,
  "encoder": "Lavf60.3.100"
}]
```

Basic workflow (no tracking):

```shell
# Step 1: Download all videos from Instagram
python parallel_download.py
# Output: {username}/{shortcode}.mp4 files

# Step 2: Spoof all videos to avoid detection
python spoof_videos.py
# Output: spoofed/ directory + spoofed_mapping.json

# Step 3: Generate final CSV with rewritten captions
python generate_csv_from_mapping.py
# Output: final_output.csv + chunked CSVs
```

Workflow with audio-ID tracking:

```shell
# Step 1: Download videos
python parallel_download.py

# Step 2: Embed shortcodes in audio (for later recovery)
python embed_audio_id.py
# Output: embedded/ directory

# Step 3: Spoof embedded videos
# Edit spoof_videos.py INPUT_BASE to point to embedded/
python spoof_videos.py
# Output: spoofed/ directory

# Step 4: Extract shortcodes from spoofed videos
python extract_audio_id.py
# Output: spoofed_mapping.json (video_path → shortcode)

# Step 5: Generate CSV
python generate_csv_from_mapping.py
# Output: final_output.csv
```
1. **Mapping Format Mismatch**
   - `spoof_videos.py` creates: `{original_path: spoofed_path}`
   - `extract_audio_id.py` creates: `{spoofed_path: shortcode}`
   - `generate_csv_from_mapping.py` expects either format
   - Solution: Code already handles both dict and list formats (lines 95-105)
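A normalization helper along these lines would cover both shapes (a sketch; the actual code in `generate_csv_from_mapping.py` may differ):

```python
def normalize_mapping(raw):
    """Turn either mapping shape into a list of (input, output) path pairs."""
    if isinstance(raw, dict):
        # dict format: {original_path: spoofed_path}
        return list(raw.items())
    # list format: [{"input": ..., "output": ...}, ...]
    return [(item["input"], item["output"]) for item in raw]
```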
2. **Shortcode Extraction from Filename**
   - `generate_csv_from_mapping.py` tries to extract the shortcode from the `input_path` filename
   - Uses the stem (filename without extension) directly
   - Falls back to splitting by `_` if not found
   - Potential Issue: If the original filename doesn't contain the shortcode, the lookup will fail
3. **Caption Index Building**
   - First run builds the index from `a.json` (slow)
   - Subsequent runs use the cached `captions_index.json` (fast)
   - If `a.json` is updated, delete `captions_index.json` to rebuild
1. **Verify Mapping File Structure**

   ```shell
   python -c "import json; print(json.dumps(json.load(open('spoofed_mapping.json')), indent=2)[:500])"
   ```
2. **Test Caption Lookup**
   - Ensure filenames contain shortcodes
   - Or modify `generate_csv_from_mapping.py` to use an alternative lookup method
3. **Disable AI Rewriting for Testing**

   ```shell
   set USE_CLAUDE=0
   python generate_csv_from_mapping.py
   ```
4. **Check API Rate Limits**
   - Claude API calls are sequential (not parallel)
   - Large batches may hit rate limits
   - Consider adding retry logic or delays
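A simple retry wrapper like this could be added around each call (illustrative sketch; `fn` stands in for the script's actual API call):

```python
import time

def with_retries(fn, attempts=3, base_delay=2.0):
    """Call fn, retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s, ...
```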
```
reeld/
├── a.json                        # Scraped Instagram data
├── captions_index.json           # Caption lookup index
├── spoofed_mapping.json          # Original→Spoofed mapping
├── spoof_params.json             # Spoofing parameters log
├── template.csv                  # CSV format template
├── final_output.csv              # Master output CSV
├── final_output_part_001.csv     # Chunked CSV
├── parallel_download.py          # Step 1: Download
├── embed_audio_id.py             # Step 2: Embed (optional)
├── spoof_videos.py               # Step 3: Spoof
├── spoof_single.py               # Test spoofer
├── extract_audio_id.py           # Step 4: Extract (optional)
├── generate_csv_from_mapping.py  # Step 5: Generate CSV ⭐
├── process_and_export.py         # Legacy CSV generator
├── {username}/                   # Downloaded videos
│   └── {shortcode}.mp4
├── embedded/                     # Videos with embedded IDs
│   └── {username}/
│       └── {shortcode}.mp4
└── spoofed/                      # Spoofed videos
    └── {username}/
        └── {shortcode}_spoof_*.mp4
```
Goal: Make each video appear unique to Instagram's duplicate detection system
Techniques:
1. **Visual Changes**
   - Crop borders (removes frame-matching fingerprints)
   - Scale up (changes resolution/pixel values)
2. **Temporal Changes**
   - Trim or extend duration (breaks timeline matching)
   - Only modifies the end (keeps important start content)
3. **Encoding Changes**
   - Randomized bitrates (changes file signature)
   - Varied encoder tags (changes container metadata)
   - Different GOP sizes and settings
4. **Metadata Changes**
   - Random creation dates
   - Different camera models
   - Unique titles/comments
Result: Each spoofed video has different:
- File hash
- Video fingerprint
- Duration
- Resolution
- Metadata
- Bitrate signature
Concept: Hide data in least significant bit of audio samples (imperceptible to human ear)
Format:
```
[32-bit length][data bits][terminator: 11111111]
```
Example:
- Shortcode: "DPj6ZAliCHh"
- Binary: 01000100 01010000... (88 bits)
- Header: 00000000 00000000 00000000 01011000 (88 in binary)
- Total: 32 + 88 + 8 = 128 bits embedded
Capacity: ~1 bit per audio sample
- 44.1kHz stereo WAV: 88,200 samples/sec
- 20-second video: 1,764,000 bits capacity
- Shortcode: ~88 bits needed
- Plenty of headroom
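The capacity arithmetic above can be verified in a couple of lines:

```python
# 44.1 kHz stereo audio carries 1 usable LSB per sample.
sample_rate, channels, seconds = 44_100, 2, 20
capacity_bits = sample_rate * channels * seconds      # total embeddable bits
payload_bits = 32 + len("DPj6ZAliCHh") * 8 + 8        # header + data + terminator
print(capacity_bits, payload_bits)  # 1764000 128
```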
**Shortcode not found in `captions_index.json`**
- Check if the filename contains the correct shortcode
- Verify `a.json` has an entry for that shortcode
- Delete `captions_index.json` and regenerate
**Claude API call failures**
- API key invalid or rate limited
- Network connectivity issue
- Set `USE_CLAUDE=0` to skip AI rewriting
**FFmpeg/NVENC errors**
- Ensure FFmpeg is installed with NVENC support
- Check that an NVIDIA GPU is available (for `h264_nvenc`)
- Fallback: change the codec to `libx264` (slower, CPU-based)
**Mapping file format problems**
- `generate_csv_from_mapping.py` supports both dict and list formats
- Check the file with: `python -c "import json; print(type(json.load(open('spoofed_mapping.json'))))"`
- parallel_download.py: 50 workers = ~50 simultaneous downloads
- spoof_videos.py: 24 workers = 24 parallel FFmpeg encodes (GPU-limited)
- embed_audio_id.py: 10 workers = 10 parallel processes (CPU-limited)
- generate_csv_from_mapping.py: Sequential API calls (rate-limit friendly)
Bottlenecks:
- Download: Network bandwidth
- Spoofing: GPU encoding (NVENC)
- Embedding: CPU + FFmpeg
- CSV generation: Claude API rate limits
- Batch API Calls: Use Claude's batch API for caption rewriting
- Error Recovery: Resume partial runs without reprocessing
- Quality Control: Verify spoofed videos play correctly
- Duplicate Detection: Check for already-posted content
- Analytics: Track which spoofing parameters work best
- Auto-Upload: Direct integration with Instagram API
- API key is hardcoded in scripts (consider using environment variables)
- Videos are stored locally (ensure sufficient disk space)
- Respect Instagram's Terms of Service
- Copyright considerations for reposting content
Required:
- Python 3.7+
- FFmpeg (with NVENC support for GPU encoding)
- yt-dlp (for Instagram downloads)
- Python packages:
  - anthropic (Claude API)
  - numpy (audio processing)
  - (`concurrent.futures` is part of the Python standard library, so no install is needed)
Installation:

```shell
pip install anthropic numpy
# FFmpeg: download from ffmpeg.org
# yt-dlp: pip install yt-dlp
```

This is a sophisticated content repurposing pipeline that:
- ✅ Downloads Instagram videos with metadata
- ✅ (Optional) Embeds tracking IDs in audio
- ✅ Creates spoofed variations intended to evade duplicate detection
- ✅ (Optional) Recovers tracking IDs after spoofing
- ✅ Rewrites captions using AI
- ✅ Exports to bulk upload CSV format
Current Status: All core components working. Main script is generate_csv_from_mapping.py for the final CSV generation step.
Primary Use Case: Automated Instagram clips/reels reposting account with AI-generated captions.
The create_va_chunks.py script creates smaller chunk packages for distribution to VAs, containing a subset of videos and corresponding CSV data.
Purpose: Create mini chunks with N videos each for VA distribution
Configuration:
- `num_vas = 5` - Number of VA chunks to create
- `videos_per_va = 3` - Videos per chunk
Input:
- `chunk_01b.csv`, `chunk_01c.csv` - Source CSV files with video/caption data
Output:
- `va_chunk_01/` through `va_chunk_05/` directories
- Each contains: videos + `va_chunk_XX.csv`
How it works:
- Reads source CSVs with `csv.reader` (handles multiline captions)
- Groups videos by creator (extracted from the video path)
- Round-robin selection across creators for variety
- Copies videos to VA chunk directories
- Writes CSV with manual formatting to match source format exactly
Problem: CSV files generated by create_va_chunks.py do not parse correctly in Google Sheets - all data appears in a single column instead of being split across columns.
Investigation Summary:
- Source CSVs (`chunk_01b.csv`) work correctly in Google Sheets
- Output CSVs appear byte-for-byte identical in format
- Both use: UTF-8 encoding, CRLF line endings, same quoting rules
- Tested multiple approaches:
  - `csv.DictWriter` - doesn't work
  - `csv.writer` - doesn't work
  - Manual line copying - doesn't work (breaks multiline fields)
  - Manual CSV formatting with explicit quoting rules - doesn't work
- Preserving full file paths vs relative paths - no difference
Theories:
- Something about how the original CSVs were generated (from JSON parsing) that isn't captured
- Google Sheets may have inconsistent behavior with small files vs large files
- Possible invisible character or encoding difference not detected in byte comparison
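One more diagnostic worth trying for the invisible-character theory: dump the opening bytes of a working and a broken CSV through `repr`, which makes BOMs, stray control characters, and line endings visible at a glance (a generic helper, not part of the existing scripts):

```python
def head_bytes(path, n=120):
    """Return a repr of the first n raw bytes of a file, exposing BOMs,
    CR/LF characters, and other invisible bytes."""
    with open(path, "rb") as f:
        return repr(f.read(n))

# Compare e.g. head_bytes("chunk_01b.csv") against head_bytes("va_chunk_01/va_chunk_01.csv")
```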
Workaround:
- Manually copy rows from working chunk CSV files
- Or import VA chunk CSV and manually set delimiter to comma in Google Sheets
Files:
- Working source: `chunk_01b.csv`, `chunk_01c.csv`
- Script: `create_va_chunks.py`
- Output: `va_chunk_01/` through `va_chunk_05/`