Skip to content

glubsy/VOD_recovery

Repository files navigation

Various programs to automate downloading Youtube and Twitch VODs, as well as their subtitles.

find_if_id.py

Crawl a directory tree to extract YouTube video IDs from existing video filenames and output them to a text file. This is useful for:

  • Creating a list of videos you already have downloaded
  • Comparing against channel playlists to find missing videos
  • Identifying duplicate videos based on their IDs

The script uses regex pattern matching to extract 11-character YouTube video IDs from filenames, supports multiple video formats (mkv, mp4, opus, m4a, webm), and outputs results to ids_found_on_disk.txt with one ID per line prefixed with "youtube", similar to the archive.txt format generated by ytdlp.

Usage:

python find_if_id.py /path/to/video/directory

Output file format:

youtube VIDEO_ID_1
youtube VIDEO_ID_2
...

The script will also report:

  • Total files found
  • Files that didn't match the expected naming pattern
  • Duplicate videos (same ID appearing multiple times)

ytdl_batch_video_dl.py

Download list of videoIds generated by playboard.py

playboard.py

Use the playboard.co API to get a list of video Ids for a given channel, and a list of deleted video Ids. Their scraped data is useful to figure out which videos have been deleted from the targeted Youtube channel. Their API is now behind paywall as of 2025.

Subs.py

Scan a given directory for Youtube and Twitch video files (looking for Id in file names) and download subtitles for each found video.

usage: subs.py [-h] --mode MODE [--compression ALGO] [--service SERV] [--output-path OUTPATH] [--exclude-regex EXCLUDE] [--remove-compressed] [--cookies COOKIES] [--log-level LOG-LEVEL] PATH

Download subtitles, or compress subtitles already present on disk.

positional arguments:
  PATH                  Path where to look up for files. This can be a directory in which case we will scan for missing subtitle files. If this is a text file, each line holds a videoId that will be downloaded in the current
                        directory.

options:
  -h, --help            show this help message and exit
  --mode MODE           download or compress
  --compression ALGO    Type of compression to use
  --service SERV        Services to scrape for.
  --output-path OUTPATH
                        A directory where to put all downloaded subtitles.
  --exclude-regex EXCLUDE
                        Regex to filter out directories.
  --remove-compressed   Remove subtitle file after compression has succeeded.
  --cookies COOKIES     Path to cookie file to pass to downloaders (for members-only videos).
  --log-level LOG-LEVEL
                        Minimum log level to justify writing to log file on disk.

The expected pattern for both Youtube and Twitch video files is

date [author_name] stream title [resolution width][video Id].extension`

Example of Youtube video file name:

20230726 [Komachi Panko] working at a Haunted Pizzeria [360][uSIOTJLqPLg].mp4

Example of Twitch video file name:

20240422 [vedal987] experimental neuro stream [best][2120650204].mp4

Expected video file extensions are webm, mkv, mp4, m4a, opus.

Usage

Point to the programs expected to do the downloading for us:

export TDCLI="/path/to/TwitchDownloaderCLI"
export YTDL="/path/to/yt-dlp"

Download subtitles for each video Id found in /target and all its subdirectories:

python ./subs.py --mode "download" --remove-compressed --cookies ~/Cookies/cookies.txt /target

The "compress" mode is not very useful, as it is equivalent to calling your preferred compression program on all JSON files, i.e. bzip2 **/*.json. It is kept as convenience in case of a crash mid-process, to rerun the same logic.

Example:

subs.py --mode "download" --remove-compressed --log-level DEBUG --exclude-regex ".*/excluded/.*|.*/another_excluded/.*" /path/to/downloaded_videos

TODO

  • Pass cookies for members-only videos, especially for Twitch. Currently, the downloader has to be called separately with the appropriate argument.
  • Decouple the file collecting logic. This step should be a separate script, and the subs.py script should not scan anything, only accept a list of file paths as input.
  • Youtube and Twitch processing should be split into two separate scripts

About

Various scripts to automate downloading of stream videos

Topics

Resources

Stars

Watchers

Forks

Contributors