find_if_id.py

Various programs to automate downloading Youtube and Twitch VODs, as well as their subtitles.

find_if_id.py

Crawl a directory tree to extract YouTube video IDs from existing video filenames and output them to a text file. This is useful for:

Creating a list of videos you already have downloaded
Comparing against channel playlists to find missing videos
Identifying duplicate videos based on their IDs

The script uses regex pattern matching to extract 11-character YouTube video IDs from filenames, supports multiple video formats (mkv, mp4, opus, m4a, webm), and outputs results to ids_found_on_disk.txt with one ID per line prefixed with "youtube", similar to the archive.txt format generated by ytdlp.

Usage:

python find_if_id.py /path/to/video/directory

Output file format:

youtube VIDEO_ID_1
youtube VIDEO_ID_2
...

The script will also report:

Total files found
Files that didn't match the expected naming pattern
Duplicate videos (same ID appearing multiple times)

ytdl_batch_video_dl.py

Download list of videoIds generated by playboard.py

playboard.py

Use the playboard.co API to get a list of video Ids for a given channel, and a list of deleted video Ids. Their scraped data is useful to figure out which videos have been deleted from the targeted Youtube channel. Their API is now behind paywall as of 2025.

Subs.py

Scan a given directory for Youtube and Twitch video files (looking for Id in file names) and download subtitles for each found video.

usage: subs.py [-h] --mode MODE [--compression ALGO] [--service SERV] [--output-path OUTPATH] [--exclude-regex EXCLUDE] [--remove-compressed] [--cookies COOKIES] [--log-level LOG-LEVEL] PATH

Download subtitles, or compress subtitles already present on disk.

positional arguments:
  PATH                  Path where to look up for files. This can be a directory in which case we will scan for missing subtitle files. If this is a text file, each line holds a videoId that will be downloaded in the current
                        directory.

options:
  -h, --help            show this help message and exit
  --mode MODE           download or compress
  --compression ALGO    Type of compression to use
  --service SERV        Services to scrape for.
  --output-path OUTPATH
                        A directory where to put all downloaded subtitles.
  --exclude-regex EXCLUDE
                        Regex to filter out directories.
  --remove-compressed   Remove subtitle file after compression has succeeded.
  --cookies COOKIES     Path to cookie file to pass to downloaders (for members-only videos).
  --log-level LOG-LEVEL
                        Minimum log level to justify writing to log file on disk.

The expected pattern for both Youtube and Twitch video files is

date [author_name] stream title [resolution width][video Id].extension`

Example of Youtube video file name:

20230726 [Komachi Panko] working at a Haunted Pizzeria [360][uSIOTJLqPLg].mp4

Example of Twitch video file name:

20240422 [vedal987] experimental neuro stream [best][2120650204].mp4

Expected video file extensions are webm, mkv, mp4, m4a, opus.

Usage

Point to the programs expected to do the downloading for us:

export TDCLI="/path/to/TwitchDownloaderCLI"
export YTDL="/path/to/yt-dlp"

Download subtitles for each video Id found in /target and all its subdirectories:

python ./subs.py --mode "download" --remove-compressed --cookies ~/Cookies/cookies.txt /target

The "compress" mode is not very useful, as it is equivalent to calling your preferred compression program on all JSON files, i.e. bzip2 **/*.json. It is kept as convenience in case of a crash mid-process, to rerun the same logic.

Example:

subs.py --mode "download" --remove-compressed --log-level DEBUG --exclude-regex ".*/excluded/.*|.*/another_excluded/.*" /path/to/downloaded_videos

TODO

Pass cookies for members-only videos, especially for Twitch. Currently, the downloader has to be called separately with the appropriate argument.
Decouple the file collecting logic. This step should be a separate script, and the subs.py script should not scan anything, only accept a list of file paths as input.
Youtube and Twitch processing should be split into two separate scripts

Name		Name	Last commit message	Last commit date
Latest commit History 147 Commits
config		config
downloader		downloader
rsync_based		rsync_based
tests		tests
transcode_move		transcode_move
twitch		twitch
twitch_download		twitch_download
youtube		youtube
.gitignore		.gitignore
BATCH_SCRIPT_README.md		BATCH_SCRIPT_README.md
README.md		README.md
__init__.py		__init__.py
batch_rename_vods.sh		batch_rename_vods.sh
compare_lists.py		compare_lists.py
constants.py		constants.py
find_if_id.py		find_if_id.py
playboard.py		playboard.py
regex.py		regex.py
requirements.txt		requirements.txt
subs.py		subs.py
ytdl_batch_video_dl.py		ytdl_batch_video_dl.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

find_if_id.py

ytdl_batch_video_dl.py

playboard.py

Subs.py

Usage

TODO

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

find_if_id.py

ytdl_batch_video_dl.py

playboard.py

Subs.py

Usage

TODO

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages