BadaBoomBooks - Audiobook Organization Tool

An advanced audiobook organization tool that automatically scrapes metadata from multiple sources and organizes your audiobook collection with proper folder structure, metadata files, and ID3 tags.

🚀 Quick Start

💻 Command Line Interface (Recommended)

# Basic usage - organize audiobooks with automatic search
python BadaBoomBooks.py --auto-search --opf --id3-tag -O "C:\Organized Books" "C:\Audiobook Folder"

# Advanced usage - series organization with all features
python BadaBoomBooks.py --auto-search --series --opf --infotxt --id3-tag --cover --move -R T:\Incoming -O T:\Sorted\

# AI-assisted sorting with all the features
python BadaBoomBooks.py --opf --id3-tag --series --cover --move --rename --from-opf --auto-search --llm-select -R T:\Incoming -O T:\Sorted\

# Dry run to see what would happen
python BadaBoomBooks.py --dry-run --auto-search --series --opf "C:\Audiobook Folder"

🌐 Web Interface (WiP)

# Start the modern web interface
cd web
python start_web.py

# Open browser to http://localhost:5000

✨ Features

🔍 Intelligent Search & Scraping

Multi-site Support: Audible, Goodreads, LubimyCzytac.pl
Automated Search: Browser automation with candidate selection
Manual Search: Clipboard monitoring for manual URL input
Smart Fallbacks: Multiple scraping strategies per site

📁 Advanced Organization

Series Support: Organize by author/series/volume structure
Flexible Output: Copy, move, or in-place processing
Path Cleaning: Automatic filename sanitization
Duplicate Handling: Smart folder deduplication
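
As an illustration of path cleaning, here is a minimal sketch that assumes the goal is to strip characters Windows rejects in file and folder names. The function name `clean_path_component` and the exact replacement rules are hypothetical; the tool's own rules may differ.

```python
import re

# Characters that Windows does not allow in file or folder names.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def clean_path_component(name: str, replacement: str = "_") -> str:
    """Return a name that is safe to use as a single path component."""
    cleaned = _ILLEGAL.sub(replacement, name)
    # Windows also rejects names that end in a dot or a space.
    cleaned = cleaned.strip().rstrip(". ")
    # Hypothetical fallback so an empty result never becomes a folder name.
    return cleaned or "untitled"
```

For example, `clean_path_component("Dune: Part 1?")` gives `Dune_ Part 1_`.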

📋 Metadata Management

OPF Generation: Audiobookshelf-compatible metadata files
Info.txt Creation: SmartAudioBookPlayer summaries
ID3 Tag Updates: Complete audio file tagging
Cover Downloads: High-quality cover art
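
To give a rough idea of what an OPF metadata file contains, the sketch below builds a minimal one with the standard library. It is not the project's generator (which appears to rely on `template.opf`); the function name `build_opf` is hypothetical, and the use of `calibre:series` meta entries for series data is an assumption about what downstream readers expect.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_opf(title, author, series=None, volume=None):
    """Return a minimal OPF document as a string (illustrative only)."""
    # Serialize OPF elements without a prefix and Dublin Core as "dc:".
    ET.register_namespace("", OPF_NS)
    ET.register_namespace("dc", DC_NS)

    package = ET.Element(f"{{{OPF_NS}}}package", {"version": "2.0"})
    metadata = ET.SubElement(package, f"{{{OPF_NS}}}metadata")
    ET.SubElement(metadata, f"{{{DC_NS}}}title").text = title
    ET.SubElement(metadata, f"{{{DC_NS}}}creator").text = author

    if series:
        # Assumption: series data is stored in calibre-style meta entries.
        ET.SubElement(metadata, f"{{{OPF_NS}}}meta",
                      {"name": "calibre:series", "content": series})
        if volume is not None:
            ET.SubElement(metadata, f"{{{OPF_NS}}}meta",
                          {"name": "calibre:series_index", "content": str(volume)})

    return ET.tostring(package, encoding="unicode")
```

Calling `build_opf("Book Title", "Author Name", "Series Name", 1)` returns the XML as a string that can be written to `metadata.opf`.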

🎵 Audio Processing

Track Renaming: Standardized "## - Title" format
Folder Flattening: Single-level audio organization
Multi-format Support: MP3, M4A, M4B, FLAC, OGG, WMA
Metadata Embedding: Complete ID3 tag population
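
The "## - Title" track format can be sketched as follows. This is a minimal illustration, assuming tracks are ordered by file name and numbered from 1; the helper names `track_name` and `plan_renames` are hypothetical and only compute the new names without touching the disk.

```python
from pathlib import Path


def track_name(index: int, title: str, suffix: str, total: int = 99) -> str:
    """Format one track as '## - Title.ext', widening the number if needed."""
    width = max(2, len(str(total)))
    return f"{index:0{width}d} - {title}{suffix}"


def plan_renames(files, title):
    """Return (old_path, new_path) pairs without renaming anything."""
    # Assumption: a case-insensitive sort by file name gives the play order.
    ordered = sorted(files, key=lambda p: p.name.lower())
    return [
        (old, old.with_name(track_name(i, title, old.suffix.lower(), len(ordered))))
        for i, old in enumerate(ordered, start=1)
    ]
```

A plain alphabetical sort puts `part10` before `part2`, so a real implementation would likely need a natural sort or the existing track-number tags.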

🏗️ Architecture

This tool uses a modern modular architecture for maintainability and extensibility:

src/
├── main.py                 # Application orchestrator
├── config.py              # Configuration & constants
├── models.py               # Data structures & validation
├── utils.py                # Utility functions
├── ui/                     # User interface components
├── search/                 # Search & URL handling
├── scrapers/               # Web scraping functionality
└── processors/             # File & metadata processing

See MODULAR_ARCHITECTURE.md for detailed documentation.

📖 Usage

Command Line Arguments

Input/Output Options

folders - Audiobook folder(s) to process
-O, --output - Output directory for organized books
-R, --book-root - Recursively discover audiobook folders from this directory
- Important: Point -R to the parent directory containing author folders, not individual book folders
- When metadata lacks author info, extracts author name from parent directory
- Example: -R "T:\Library\Authors" discovers T:\Library\Authors\Author Name\Book Title\
- Incorrect usage: -R "T:\Library\Authors\Author Name\Book Title" (use direct folder argument instead)
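
The discovery behavior described for -R can be pictured with the sketch below. It assumes a folder counts as a book when it directly contains at least one audio file, and it uses the parent directory name as the fallback author. The function name `discover_books` is hypothetical and this is not the tool's actual implementation.

```python
from pathlib import Path

# Extensions taken from the supported formats listed in this README.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".m4b", ".flac", ".ogg", ".wma"}


def discover_books(book_root):
    """Return (book_folder, fallback_author) pairs found under book_root."""
    root = Path(book_root)
    found = []
    for folder in sorted(p for p in root.rglob("*") if p.is_dir()):
        has_audio = any(
            f.is_file() and f.suffix.lower() in AUDIO_EXTENSIONS
            for f in folder.iterdir()
        )
        if has_audio:
            # The parent directory name stands in for a missing author.
            found.append((folder, folder.parent.name))
    return found
```

With a layout such as `Authors\Author Name\Book Title\01.mp3`, calling `discover_books` on `Authors` yields the `Book Title` folder paired with `Author Name`, which is why -R should point at the directory that contains the author folders.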

Operation Modes

-c, --copy - Copy folders (preserve originals)
-m, --move - Move folders (delete originals)
-D, --dry-run - Preview changes without modifying files

Processing Options

-f, --flatten - Flatten nested folder structures
-r, --rename - Rename tracks to standard format
-S, --series - Organize by series structure
-I, --id3-tag - Update ID3 tags

Metadata Options

-i, --infotxt - Generate info.txt summaries
-o, --opf - Generate OPF metadata files
-C, --cover - Download cover images
-F, --from-opf - Read existing OPF metadata

Search Options

-s, --site - Specify search site (audible/goodreads/lubimyczytac/all)
--auto-search - Automated search with candidate selection
- Custom URL support: Enter a URL directly during selection instead of choosing numbered options
- Accepts both full URLs (https://lubimyczytac.pl/ksiazka/...) and partial URLs (lubimyczytac.pl/ksiazka/...)
- Validates URL against supported sites before proceeding
--llm-select - AI-powered candidate scoring (requires LLM API key)
--yolo - Auto-accept all prompts for fully automated processing
--search-limit - Results per site (default: 5)
--download-limit - Pages to download per site (default: 3)
--search-delay - Delay between requests (default: 2.0s)
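
The custom URL handling described under --auto-search (full or partial URLs, validated against supported sites) could look roughly like this. It is a sketch under assumptions: the function name `normalize_url` and the domain list are illustrative, built only from the site names given in this README.

```python
from urllib.parse import urlparse

# Domains of the sites named in this README.
SUPPORTED_DOMAINS = ("audible.com", "goodreads.com", "lubimyczytac.pl")


def normalize_url(text):
    """Return a full https URL for a supported site, or None if unsupported."""
    candidate = text.strip()
    if not candidate:
        return None
    # Partial URLs such as "lubimyczytac.pl/ksiazka/..." get a scheme added.
    if "://" not in candidate:
        candidate = "https://" + candidate
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    if any(host == d or host.endswith("." + d) for d in SUPPORTED_DOMAINS):
        return candidate
    return None
```

Input that is not a supported URL, such as a numbered choice like `2`, returns `None`, so the caller can fall back to treating it as a menu selection.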

Automated Decision-Making

--auto-search	--llm-select	--yolo	Behavior	User Input?	Score Type	Threshold	Fallback
❌	-	-	Manual search (open browser, copy URL)	✅	-	-	-
✅	❌	❌	Auto-search, manual selection	✅	-	-	-
✅	❌	✅	⚠️ DISCOURAGED: Blind auto-select (no validation)	❌	-	-	Auto-select first
✅	✅	❌	LLM-assisted selection (scores + default)	✅	Raw LLM	0.5	Show skip default
✅	✅	✅	Two-tier auto-selection	❌	Final weighted	0.95	Skip if < 0.95
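
The table above can be read as a small decision function. The sketch below encodes its rows directly; the names `Decision` and `selection_mode` are hypothetical and exist only to make the flag combinations easier to follow.

```python
from typing import NamedTuple, Optional


class Decision(NamedTuple):
    behavior: str
    needs_user_input: bool
    threshold: Optional[float]  # None where the table lists no threshold


def selection_mode(auto_search: bool, llm_select: bool, yolo: bool) -> Decision:
    """Map the three flags to the behavior listed in the table."""
    if not auto_search:
        # The other two flags are irrelevant without --auto-search.
        return Decision("manual search", True, None)
    if not llm_select:
        if yolo:
            return Decision("blind auto-select of first result (discouraged)", False, None)
        return Decision("auto-search, manual selection", True, None)
    if yolo:
        return Decision("two-tier auto-selection", False, 0.95)
    return Decision("LLM-assisted selection", True, 0.5)
```

For instance, `selection_mode(True, True, True)` describes the fully automated path, where a candidate is skipped unless its final weighted score reaches 0.95.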

Score types:

Raw LLM: AI confidence score (0.0-1.0)
Final weighted: LLM score + site preference boost (LubimyCzytac: 3.0, Audible: 2.0, Goodreads: 1.5)

Thresholds (configurable in src/config.py):

0.5 - Minimum raw LLM score for manual mode default selection
0.95 - Minimum final weighted score for YOLO auto-accept
0.65 - Minimum raw LLM score to apply site weights
0.1 - Similarity bracket for weight tiebreaker
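
This README does not spell out how the 0.65 floor and the 0.1 bracket combine with the site weights, so the following is only one plausible reading, not the logic in src/config.py. It assumes that site weights act as a tiebreaker among candidates whose raw scores are within 0.1 of the best one, and only when the best score is at least 0.65. All names (`Candidate`, `pick_candidate`, the constants) are hypothetical.

```python
from dataclasses import dataclass

# Values quoted from this README; the constant names are made up.
SITE_WEIGHTS = {"lubimyczytac": 3.0, "audible": 2.0, "goodreads": 1.5}
MIN_SCORE_FOR_WEIGHTS = 0.65
SIMILARITY_BRACKET = 0.1


@dataclass
class Candidate:
    site: str
    url: str
    raw_score: float  # raw LLM confidence, 0.0-1.0


def pick_candidate(candidates):
    """Pick the best candidate, using site weights only to break near-ties."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.raw_score)
    if best.raw_score < MIN_SCORE_FOR_WEIGHTS:
        # Assumed: below the floor, site preference is ignored entirely.
        return best
    close = [
        c for c in candidates
        if c.raw_score >= MIN_SCORE_FOR_WEIGHTS
        and best.raw_score - c.raw_score <= SIMILARITY_BRACKET
    ]
    # Among near-equal scores, prefer the higher-weighted site.
    return max(close, key=lambda c: (SITE_WEIGHTS.get(c.site, 1.0), c.raw_score))
```

Under this reading, a LubimyCzytac result scored 0.85 would win over an Audible result scored 0.90, while a Goodreads result at 0.70 would be too far behind to compete.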

Debug Options

-d, --debug - Enable debug logging
-v, --version - Show version information

Examples

Basic Organization

# Organize single folder with manual search
python BadaBoomBooks.py "C:\My Audiobook"

# Organize multiple folders
python BadaBoomBooks.py "Book1" "Book2" "Book3"

Automated Processing

# Auto-search with series organization
python BadaBoomBooks.py --auto-search --series --move \
  -O "C:\Organized" -R "C:\Incoming"

# Complete processing with all features
python BadaBoomBooks.py --auto-search --series --opf --infotxt \
  --id3-tag --cover --flatten --rename --move \
  -O "T:\Library" "C:\Audiobook"

Dry Run Testing

# See what would happen without making changes
python BadaBoomBooks.py --dry-run --auto-search --series \
  --opf --id3-tag "C:\Test Folder"

📚 Supported Sites

🎧 Audible.com

API Integration: Direct API access for reliable data
Rich Metadata: Series, narrators, publication info
High Quality: Official source data

📖 Goodreads.com

Dual Format Support: Handles old and new page layouts
Comprehensive Data: Reviews, genres, series information
Language Detection: Automatic language identification

🇵🇱 LubimyCzytac.pl

Polish Content: Specialized for Polish audiobooks
Series Parsing: Advanced volume range handling
Original Titles: Tracks translated vs original titles

🔧 Installation

Requirements

pip install -r requirements.txt

Dependencies

requests - HTTP client
beautifulsoup4 - HTML parsing
selenium - Browser automation
tinytag - Audio metadata reading
mutagen - ID3 tag writing
pyperclip - Clipboard monitoring

Browser Setup

Chrome or Chromium is required for automated search functionality.

📁 Output Structure

Standard Organization

Output/
├── Author Name/
│   ├── Book Title/
│   │   ├── 01 - Book Title.mp3
│   │   ├── 02 - Book Title.mp3
│   │   ├── metadata.opf
│   │   ├── info.txt
│   │   └── cover.jpg
│   └── Another Book/
└── Another Author/

Series Organization (`--series`)

Output/
├── Author Name/
│   └── Series Name/
│       ├── 1 - First Book/
│       ├── 2 - Second Book/
│       └── 3,4 - Combined Volume/
└── Another Author/
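
The two layouts above differ only in the extra series level and the volume prefix. A minimal sketch of how a destination path might be assembled, assuming the volume is kept as text so that combined volumes like "3,4" work (the function name `destination_folder` is hypothetical):

```python
from pathlib import Path


def destination_folder(output, author, title, series=None, volume=None):
    """Build the target folder for a book under the output directory."""
    base = Path(output) / author
    if series:
        # Series layout: Author/Series/"<volume> - <title>"
        name = f"{volume} - {title}" if volume else title
        return base / series / name
    # Standard layout: Author/Title
    return base / title
```

For example, `destination_folder("Output", "Author Name", "Combined Volume", "Series Name", "3,4")` points at `Output/Author Name/Series Name/3,4 - Combined Volume`.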

🛠️ Development

Adding New Scrapers

Create scraper class inheriting from BaseScraper
Register in SCRAPER_REGISTRY
Implement required methods
Update imports

See MIGRATION_GUIDE.md for detailed development information.
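
The steps above might look roughly like the sketch below. `BaseScraper` and `SCRAPER_REGISTRY` are named in this README, but their real interfaces are not shown here, so the stand-in definitions, the `register` decorator, the `scrape` method, and `ExampleScraper` are all assumptions made for illustration. Check MIGRATION_GUIDE.md for the actual required methods.

```python
from abc import ABC, abstractmethod


class BaseScraper(ABC):
    """Stand-in for the project's base class (real interface may differ)."""
    domain: str = ""

    @abstractmethod
    def scrape(self, html: str) -> dict:
        """Extract metadata from a downloaded page."""


# Stand-in registry mapping a site domain to its scraper class.
SCRAPER_REGISTRY = {}


def register(cls):
    """Hypothetical helper that adds a scraper class to the registry."""
    SCRAPER_REGISTRY[cls.domain] = cls
    return cls


@register
class ExampleScraper(BaseScraper):
    domain = "example.org"

    def scrape(self, html: str) -> dict:
        # Toy extraction: read the text between <title> and </title>.
        start = html.find("<title>")
        end = html.find("</title>")
        title = html[start + len("<title>"):end].strip() if 0 <= start < end else ""
        return {"title": title}
```

A lookup such as `SCRAPER_REGISTRY["example.org"]()` then returns a scraper instance for that site.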

Testing

# Test individual components
python -m pytest tests/

# Test imports
python -c "from src.main import BadaBoomBooksApp; print('✅ Imports working')"

# Scrapers - Full regression (all samples per service)
python -m pytest "src/tests/test_scrapers.py::test_scraper_regression_all_samples[goodreads]" -v -s

# TDD workflow
mkdir -p src/tests/data/scrapers/tdd/my-test
# Create metadata.opf with expected values
python -m pytest src/tests/test_scrapers.py::test_manual_tdd_sample -v -s

📜 Legacy Code

The original monolithic code has been archived in the legacy/ folder. The new modular architecture maintains full backward compatibility while providing enhanced maintainability.

🤝 Contributing

Fork the repository
Create feature branch (git checkout -b feature/amazing-feature)
Follow the modular architecture patterns
Add tests for new functionality
Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Community for providing content and building tools
Contributors to the libraries this project depends on
Users who provided feedback and testing

Happy organizing! 📚🎧
