An advanced audiobook organization tool that automatically scrapes metadata from multiple sources and organizes your audiobook collection with proper folder structure, metadata files, and ID3 tags.
# Basic usage - organize audiobooks with automatic search
python BadaBoomBooks.py --auto-search --opf --id3-tag -O "C:\Organized Books" "C:\Audiobook Folder"
# Advanced usage - series organization with all features
python BadaBoomBooks.py --auto-search --series --opf --infotxt --id3-tag --cover --move -R T:\Incoming -O T:\Sorted\
# AI-assisted sorting with all the features
python BadaBoomBooks.py --opf --id3-tag --series --cover --move --rename --from-opf --auto-search --llm-select -R T:\Incoming -O T:\Sorted\
# Dry run to see what would happen
python BadaBoomBooks.py --dry-run --auto-search --series --opf "C:\Audiobook Folder"

# Start the modern web interface
cd web
python start_web.py
# Open browser to http://localhost:5000

- Multi-site Support: Audible, Goodreads, LubimyCzytac.pl
- Automated Search: Browser automation with candidate selection
- Manual Search: Clipboard monitoring for manual URL input
- Smart Fallbacks: Multiple scraping strategies per site
- Series Support: Organize by author/series/volume structure
- Flexible Output: Copy, move, or in-place processing
- Path Cleaning: Automatic filename sanitization
- Duplicate Handling: Smart folder deduplication
- OPF Generation: Audiobookshelf-compatible metadata files
- Info.txt Creation: SmartAudioBookPlayer summaries
- ID3 Tag Updates: Complete audio file tagging
- Cover Downloads: High-quality cover art
- Track Renaming: Standardized "## - Title" format
- Folder Flattening: Single-level audio organization
- Multi-format Support: MP3, M4A, M4B, FLAC, OGG, WMA
- Metadata Embedding: Complete ID3 tag population
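The Path Cleaning and Track Renaming features listed above can be illustrated with a small sketch. This is not the tool's actual code: `sanitize_name` and `track_filename` are hypothetical helpers, and the set of stripped characters is an assumption based on what Windows forbids in file names.

```python
import re

# Characters that Windows does not allow in file names (assumed cleaning rule).
INVALID_CHARS = re.compile(r'[<>:"/\\|?*]')


def sanitize_name(name: str) -> str:
    """Drop invalid characters, collapse whitespace, trim trailing dots and spaces."""
    cleaned = INVALID_CHARS.sub("", name)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned.rstrip(". ")


def track_filename(number: int, title: str, extension: str) -> str:
    """Build a name in the standardized "## - Title" format."""
    return f"{number:02d} - {sanitize_name(title)}{extension}"


print(track_filename(1, "Book Title", ".mp3"))
```

Zero-padding the track number keeps files sorted correctly in players that order tracks by file name.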
This tool uses a modern modular architecture for maintainability and extensibility:
src/
├── main.py # Application orchestrator
├── config.py # Configuration & constants
├── models.py # Data structures & validation
├── utils.py # Utility functions
├── ui/ # User interface components
├── search/ # Search & URL handling
├── scrapers/ # Web scraping functionality
└── processors/ # File & metadata processing
See MODULAR_ARCHITECTURE.md for detailed documentation.
- `folders` - Audiobook folder(s) to process
- `-O, --output` - Output directory for organized books
- `-R, --book-root` - Recursively discover audiobook folders from this directory
  - Important: Point `-R` to the parent directory containing author folders, not individual book folders
  - When metadata lacks author info, the author name is extracted from the parent directory
  - Example: `-R "T:\Library\Authors"` discovers `T:\Library\Authors\Author Name\Book Title\`
  - Incorrect usage: `-R "T:\Library\Authors\Author Name\Book Title"` (use a direct folder argument instead)
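To make the `-R` layout requirement concrete, here is a minimal sketch of recursive discovery with an author fallback taken from the parent directory. It is an illustration under assumptions, not the tool's implementation: `discover_books` is a hypothetical name, and treating every folder that directly contains audio files as one book is a simplification.

```python
from pathlib import Path

# Audio formats listed under Multi-format Support.
AUDIO_EXTENSIONS = {".mp3", ".m4a", ".m4b", ".flac", ".ogg", ".wma"}


def discover_books(book_root):
    """Map each folder that directly contains audio files to its parent folder's name.

    The parent name serves as a fallback author when scraped metadata has none.
    """
    books = {}
    for path in sorted(Path(book_root).rglob("*")):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            book_dir = path.parent
            books.setdefault(book_dir, book_dir.parent.name)
    return books
```

With this logic, pointing the root at an individual book folder would yield the wrong fallback author (the folder above the author), which is why the root should be the directory that holds the author folders.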
- `-c, --copy` - Copy folders (preserve originals)
- `-m, --move` - Move folders (delete originals)
- `-D, --dry-run` - Preview changes without modifying files
- `-f, --flatten` - Flatten nested folder structures
- `-r, --rename` - Rename tracks to standard format
- `-S, --series` - Organize by series structure
- `-I, --id3-tag` - Update ID3 tags
- `-i, --infotxt` - Generate info.txt summaries
- `-o, --opf` - Generate OPF metadata files
- `-C, --cover` - Download cover images
- `-F, --from-opf` - Read existing OPF metadata
- `-s, --site` - Specify search site (audible/goodreads/lubimyczytac/all)
- `--auto-search` - Automated search with candidate selection
  - Custom URL support: Enter a URL directly during selection instead of choosing numbered options
  - Accepts both full URLs (`https://lubimyczytac.pl/ksiazka/...`) and partial URLs (`lubimyczytac.pl/ksiazka/...`)
  - Validates URL against supported sites before proceeding
- `--llm-select` - AI-powered candidate scoring (requires LLM API key)
- `--yolo` - Auto-accept all prompts for fully automated processing
- `--search-limit` - Results per site (default: 5)
- `--download-limit` - Pages to download per site (default: 3)
- `--search-delay` - Delay between requests (default: 2.0s)
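The custom URL handling described above (full or partial URLs, validated against supported sites) could look roughly like the following. This is a sketch under assumptions: `normalize_custom_url` is a hypothetical helper, and the domain list is inferred from the three supported sites rather than taken from the project's configuration.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed domains for the three supported sites.
SUPPORTED_DOMAINS = ("audible.com", "goodreads.com", "lubimyczytac.pl")


def normalize_custom_url(text: str) -> Optional[str]:
    """Return a full https URL if the input points at a supported site, else None."""
    text = text.strip()
    if not text:
        return None
    if "://" not in text:
        # Partial URL such as "lubimyczytac.pl/ksiazka/...": add a scheme.
        text = "https://" + text
    host = (urlparse(text).hostname or "").lower()
    if any(host == d or host.endswith("." + d) for d in SUPPORTED_DOMAINS):
        return text
    return None
```

Input that is not a supported URL, such as a plain candidate number, returns `None`, so a caller can fall back to treating it as a numbered choice.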
| --auto-search | --llm-select | --yolo | Behavior | User Input? | Threshold | Value | Fallback |
|---|---|---|---|---|---|---|---|
| ❌ | - | - | Manual search (open browser, copy URL) | ✅ | - | - | - |
| ✅ | ❌ | ❌ | Auto-search, manual selection | ✅ | - | - | - |
| ✅ | ❌ | ✅ | Auto-select first | ❌ | - | - | - |
| ✅ | ✅ | ❌ | LLM-assisted selection (scores + default) | ✅ | Raw LLM | 0.5 | Show skip default |
| ✅ | ✅ | ✅ | Two-tier auto-selection | ❌ | Final weighted | 0.95 | Skip if < 0.95 |
Score types:
- Raw LLM: AI confidence score (0.0-1.0)
- Final weighted: LLM score + site preference boost (LubimyCzytac: 3.0, Audible: 2.0, Goodreads: 1.5)
Thresholds (configurable in src/config.py):
- `0.5` - Minimum raw LLM score for manual mode default selection
- `0.95` - Minimum final weighted score for YOLO auto-accept
- `0.65` - Minimum raw LLM score to apply site weights
- `0.1` - Similarity bracket for weight tiebreaker
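One possible reading of how these thresholds interact is sketched below. The README does not spell out the formula for the final weighted score, so this example does not model it or the 0.95 YOLO cutoff. It assumes only that site weights act as a tiebreaker among candidates within 0.1 of the best raw score, and only when raw scores reach 0.65. `Candidate`, `pick_candidate`, and `manual_default` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

SITE_WEIGHTS = {"lubimyczytac": 3.0, "audible": 2.0, "goodreads": 1.5}
MANUAL_DEFAULT_MIN = 0.5   # minimum raw score to offer a default in manual mode
WEIGHT_MIN_RAW = 0.65      # minimum raw score before site weights apply
SIMILARITY_BRACKET = 0.1   # scores this close to the best count as a tie


@dataclass
class Candidate:
    site: str
    url: str
    raw_score: float


def pick_candidate(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Pick the best raw score, letting site weight break near-ties (assumed rule)."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.raw_score)
    if best.raw_score < WEIGHT_MIN_RAW:
        return best  # too weak for site weights to matter
    close = [
        c for c in candidates
        if c.raw_score >= WEIGHT_MIN_RAW
        and best.raw_score - c.raw_score <= SIMILARITY_BRACKET
    ]
    return max(close, key=lambda c: (SITE_WEIGHTS.get(c.site, 1.0), c.raw_score))


def manual_default(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the default to pre-select, or None to default to skipping."""
    choice = pick_candidate(candidates)
    if choice is None or choice.raw_score < MANUAL_DEFAULT_MIN:
        return None
    return choice
```

Under these assumptions, a LubimyCzytac candidate at 0.85 would be preferred over a Goodreads candidate at 0.90, while an Audible candidate at 0.70 would fall outside the bracket and be ignored.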
- `-d, --debug` - Enable debug logging
- `-v, --version` - Show version information
# Organize single folder with manual search
python BadaBoomBooks.py "C:\My Audiobook"
# Organize multiple folders
python BadaBoomBooks.py "Book1" "Book2" "Book3"

# Auto-search with series organization
python BadaBoomBooks.py --auto-search --series --move \
-O "C:\Organized" -R "C:\Incoming"
# Complete processing with all features
python BadaBoomBooks.py --auto-search --series --opf --infotxt \
--id3-tag --cover --flatten --rename --move \
  -O "T:\Library" "C:\Audiobook"

# See what would happen without making changes
python BadaBoomBooks.py --dry-run --auto-search --series \
  --opf --id3-tag "C:\Test Folder"

- API Integration: Direct API access for reliable data
- Rich Metadata: Series, narrators, publication info
- High Quality: Official source data
- Dual Format Support: Handles old and new page layouts
- Comprehensive Data: Reviews, genres, series information
- Language Detection: Automatic language identification
- Polish Content: Specialized for Polish audiobooks
- Series Parsing: Advanced volume range handling
- Original Titles: Tracks translated vs original titles
pip install -r requirements.txt

- `requests` - HTTP client
- `beautifulsoup4` - HTML parsing
- `selenium` - Browser automation
- `tinytag` - Audio metadata reading
- `mutagen` - ID3 tag writing
- `pyperclip` - Clipboard monitoring
Chrome/Chromium required for automated search functionality.
Output/
├── Author Name/
│ ├── Book Title/
│ │ ├── 01 - Book Title.mp3
│ │ ├── 02 - Book Title.mp3
│ │ ├── metadata.opf
│ │ ├── info.txt
│ │ └── cover.jpg
│ └── Another Book/
└── Another Author/
Output/
├── Author Name/
│ └── Series Name/
│ ├── 1 - First Book/
│ ├── 2 - Second Book/
│ └── 3,4 - Combined Volume/
└── Another Author/
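The two layouts above can be expressed as a small path-building function. This is a sketch, not the project's code: `build_book_path` is a hypothetical helper, and the behavior for a series without a known volume is an assumption.

```python
from pathlib import PurePath
from typing import Optional


def build_book_path(
    output: PurePath,
    author: str,
    title: str,
    series: Optional[str] = None,
    volume: Optional[str] = None,
) -> PurePath:
    """Return Author/Title, or Author/Series/'Volume - Title' for series books."""
    if series and volume:
        # Volume stays a string so combined volumes such as "3,4" work.
        return output / author / series / f"{volume} - {title}"
    if series:
        # Assumed fallback when the volume number is unknown.
        return output / author / series / title
    return output / author / title
```

Keeping the volume as a string rather than a number is what allows folder names such as `3,4 - Combined Volume`.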
- Create a scraper class inheriting from `BaseScraper`
- Register it in `SCRAPER_REGISTRY`
- Implement required methods
- Update imports
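The steps above might look like the following in outline. The `BaseScraper` and `SCRAPER_REGISTRY` defined here are simplified stand-ins so that the example runs on its own; the project's real base class, required methods, and registry format are not shown in this README and may differ. The `scrape` method, the `domain` attribute, and `ExampleSiteScraper` are all hypothetical.

```python
import re
from abc import ABC, abstractmethod


class BaseScraper(ABC):
    """Stand-in for the project's base class; the real interface may differ."""

    domain: str = ""

    @abstractmethod
    def scrape(self, html: str) -> dict:
        """Extract metadata from a downloaded page (hypothetical method)."""


# Stand-in registry mapping a site key to its scraper class.
SCRAPER_REGISTRY = {}


class ExampleSiteScraper(BaseScraper):
    """Hypothetical scraper that reads only the page title."""

    domain = "example.com"

    def scrape(self, html: str) -> dict:
        match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
        return {"title": match.group(1).strip() if match else ""}


# Registration step: make the new scraper discoverable by site key.
SCRAPER_REGISTRY["example"] = ExampleSiteScraper
```

A registry keyed by site name lets the rest of the application look up a scraper without importing each class directly.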
See MIGRATION_GUIDE.md for detailed development information.
# Test individual components
python -m pytest tests/
# Test imports
python -c "from src.main import BadaBoomBooksApp; print('✅ Imports working')"
# Scrapers - Full regression (all samples per service)
python -m pytest "src/tests/test_scrapers.py::test_scraper_regression_all_samples[goodreads]" -v -s
# TDD workflow
mkdir -p src/tests/data/scrapers/tdd/my-test
# Create metadata.opf with expected values
python -m pytest src/tests/test_scrapers.py::test_manual_tdd_sample -v -s

The original monolithic code has been archived in the legacy/ folder. The new modular architecture maintains full backward compatibility while providing enhanced maintainability.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Follow the modular architecture patterns
- Add tests for new functionality
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Community for providing content and building tools
- Contributors to the libraries this project depends on
- Users who provided feedback and testing
Happy organizing! 📚🎧