
Convert your PDFs and EPUBs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and efficient processing for low-resource systems.

mateogon/pdf-narrator

PDF Narrator

Updated for Kokoro v1.0!
Setup is now simpler: install the required Python dependencies (including the updated Kokoro package) and run the app. You no longer need to download model files manually or move them into specific folders.

PDF Narrator (Kokoro Edition) converts PDF and EPUB documents into audiobooks using text extraction and Kokoro TTS. With Kokoro v1.0, setup consists of installing the requirements and running the application.


Demo

  1. Screenshot
    Check out the GUI in the screenshot below:
    Demo Screenshot

  2. Audio Sample
    Listen to a short sample of the generated audiobook:

    • af_sky: test_af_sky.mp4
    • am_liam: test_am_liam.mp4
    • af_heart: test_af_heart.mp4

Features

  • Intelligent Text Extraction

    • Supports both PDF and EPUB formats.
    • For PDFs: Skips headers, footers, and page numbers; optionally splits based on Table of Contents (TOC).
    • For EPUBs: Extracts chapters based on internal HTML structure.
  • Kokoro TTS Integration

    • Generate natural-sounding audiobooks with the updated Kokoro v1.0 model.
    • Easily select different voicepacks.
  • User-Friendly GUI

    • Modern interface built with ttkbootstrap (theme selector, scrolled logs, progress bars).
    • Pause/resume and cancel your audiobook generation anytime.
  • Configurable for Low-VRAM Systems

    • Choose the chunk size for text to accommodate limited GPU resources.
    • Switch to CPU if no GPU is available.
  • Voice Testing Made Simple

    • Test a single voice or loop through all available voices directly from the GUI.
    • Use the dedicated Voice Test tab to generate and listen to sample audio files.
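
The CPU fallback mentioned under "Configurable for Low-VRAM Systems" can be illustrated with a small helper. This is a minimal sketch, not the app's actual code: the function name pick_device and its prefer_gpu parameter are hypothetical, and the only PyTorch call used is torch.cuda.is_available().

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    Hypothetical helper for illustration; the app may select its device differently.
    """
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # PyTorch is listed under Prerequisites
    except ImportError:
        # Without PyTorch there is no GPU path at all, so fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
```

On a machine without a CUDA-capable GPU, or without PyTorch installed, the helper returns "cpu".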

Prerequisites

  • Python 3.8+
  • FFmpeg (for audio-related tasks on some systems)
  • Torch (PyTorch for the Kokoro TTS model)
  • Other dependencies as listed in requirements.txt

Installation

  1. Clone the Repository

    git clone https://github.com/mateogon/pdf-narrator.git
    cd pdf-narrator
  2. Create and Activate a Virtual Environment

    python -m venv venv
    # On Linux/macOS:
    source venv/bin/activate
    # On Windows:
    venv\Scripts\activate
  3. Install Python Dependencies

    pip install --upgrade pip
    pip install -r requirements.txt

Windows Additional Installation Notes

For Windows users, some libraries may require extra steps:

1. Prerequisites

  • Python 3.12.7
    Download and install Python 3.12.7. Ensure python and pip are added to your system's PATH.

  • CUDA 12.4 (for GPU acceleration)
    Install the CUDA 12.4 Toolkit if you plan to use GPU acceleration.

2. Installing eSpeak NG

eSpeak NG is required for phoneme-based operations.

  1. Download the Installer
    eSpeak NG X64 Installer

  2. Run the Installer
    Follow the on-screen instructions.

  3. Set Environment Variables
    Add the following environment variables:

    • PHONEMIZER_ESPEAK_LIBRARY: C:\Program Files\eSpeak NG\libespeak-ng.dll
    • PHONEMIZER_ESPEAK_PATH: C:\Program Files (x86)\eSpeak\command_line\espeak.exe

    (Right-click "This PC" → Properties → Advanced system settings → Environment Variables)

  4. Verify Installation

    Open Command Prompt and run:

    espeak-ng --version
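
To confirm that the two environment variables from step 3 are visible to Python and point at real files, a short check like the following may help. It is a sketch using only the standard library; the function name check_espeak_env is an assumption made for this example and is not part of the project.

```python
import os
from pathlib import Path

# Variable names taken from the installation step above.
ESPEAK_VARS = ("PHONEMIZER_ESPEAK_LIBRARY", "PHONEMIZER_ESPEAK_PATH")


def check_espeak_env(environ=None):
    """Map each variable name to a short status string (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    report = {}
    for name in ESPEAK_VARS:
        value = environ.get(name)
        if not value:
            report[name] = "not set"
        elif not Path(value).is_file():
            report[name] = f"set, but no file at {value}"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_espeak_env().items():
        print(f"{name}: {status}")
```

Run it from a new Command Prompt window, since environment variables set through the system dialog are not picked up by terminals that were already open.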

3. Using Precompiled Wheels for DeepSpeed and lxml

  1. Download Wheels

  2. Install the Wheels

    Activate your virtual environment and run:

    pip install path\to\deepspeed-0.11.2+cuda124-cp312-cp312-win_amd64.whl
    pip install path\to\lxml-5.3.0-cp312-cp312-win_amd64.whl
  3. Verify Installation

    deepspeed --version
    pip show lxml
    espeak-ng --version
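
The same verification can be done from inside the virtual environment's Python interpreter, which confirms that the wheels landed in the environment the app will actually use. The sketch below relies on the standard importlib.metadata module; the helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["deepspeed", "lxml"]).items():
        print(f"{name}: {version or 'not installed'}")
```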

Quick Start

  1. Launch the App

    python main.py
  2. Select a Mode

    • Single Book: Choose a PDF or EPUB file and extract its text.
    • Batch Books: Select a folder with multiple PDFs and/or EPUBs (processes all, preserving folder structure).
    • Skip Extraction: Use pre-extracted text files.
  3. Extract Text

    • PDFs: Splits into chapters if TOC is available; otherwise, extracts entire document.
    • EPUBs: Extracts chapters based on internal structure.
  4. Configure Kokoro TTS Settings

    • Select a voice.
    • Adjust chunk size and output format (.wav or .mp3).
  5. Generate Audiobook

    • Click Start Process and monitor progress.
    • Find your audio files in the output folder.
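
The TOC-based splitting in step 3 can be pictured as turning TOC entries into page ranges. The sketch below is an illustration under assumptions, not the project's implementation: it takes entries shaped like the [level, title, page] lists that PyMuPDF's Document.get_toc() returns (pages are 1-based), uses only top-level entries, and falls back to a single range when no TOC is available. The function name chapter_page_ranges and the "Full document" label are hypothetical.

```python
def chapter_page_ranges(toc, page_count):
    """Turn TOC entries into (title, first_page, last_page) tuples.

    Pages are 1-based and inclusive. With no usable TOC, one range covering
    the whole document is returned. Pages before the first chapter are not
    assigned to any chapter in this simplified version.
    """
    # Keep only top-level entries with a valid page number, in page order.
    top = sorted(
        (
            (title, page)
            for level, title, page in toc
            if level == 1 and 1 <= page <= page_count
        ),
        key=lambda item: item[1],
    )
    if not top:
        return [("Full document", 1, page_count)]

    ranges = []
    for index, (title, start) in enumerate(top):
        if index + 1 < len(top):
            # A chapter ends on the page before the next chapter starts.
            end = max(start, top[index + 1][1] - 1)
        else:
            end = page_count
        ranges.append((title, start, end))
    return ranges


if __name__ == "__main__":
    sample_toc = [[1, "Intro", 1], [2, "Background", 2], [1, "Methods", 5]]
    for title, first, last in chapter_page_ranges(sample_toc, page_count=9):
        print(f"{title}: pages {first}-{last}")
```

Each resulting range could then be extracted page by page and written to its own text file.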

Testing Voices

With the latest update, you can now quickly test voices directly within the app:

  1. Navigate to the Voice Test Tab
    In the main window, click on the Voice Test tab to access voice testing features.

  2. Enter Sample Text
    Modify or use the default sample text in the provided text area.

  3. Select Test Mode

    • Test Single Voice: Pick one voice from the dropdown to generate a sample.
    • Test All Available Voices: Choose this option to automatically generate samples for every available voice.
  4. Run the Test
    Click Run Voice Test to start. The app creates a temporary file with your sample text, processes it through Kokoro TTS for each voice (if testing all voices), and saves the output audio files in the designated folder.

  5. Monitor Progress and Listen
    A progress bar and status labels keep you updated. Once the test is complete, you’ll be prompted to open the output folder where you can listen to the generated samples.

  6. Stop the Test if Needed
    If you need to cancel while testing, click Stop Test to interrupt the process.
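
The "test all voices" flow with a stop button can be sketched as a loop that checks a stop flag before each voice. This is a simplified illustration: the synthesize callback is a hypothetical stand-in for the real Kokoro TTS call, the test_<voice>.wav naming is an assumption, and the temporary text file the app creates is omitted.

```python
import threading
from pathlib import Path


def run_voice_test(text, voices, output_dir, synthesize, stop_event=None):
    """Generate one sample per voice and return the list of files written.

    `synthesize(text, voice, out_path)` is a hypothetical callback standing in
    for the actual TTS call. Setting `stop_event` ends the loop before the
    next voice is processed.
    """
    stop_event = stop_event or threading.Event()
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for voice in voices:
        if stop_event.is_set():  # "Stop Test" was requested
            break
        out_path = output_dir / f"test_{voice}.wav"
        synthesize(text, voice, out_path)
        written.append(out_path)
    return written
```

In a GUI, the loop would typically run on a worker thread so that the Stop Test button can call stop_event.set() from the main thread while a test is in progress.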


Technical Highlights

  • Text Extraction

    • PDF: Built on PyMuPDF for efficient parsing, with TOC-based splitting.
    • EPUB: Extracts text from HTML content files within the EPUB structure.
  • Kokoro TTS

    • Advanced text normalization and phonemization.
    • Splits text into chunks (<510 tokens) and joins audio outputs.
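
The chunking step can be sketched as greedy sentence packing under a token limit. The example below is an assumption-heavy illustration rather than the project's code: it counts whitespace-separated words as a stand-in for the model's real tokens, and the function name split_into_chunks and its parameters are hypothetical. The default limit of 509 reflects the "<510 tokens" bound stated above.

```python
import re


def split_into_chunks(text, max_tokens=509, count_tokens=None):
    """Greedily pack sentences into chunks of at most `max_tokens` tokens.

    `count_tokens` defaults to a word count, which only approximates the
    tokenization a TTS model would apply.
    """
    count_tokens = count_tokens or (lambda s: len(s.split()))
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if count_tokens(candidate) <= max_tokens:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # The sentence starts a new chunk; if it is itself too long,
        # it is split word by word.
        current = ""
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if current and count_tokens(candidate) > max_tokens:
                chunks.append(current)
                current = word
            else:
                current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight nine."
    for chunk in split_into_chunks(sample, max_tokens=6):
        print(chunk)
```

Each chunk would be synthesized separately and the resulting audio segments concatenated in order. A smaller limit lowers peak memory use per synthesis call, which is the trade-off behind the chunk size setting.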

Contributing

Fork the repository, create a branch, and submit a pull request.
Report bugs or suggest features via Issues.


License

This project is released under the MIT License (LICENSE.md).


Enjoy converting your PDFs and EPUBs into audiobooks with Kokoro v1.0 TTS, and use the Voice Test tab to find the voice you prefer.
