A powerful, easy-to-use application for all your PDF needs. Convert, edit, merge, split, compress, protect, and transform PDF documents with just a few clicks!
- What is PDF Tools?
- Features
- Quick Start Guide
- Installation
- How to Use
- API Documentation
- Project Structure
- Configuration
- Troubleshooting
- Contributing
- License
                                 ┌─────────────────┐
                                 │     Clients     │
                                 │ (Browser / API) │
                                 └────────┬────────┘
                                          │
                      ┌───────────────────┼───────────────────┐
                      ▼                                       ▼
             ┌─────────────────┐                     ┌─────────────────┐
             │    Flask App    │                     │     FastAPI     │
             │    (Web UI)     │                     │   (REST API)    │
             └────────┬────────┘                     └────────┬────────┘
                      │                                       │
                      └───────────────────┬───────────────────┘
                                          ▼
                               ┌─────────────────────┐
                               │   Feature Manager   │
                               │    (Blueprints)     │
                               └──────────┬──────────┘
                                          │
                     ┌────────────────────┼────────────────────┐
                     ▼                    ▼                    ▼
            ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
            │  PDF Processing  │ │   Auth & User    │ │ File Management  │
            │     Features     │ │     Features     │ │     Features     │
            └────────┬─────────┘ └──────────────────┘ └──────────────────┘
                     │
                     ▼
          ┌─────────────────────┐
          │    PDF Processor    │
          │   (Mixin Classes)   │
          ├─────────────────────┤
          │ • Edit (merge/split)│
          │ • Security (encrypt)│
          │ • Transform (rotate)│
          │ • Convert (formats) │
          └──────────┬──────────┘
                     │
      ┌──────────────┼──────────────┐
      ▼              ▼              ▼
┌────────────┐ ┌────────────┐ ┌────────────┐
│   Celery   │ │ PostgreSQL │ │ File System│
│  + Redis   │ │  Database  │ │  Storage   │
│  (Tasks)   │ │  (Users)   │ │   (PDFs)   │
└────────────┘ └────────────┘ └────────────┘
Key Components:
- Flask App: Web interface with HTML templates
- FastAPI: REST API for programmatic access
- Feature Manager: Organizes routes into modular blueprints
- PDF Processor: Core processing engine using mixin architecture
- Celery + Redis: Background task processing for large files
- PostgreSQL: User data, file records, and conversion history
- File System: Uploaded and processed PDF storage
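
To make the mixin idea concrete, here is a minimal sketch of how a processor class might be composed from small mixins. The names `EditMixin`, `SecurityMixin`, `PDFProcessor`, `describe_merge`, and `describe_protect` are hypothetical and only illustrate the pattern; they are not the project's actual classes, and no real PDF work is done.

```python
class EditMixin:
    """Hypothetical mixin grouping edit operations (merge, split)."""

    def describe_merge(self, pdf_files):
        # A real implementation would combine the files; this only reports the plan.
        return f"merge {len(pdf_files)} files"


class SecurityMixin:
    """Hypothetical mixin grouping security operations (encrypt)."""

    def describe_protect(self, input_path):
        return f"protect {input_path}"


class PDFProcessor(EditMixin, SecurityMixin):
    """Composes the mixins so that one object exposes every operation."""


processor = PDFProcessor()
print(processor.describe_merge(["a.pdf", "b.pdf"]))
print(processor.describe_protect("a.pdf"))
```

Each mixin stays small and testable on its own, and a new group of operations can be added by writing one more mixin and listing it in the class definition.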
PDF Tools is an all-in-one solution for working with PDF files. Whether you need to:
- Convert a PDF to Word for editing
- Merge multiple PDFs into one document
- Split a large PDF into smaller files
- Add password protection to sensitive documents
- Add watermarks to your PDFs
- Compress PDFs to reduce file size
...this tool has you covered! It works through a web browser (no installation needed on user devices), provides a REST API for developers, and can also be used via command line.
Convert PDFs to and from various formats:
| Convert PDF to | Convert to PDF from |
|---|---|
| Word (DOCX) | Word (DOCX) |
| Excel (XLSX) | Excel (XLSX) |
| PowerPoint (PPTX) | PowerPoint (PPTX) |
| HTML | HTML |
| Images (PNG, JPG) | Images |
| Plain Text | Jupyter Notebooks |

| Feature | Description |
|---|---|
| Merge PDFs | Combine multiple PDFs into a single document |
| Split PDF | Separate a PDF into individual pages or custom ranges |
| Compress PDF | Reduce file size while maintaining quality |
| Extract Pages | Pull out specific pages from a PDF |
| Rotate Pages | Rotate pages 90°, 180°, or 270° |
| Extract Text | Get all text content from a PDF |
| Extract Images | Save all images from a PDF |
| Repair PDF | Fix corrupted or damaged PDF files |
| Compare PDFs | Find differences between two PDF files |
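
Splitting and page extraction both depend on turning a user-supplied range into concrete page numbers. The sketch below shows one way to do that in plain Python. The `"1-3,5"` input format and the `parse_page_ranges` helper are assumptions made for illustration, not the application's documented syntax.

```python
def parse_page_ranges(spec, total_pages):
    """Turn a string such as "1-3,5" into a sorted list of 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Reject ranges that fall outside the document or run backwards.
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_ranges("1-3, 5", total_pages=10))  # [1, 2, 3, 5]
```

Validating against `total_pages` up front means a bad range is reported before any file is written.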

| Feature | Description |
|---|---|
| Password Protection | Add open/edit passwords to PDFs |
| Remove Password | Unlock password-protected PDFs (with authorization) |
| Watermarks | Add text or image watermarks |
| Digital Signatures | Add signature stamps to documents |
| Encryption | Secure PDFs with AES-256 encryption |

| Feature | Description |
|---|---|
| OCR (Optical Character Recognition) | Extract text from scanned documents |
| Batch Processing | Process multiple files at once |
| Progress Tracking | Monitor conversion progress in real time |
| Background Tasks | Long operations run in the background |
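
As a rough illustration of batch processing with progress tracking, the sketch below runs a worker over several files and reports progress after each one finishes. It uses a standard-library thread pool as a stand-in; the application itself uses Celery for background work. `process_batch` and `fake_worker` are hypothetical names, not part of the project.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_batch(paths, worker, on_progress=None, max_workers=4):
    """Run `worker` on every path and report progress after each finished file."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, path): path for path in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = ("ok", future.result())
            except Exception as exc:  # keep going when one file fails
                results[path] = ("error", str(exc))
            if on_progress is not None:
                on_progress(done, len(futures))
    return results


def fake_worker(path):
    # Placeholder for a real operation such as compression.
    if not path.endswith(".pdf"):
        raise ValueError("not a PDF")
    return path.replace(".pdf", "_processed.pdf")


progress = []
results = process_batch(
    ["a.pdf", "b.txt"],
    fake_worker,
    on_progress=lambda done, total: progress.append((done, total)),
)
print(results)
```

Recording a per-file status instead of raising on the first failure lets one bad file fail without aborting the rest of the batch.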
Get up and running in 5 minutes!
Make sure you have these installed:
- Python 3.10 or higher
- PostgreSQL
- Redis (for background tasks)
- Git
# 1. Clone the repository
git clone https://github.com/9046balaji/Pdf-Tools.git
cd Pdf-Tools
# 2. Create a virtual environment
python -m venv venv
# 3. Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# 4. Install dependencies
pip install -r requirements.txt

Create a .env file in the project root:
# Required Settings
SECRET_KEY=your-super-secret-key-change-this-in-production
# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_USER=postgres
DB_PASSWORD=your_password
DB_NAME=pdf_tools
# File Settings
UPLOAD_FOLDER=uploads
PROCESSED_FOLDER=processed
MAX_CONTENT_LENGTH=1073741824
# Redis (for background tasks)
REDIS_URL=redis://localhost:6379/0

Initialize the database from a Python shell:
python
>>> from app import create_app
>>> from database.db import db
>>> app = create_app()
>>> with app.app_context():
...     db.create_all()
...
>>> exit()

Start the application:
python run.py

Done! Open your browser and go to: http://localhost:5000
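
For orientation, the `DB_*` values in the `.env` file are the pieces of a standard PostgreSQL connection URL. The sketch below shows how they might be combined. How `config.py` actually assembles them is not shown here, so `build_database_url` and its fallback defaults are hypothetical.

```python
import os
from urllib.parse import quote_plus


def build_database_url(env=None):
    """Assemble a PostgreSQL URL from DB_* settings (hypothetical helper)."""
    env = os.environ if env is None else env
    # Percent-encode the credentials so characters such as "@" cannot break the URL.
    user = quote_plus(env.get("DB_USER", "postgres"))
    password = quote_plus(env.get("DB_PASSWORD", ""))
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env.get("DB_NAME", "pdf_tools")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


print(build_database_url({"DB_USER": "postgres", "DB_PASSWORD": "p@ss word"}))
```

Encoding the password matters in practice: a raw `@` in `DB_PASSWORD` would otherwise be read as the separator between credentials and host.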
| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.10 | 3.11+ |
| PostgreSQL | 13 | 15+ |
| RAM | 2 GB | 4 GB+ |
| Disk Space | 500 MB | 2 GB+ |
| OS | Windows 10, macOS 10.15, Ubuntu 20.04 | Latest versions |

| Tool | Purpose |
|---|---|
| Tesseract | OCR functionality |
| Poppler | PDF to image conversion |
| Ghostscript | Advanced PDF compression |
| Pandoc | Document format conversion |
| LibreOffice | Office document conversion |
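
A quick way to see which of these optional tools are installed is to look for their executables on `PATH`. The sketch below does that with the standard library. The executable names are assumptions, since they vary by platform and install method (Ghostscript, for example, is usually `gswin64c` on Windows), and `find_optional_tools` is a hypothetical helper.

```python
import shutil

# Executable names are assumptions; adjust them for your platform.
OPTIONAL_TOOLS = {
    "Tesseract": "tesseract",
    "Poppler": "pdftoppm",
    "Ghostscript": "gs",
    "Pandoc": "pandoc",
    "LibreOffice": "soffice",
}


def find_optional_tools(tools=OPTIONAL_TOOLS):
    """Map each tool name to the path found on PATH, or None if it is missing."""
    return {name: shutil.which(command) for name, command in tools.items()}


for name, path in find_optional_tools().items():
    print(f"{name}: {path or 'not found'}")
```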
# 1. Install Python from python.org (check "Add to PATH")
# 2. Clone and setup
git clone https://github.com/9046balaji/Pdf-Tools.git
cd Pdf-Tools
# 3. Create virtual environment
python -m venv venv
.\venv\Scripts\Activate
# 4. Install dependencies
pip install -r requirements.txt
# 5. Run application
python run.py

# 1. Install Homebrew (if not installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# 2. Install Python and dependencies
brew install python@3.11 postgresql redis
# 3. Clone and setup
git clone https://github.com/9046balaji/Pdf-Tools.git
cd Pdf-Tools
# 4. Create virtual environment
python3 -m venv venv
source venv/bin/activate
# 5. Install dependencies
pip install -r requirements.txt
# 6. Run application
python run.py

# 1. Install system dependencies
sudo apt update
sudo apt install python3.11 python3.11-venv python3-pip postgresql redis-server
# 2. Clone and setup
git clone https://github.com/9046balaji/Pdf-Tools.git
cd Pdf-Tools
# 3. Create virtual environment
python3.11 -m venv venv
source venv/bin/activate
# 4. Install dependencies
pip install -r requirements.txt
# 5. Run application
python run.py

- Start the application: Run `python run.py`
- Open your browser: Go to http://localhost:5000
- Choose an operation: Click on what you want to do (Convert, Merge, Split, etc.)
- Upload your file(s): Drag and drop or click to select files
- Configure options: Adjust settings if needed
- Process: Click the action button
- Download: Save your processed file
# Example: Convert PDF to Word
from pdf_modules.pdf_word_conversion import PDFWordConversionMixin

class PDFConverter(PDFWordConversionMixin):
    pass

converter = PDFConverter()
result = converter.pdf_to_docx(
    input_path="document.pdf",
    output_path="document.docx"
)
print(f"Conversion successful: {result}")

# Example: Merge multiple PDFs
from pdf_modules.pdf_edit import PDFEditMixin

class PDFEditor(PDFEditMixin):
    pass

editor = PDFEditor()
result = editor.merge_pdfs(
    pdf_files=["file1.pdf", "file2.pdf", "file3.pdf"],
    output_path="merged_document.pdf"
)
print(f"Merged {result.files_merged} files into {result.output_path}")

# Example: Add password protection
from pdf_modules.pdf_security import PDFSecurityMixin

class PDFSecurity(PDFSecurityMixin):
    pass

security = PDFSecurity()
security.protect_pdf(
    input_path="document.pdf",
    output_path="protected_document.pdf",
    password="your-secure-password"
)
print("PDF protected successfully!")

The application provides a full REST API for integration with other systems.
Base URL: http://localhost:5000/api
curl -X POST http://localhost:5000/api/pdf/convert-to-word \
  -F "file=@document.pdf" \
  -H "Content-Type: multipart/form-data"

curl -X POST http://localhost:5000/api/pdf/merge \
  -F "files=@file1.pdf" \
  -F "files=@file2.pdf" \
  -F "files=@file3.pdf"

curl -X POST http://localhost:5000/api/pdf/protect \
  -F "file=@document.pdf" \
  -F "password=your-secure-password"

curl -X POST http://localhost:5000/api/pdf/compress \
-F "file=@large_document.pdf" \
-F "quality=balanced"

Once the application is running, access the interactive API documentation:
- Swagger UI: http://localhost:5000/docs
- ReDoc: http://localhost:5000/redoc
- OpenAPI JSON: http://localhost:5000/openapi.json
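
The curl uploads above can also be prepared from Python with only the standard library. The sketch below builds a `multipart/form-data` request for the protect endpoint without sending it. The helper names are hypothetical, and the upload field name `file` is an assumption based on the compress example; the interactive API documentation is the authority on field names.

```python
import uuid
import urllib.request


def build_multipart(fields, files):
    """Encode form fields and files as multipart/form-data.

    `fields` maps names to text values; `files` maps names to (filename, bytes).
    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode(),
        ]
    for name, (filename, content) in files.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            b"Content-Type: application/pdf",
            b"",
            content,
        ]
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_protect_request(pdf_bytes, password, base_url="http://localhost:5000/api"):
    """Prepare (but do not send) a POST to the /pdf/protect endpoint."""
    body, content_type = build_multipart(
        {"password": password},
        {"file": ("document.pdf", pdf_bytes)},  # field name "file" is assumed
    )
    return urllib.request.Request(
        f"{base_url}/pdf/protect",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


request = build_protect_request(b"%PDF-1.4 ...", "your-secure-password")
print(request.get_method(), request.full_url)
# Sending it requires the server to be running:
# with urllib.request.urlopen(request) as response:
#     print(response.read())
```

Keeping request construction separate from sending makes the client easy to test without a live server.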
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/pdf/convert-to-word` | Convert PDF to Word (DOCX) |
| POST | `/api/pdf/convert-to-excel` | Convert PDF to Excel (XLSX) |
| POST | `/api/pdf/convert-to-pptx` | Convert PDF to PowerPoint |
| POST | `/api/pdf/convert-to-html` | Convert PDF to HTML |
| POST | `/api/pdf/convert-to-text` | Extract text from PDF |
| POST | `/api/pdf/convert-to-images` | Convert PDF to images |

| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/pdf/merge` | Merge multiple PDFs |
| POST | `/api/pdf/split` | Split PDF into pages |
| POST | `/api/pdf/compress` | Compress PDF file |
| POST | `/api/pdf/rotate` | Rotate PDF pages |
| GET | `/api/pdf/extract-images` | Extract images from PDF |
| POST | `/api/pdf/extract-pages` | Extract specific pages |

| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/pdf/protect` | Add password protection |
| POST | `/api/pdf/unlock` | Remove password (with auth) |
| POST | `/api/pdf/watermark` | Add watermark |
| POST | `/api/pdf/sign` | Add signature stamp |

| Method | Endpoint | Description |
|---|---|---|
| GET | `/tasks/{task_id}` | Get task status |
| GET | `/tasks/{task_id}/result` | Get task result |
| GET | `/health` | Health check |
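
Because long operations run in the background, a client typically polls the task status endpoint until the work is done. The sketch below shows one possible polling loop. The `status` values `success` and `error` are taken from the response examples in this README; whether the task endpoint reports the same values is an assumption, and `wait_for_task` is a hypothetical helper.

```python
import time


def wait_for_task(fetch_status, task_id, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll `fetch_status(task_id)` until the task finishes or the timeout expires.

    `fetch_status` is any callable returning a dict with a "status" key, for
    example a function that GETs /tasks/{task_id} and decodes the JSON body.
    """
    waited = 0.0
    while True:
        payload = fetch_status(task_id)
        # Any other status value is treated as "still running" (an assumption).
        if payload.get("status") in ("success", "error"):
            return payload
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} did not finish in {timeout} seconds")
        sleep(interval)
        waited += interval


# Demonstration with a fake status source instead of a live server.
replies = iter([{"status": "pending"}, {"status": "pending"}, {"status": "success"}])
final = wait_for_task(lambda task_id: next(replies), "conv_abc123def456", sleep=lambda s: None)
print(final)
```

Passing the fetch and sleep functions in as arguments keeps the loop independent of any HTTP library and makes it straightforward to test.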
Success Response:
{
  "status": "success",
  "message": "Operation completed successfully",
  "task_id": "conv_abc123def456",
  "result": {
    "output_file": "output.docx",
    "file_size": 1024000,
    "pages_processed": 10
  }
}

Error Response:
{
  "status": "error",
  "message": "Invalid PDF file",
  "error_code": "INVALID_PDF",
  "details": {
    "reason": "File is corrupted or not a valid PDF"
  }
}

Pdf-Tools/
│
├── Entry Points
│   ├── run.py                           # Application entry point
│   ├── app.py                           # Flask application factory
│   └── api.py                           # FastAPI endpoints
│
├── Configuration
│   ├── config.py                        # Application settings
│   ├── .env                             # Environment variables (create this)
│   └── .env.example                     # Example configuration
│
├── Core PDF Modules
│   └── pdf_modules/
│       ├── pdf_base.py                  # Base PDF operations
│       ├── pdf_convert.py               # Format conversions
│       ├── pdf_edit.py                  # Merge, split, extract
│       ├── pdf_transform.py             # Rotate, compress
│       ├── pdf_security.py              # Passwords, watermarks
│       ├── pdf_validation.py            # File validation
│       ├── pdf_repair.py                # Fix corrupted PDFs
│       └── ...                          # More specialized modules
│
├── Features
│   └── Feature/
│       ├── admin_features.py            # Admin operations
│       ├── authentication_features.py   # User login/logout
│       ├── conversion_features.py       # Conversion routes
│       ├── file_management_features.py  # File handling
│       └── feature_manager.py           # Feature orchestration
│
├── Utilities
│   └── common/
│       ├── file_validation.py           # File checks
│       ├── error_recovery.py            # Error handling
│       ├── health_check.py              # System monitoring
│       ├── progress.py                  # Progress tracking
│       └── upload_handler.py            # Upload management
│
├── Database
│   └── database/
│       ├── db.py                        # Database connection
│       └── models.py                    # Data models
│
├── Frontend
│   ├── static/                          # CSS, JavaScript, images
│   └── templates/                       # HTML templates
│
├── File Storage
│   ├── uploads/                         # Uploaded files (auto-created)
│   └── processed/                       # Output files (auto-created)
│
├── Tests
│   └── tests/                           # Test files
│
└── Documentation
    ├── README.md                        # This file
    ├── LICENSE                          # MIT License
    └── requirements.txt                 # Python dependencies

Create a .env file in the project root with these settings:
# Security - CHANGE THIS IN PRODUCTION!
SECRET_KEY=your-very-long-random-secret-key-at-least-32-characters
# Database Connection
DB_HOST=localhost
DB_PORT=5432
DB_USER=postgres
DB_PASSWORD=your_database_password
DB_NAME=pdf_tools

# Application Mode
FLASK_ENV=development # Use 'production' for live deployments
FLASK_DEBUG=1 # Set to 0 in production
# File Handling
UPLOAD_FOLDER=uploads # Where uploads are stored
PROCESSED_FOLDER=processed # Where output files are saved
MAX_CONTENT_LENGTH=1073741824 # Max file size (1GB default)
# Redis (for background tasks)
REDIS_URL=redis://localhost:6379/0
CELERY_BROKER_URL=redis://localhost:6379/0
# Email Notifications (optional)
MAIL_SERVER=smtp.gmail.com
MAIL_PORT=587
MAIL_USE_TLS=True
MAIL_USERNAME=your-email@gmail.com
MAIL_PASSWORD=your-app-password
# Session Settings
SESSION_COOKIE_SECURE=True # Enable in production with HTTPS

For better performance with large files, enable background processing:
# Terminal 1: Start Redis (if not running)
redis-server
# Terminal 2: Start Celery worker
celery -A tasks worker --loglevel=info
# Terminal 3: Run the application
python run.py

Problem: A required Python package is missing.
Solution:
# Make sure virtual environment is activated
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate
# Install all dependencies
pip install -r requirements.txt

Problem: PostgreSQL is not running or credentials are wrong.
Solution:
- Check that PostgreSQL is running:
  # Windows or macOS
  pg_isready
  # Linux (systemd)
  sudo systemctl status postgresql
- Verify that your `.env` file has the correct credentials
- Make sure the database exists:
  psql -U postgres -c "CREATE DATABASE pdf_tools;"
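
If it is unclear whether PostgreSQL or Redis is listening at all, a plain TCP connection test can narrow things down before you look at credentials. The sketch below is a generic reachability check, not part of the application; `is_port_open` is a hypothetical helper, and the ports are the defaults from the `.env` example.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


for label, port in [("PostgreSQL", 5432), ("Redis", 6379)]:
    state = "reachable" if is_port_open("localhost", port) else "not reachable"
    print(f"{label} on port {port}: {state}")
```

A reachable port only shows that something is listening there; wrong credentials or a missing database still need the checks listed above.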
Problem: Another application is using port 5000.
Solution: Change the port in your .env file:
FLASK_PORT=5001

Or run with a different port:
python run.py --port 5001

Problem: The uploaded file exceeds the size limit.
Solution: Increase the limit in .env:
MAX_CONTENT_LENGTH=2147483648 # 2GB

Problem: The PDF file might be corrupted or protected.
Solution:
- Check if the PDF opens in a regular PDF viewer
- Try the repair function first
- Check if the PDF is password-protected
- Look at the logs for detailed error messages:
  # View application logs
  cat logs/app.log
Problem: Tesseract OCR is not installed.
Solution: Install Tesseract:
# Windows (using Chocolatey)
choco install tesseract
# macOS
brew install tesseract
# Ubuntu/Debian
sudo apt install tesseract-ocr

- Check the logs: Look in the `logs/` folder for detailed error messages
- Enable debug mode: Set `FLASK_DEBUG=1` in your `.env` file
- Search issues: Check GitHub Issues
- Create an issue: If you can't find a solution, create a new issue with:
  - Your operating system
  - Python version (`python --version`)
  - Complete error message
  - Steps to reproduce
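
The operating system and Python version for an issue report can be collected with a few standard-library calls. This is only a convenience sketch; `environment_report` is a hypothetical helper, not a script shipped with the project.

```python
import platform
import sys


def environment_report():
    """Collect the basic environment details requested in a bug report."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "executable": sys.executable,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```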
We welcome contributions! Here's how you can help:
- Fork the repository on GitHub
- Clone your fork: `git clone https://github.com/your-username/Pdf-Tools.git`
- Create a branch for your feature: `git checkout -b feature/your-amazing-feature`
- Make your changes and test them
- Commit your changes: `git commit -m "Add: Your feature description"`
- Push to your fork: `git push origin feature/your-amazing-feature`
- Create a Pull Request on GitHub

- Follow PEP 8 style guide for Python
- Add docstrings to functions and classes
- Write meaningful commit messages
- Add tests for new features
- Update documentation as needed

- Bug fixes
- New features
- Documentation improvements
- Tests
- Translations
- Ideas and suggestions
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2025 PDF Tools
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software...
Special thanks to:
- All the contributors who help improve this project
- The open-source community for the amazing libraries used
- Everyone who reports bugs and suggests features
| Category | Technologies |
|---|---|
| Backend | Python 3.10+, Flask, FastAPI |
| Database | PostgreSQL, SQLAlchemy |
| Task Queue | Celery, Redis |
| PDF Processing | PyMuPDF, pypdf, pikepdf, reportlab |
| Conversion | pdf2docx, python-docx, python-pptx, openpyxl |
| OCR | Tesseract, ocrmypdf |
| Monitoring | OpenTelemetry, Prometheus |

- Docker support
- Cloud storage integration (AWS S3, Google Cloud)
- AI-powered document analysis
- Mobile-friendly interface
- Real-time collaboration
- Email integration

Made with ❤️ for PDF lovers worldwide
Star this repo | Report Bug | Request Feature
Last Updated: January 2026