
chronometer/pdf_processor


PDF Processor

A modern web application that processes PDF files, extracts their content, and presents it in a clean format suitable for use as context in LLM prompts. Built with Flask and modern JavaScript.


Features

  • 📄 Upload and process PDF files up to 50MB
  • 🔍 Extract text content while maintaining formatting
  • 🧹 Clean and format extracted text
  • 📋 Copy extracted content to clipboard with one click
  • 📱 Responsive web interface
  • 🔒 Local processing for privacy
  • 📊 Word count statistics
  • 🌐 Works in the latest Chrome, Firefox, Safari, and Edge
  • ⚡ Real-time upload progress indication
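The README doesn't describe how extracted text is cleaned or how words are counted. As a rough illustration only, a cleanup pass aimed at LLM context might normalize line endings, collapse runs of spaces, rejoin words hyphenated across line breaks, and squeeze extra blank lines. clean_text and word_count are hypothetical names, not functions taken from app.py.

```python
import re

def clean_text(raw: str) -> str:
    """Hypothetical cleanup pass for text extracted from a PDF."""
    # Normalize Windows (\r\n) and old Mac (\r) line endings to \n
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces and tabs into a single space
    text = re.sub(r"[ \t]+", " ", text)
    # Drop spaces that sit directly before or after a line break
    text = re.sub(r" *\n *", "\n", text)
    # Rejoin words split across lines: "implemen-\ntation" -> "implementation"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Allow at most one blank line between paragraphs
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()

def word_count(text: str) -> int:
    """Count whitespace-separated tokens, the simplest definition of a word."""
    return len(text.split())

raw = "Hello   world.\r\n\r\n\r\n\r\nThis is  an implemen- \ntation.  "
cleaned = clean_text(raw)
print(repr(cleaned))        # 'Hello world.\n\nThis is an implementation.'
print(word_count(cleaned))  # 6
```

One trade-off of the hyphen rule: it also merges genuine compounds that happen to break at a line end (well-\nknown becomes wellknown), so a real implementation might check a word list before joining.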

Prerequisites

  • Python 3.6 or higher
  • pip (Python package installer)
  • libmagic (for file type detection)

System Dependencies

macOS

brew install libmagic

Linux (Ubuntu/Debian)

sudo apt-get install libmagic1

Windows

Install the python-magic-bin package, which bundles the libmagic DLLs that Windows lacks:
pip install python-magic-bin
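Once libmagic is installed, one way to check that Python can reach it is to ask it for the MIME type of a few bytes. The sketch below assumes the python-magic wrapper (the module python-magic-bin also provides); the README only names libmagic, so that wrapper is an assumption. When the library can't be loaded, the sketch falls back to checking the %PDF- signature. detect_mime and is_pdf are hypothetical helpers, not code from app.py.

```python
# python-magic wraps libmagic; importing it raises ImportError when the
# shared library cannot be found, so the import is treated as optional here.
try:
    import magic
except (ImportError, OSError):
    magic = None

PDF_MIME = "application/pdf"

def detect_mime(header: bytes) -> str:
    """Guess a MIME type from a file's leading bytes (hypothetical helper)."""
    if magic is not None:
        # libmagic inspects the bytes themselves, not the file name or extension
        return magic.from_buffer(header, mime=True)
    # Fallback without libmagic: a well-formed PDF starts with the "%PDF-" signature
    return PDF_MIME if header.startswith(b"%PDF-") else "application/octet-stream"

def is_pdf(header: bytes) -> bool:
    """True if the leading bytes look like a PDF."""
    return detect_mime(header) == PDF_MIME
```

Checking content rather than the .pdf extension is the point of depending on libmagic: a renamed text or image file is still rejected.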

Installation

  1. Clone this repository:
git clone https://github.com/yourusername/pdf-processor.git
cd pdf-processor
  2. Create a virtual environment:
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Usage

  1. Start the server:
python app.py
  2. Open your browser and navigate to:
http://localhost:8080
  3. Use the application:
    • Drag and drop a PDF file or click to browse
    • Wait for the upload and processing to complete
    • View the extracted text
    • Use the "Copy to Clipboard" button to copy the content
    • Check word count statistics

Development

Project Structure

pdf-processor/
├── app.py              # Main Flask application
├── requirements.txt    # Python dependencies
├── static/             # Static files
│   ├── styles.css      # CSS styles
│   └── script.js       # Frontend JavaScript
├── templates/          # HTML templates
│   └── index.html      # Main page template
└── README.md           # Project documentation

Contributing

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/improvement)
  3. Make your changes
  4. Commit your changes (git commit -am 'Add new feature')
  5. Push to the branch (git push origin feature/improvement)
  6. Create a Pull Request

Security

  • All processing happens locally, on the machine running the server
  • Uploaded files are deleted after processing; nothing is stored permanently
  • File type is validated (via libmagic) before processing
  • Uploads over the 50MB size limit are rejected
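Deleting uploads after processing and enforcing the size limit can be sketched together in one stdlib-only helper. This is a sketch under assumptions: process_upload and MAX_UPLOAD_BYTES are hypothetical names, and extract stands in for whatever PDF library app.py actually uses (the README doesn't say). In a Flask app the size cap is commonly enforced earlier by setting app.config["MAX_CONTENT_LENGTH"], which makes Flask reject oversized requests with HTTP 413; the explicit check here just keeps the example self-contained.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")

# Matches the 50MB limit stated in this README (hypothetical constant name)
MAX_UPLOAD_BYTES = 50 * 1024 * 1024

def process_upload(data: bytes, extract: Callable[[str], T]) -> T:
    """Save an upload to a temporary file, run extract(path) on it, then delete the file."""
    # Enforce the size limit before anything touches the disk
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError(f"upload is {len(data)} bytes; the limit is {MAX_UPLOAD_BYTES}")
    # mkstemp returns an open descriptor and a unique path in the system temp directory
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        # Write the bytes and close the descriptor before extraction,
        # so the extractor opens a complete, flushed file
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        return extract(path)
    finally:
        # Runs on success and on failure, so no upload outlives the call
        os.remove(path)
```

Putting os.remove in the finally clause is what lets "deleted after processing" hold even when extraction raises.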

Browser Compatibility

  • Chrome (latest)
  • Firefox (latest)
  • Safari (latest)
  • Edge (latest)

Known Issues

  • On macOS, the AirPlay Receiver may occupy port 5000 (Flask's default port), so the app runs on port 8080 instead
  • PDFs larger than 50MB are rejected to keep performance stable
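To see whether something (AirPlay or another server) is already listening on a port before starting the app, a small stdlib check is enough. port_in_use and first_free_port are hypothetical helpers, not part of this repository, and the check only covers IPv4 on localhost.

```python
import socket
from typing import Iterable, Optional

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising on refusal
        return s.connect_ex((host, port)) == 0

def first_free_port(candidates: Iterable[int]) -> Optional[int]:
    """Return the first candidate port with no listener, or None if all are taken."""
    for port in candidates:
        if not port_in_use(port):
            return port
    return None

if __name__ == "__main__":
    # 5000 is Flask's default; 8080 is the port this project uses instead
    for port in (5000, 8080):
        print(port, "in use" if port_in_use(port) else "free")
```

Running the file directly prints the status of ports 5000 and 8080, a quick way to confirm the AirPlay conflict on a Mac.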

License

This project is licensed under the MIT License; see the LICENSE file for details.

Acknowledgments
