An OCR tool for detecting and annotating phrases in images, built on the Google Cloud Vision API with fuzzy matching support.
- 🔍 Multi-orientation text detection - Handles horizontal, vertical, upside-down, and diagonal text
- 🎯 Fuzzy phrase matching - Finds phrases even with OCR errors or variations
- 📦 Spanning detection - Matches phrases that span multiple lines
- 🎨 Visual annotation - Draws color-coded bounding boxes with smart label placement
- ⚡ Configurable - Adjustable thresholds, angles, and text filtering
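As a rough illustration of how fuzzy matching and spanning detection can work together, here is a minimal sketch that uses only `difflib` from the Python standard library. It is not ThriftAssist's actual implementation: the `find_phrase` function, its `threshold` parameter, and the sample OCR lines are all hypothetical and chosen for illustration. The idea is to flatten the OCR lines into one word stream, then score sliding windows of words against the target phrase, so a window can cross a line break.

```python
from difflib import SequenceMatcher


def find_phrase(lines, phrase, threshold=0.8):
    """Return (score, first_line, last_line, matched_text) for the best match, or None."""
    # Flatten the OCR lines into one word stream, remembering each word's
    # source line, so a candidate window can run across a line break.
    words = [(word, idx) for idx, line in enumerate(lines) for word in line.split()]
    target = phrase.lower()
    n = len(phrase.split())
    best = None
    # Try windows one word shorter or longer than the phrase as well,
    # since OCR may merge or split words.
    for size in range(max(1, n - 1), n + 2):
        for start in range(len(words) - size + 1):
            window = words[start:start + size]
            text = " ".join(w for w, _ in window)
            score = SequenceMatcher(None, target, text.lower()).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, window[0][1], window[-1][1], text)
    return best


# Hypothetical OCR output: the "i" in LEVI'S was misread as a lowercase "l",
# and the phrase is split across two lines.
ocr_lines = ["VINTAGE LEVl'S", "DENIM JACKET size M"]
match = find_phrase(ocr_lines, "levi's denim jacket")
if match:
    score, first_line, last_line, text = match
    print(f"{text!r} on lines {first_line}-{last_line} (score {score:.2f})")
```

A character-level similarity ratio tolerates single-character OCR errors, while the word-stream window handles phrases that wrap onto the next line. A real detector would also need to carry bounding boxes alongside each word so the match can be drawn on the image.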
.
├── README.md              # This file
├── requirements.txt       # Python package dependencies
├── thrift_assist/         # Source code for ThriftAssist
│   ├── __init__.py
│   ├── cli.py             # Command-line interface
│   ├── config.py          # Configuration handling
│   ├── detector.py        # Core detection logic
│   ├── drawer.py          # Visual annotation logic
│   └── ocr.py             # OCR processing logic
└── tests/                 # Unit tests for ThriftAssist
    ├── __init__.py
    ├── test_detector.py
    ├── test_drawer.py
    └── test_ocr.py
- Clone the repository:

      git clone https://github.com/yourusername/thrift_assist.git
      cd thrift_assist

- Install the required Python packages:

      pip install -r requirements.txt

- Set up your Google Cloud Vision API credentials:
  - Follow the Google Cloud Vision API Quickstart to create a project and obtain credentials.
  - Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your service account key file:

        export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json"

- Run the command-line interface to start detecting phrases in images:

      python -m thrift_assist.cli --help

Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Make your changes and commit them.
- Push your branch and create a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.