This project demonstrates a pipeline for processing images using OCR (Optical Character Recognition) and leveraging generative AI for data interpretation.
Phase 1: OCR (Optical Character Recognition)
- Converts images to text using the pytesseract library.
- Saves the extracted text to an output file (see the sketch below).
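A minimal sketch of this phase, assuming Pillow and pytesseract are installed and the Tesseract engine is available on the system; the file names sample_image.png and ocr_output.txt are illustrative placeholders, not paths from the project.

```python
# Illustrative OCR step: read one image, extract its text, save it to a file.
# The file names below are placeholders, not the project's actual paths.
from PIL import Image
import pytesseract

image = Image.open("sample_image.png")     # load the image with Pillow
text = pytesseract.image_to_string(image)  # run Tesseract OCR on the image

with open("ocr_output.txt", "w", encoding="utf-8") as f:
    f.write(text)                          # persist the extracted text
```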
Phase 2: Generative AI Integration
- Uses OpenAI's API to analyze the OCR output.
- Employs a language model (e.g., gpt-4o-mini, gemini-flash) to interpret the extracted text.
- Provides insights and analysis based on the content of the image (see the sketch below).
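A hedged sketch of the analysis call, assuming the openai Python package (v1+) with the API key available in the OPENAI_API_KEY environment variable; the prompt wording and the ocr_output.txt file name are illustrative.

```python
# Illustrative analysis step: send the saved OCR text to gpt-4o-mini.
# Assumes OPENAI_API_KEY is set in the environment; the prompt is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment by default

with open("ocr_output.txt", encoding="utf-8") as f:
    ocr_text = f.read()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You analyze text extracted from images."},
        {"role": "user", "content": f"Interpret this OCR output:\n\n{ocr_text}"},
    ],
)
print(response.choices[0].message.content)  # the model's interpretation
```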
Phase 3: Thematic Data Grouping (not yet implemented)
- Future development: organizing the extracted data into relevant themes or categories.
Dependencies:
- pytesseract
- Pillow (PIL)
- openai
- google-generativeai
Usage:
- Install the necessary dependencies.
- Set up your OpenAI API key using userdata.get('OPENAI_API_KEY') (see the setup sketch after this list).
- Place the images you want to process in the same directory as the notebook.
- Run the notebook to execute the OCR and AI processing.
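The key setup step above assumes a Google Colab notebook, where userdata.get reads a stored Colab secret; a sketch under that assumption (the pip package list mirrors the dependencies above):

```python
# Setup sketch for a Google Colab environment. Install dependencies first,
# e.g.: pip install pytesseract Pillow openai google-generativeai
import os
from google.colab import userdata  # Colab-only helper for stored secrets

# Copy the Colab secret into the environment so the OpenAI client can find it.
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
```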
- This project uses a lightweight language model to manage costs.
- Larger bodies of text may take longer to process and can incur higher API costs.
- Thematic data grouping functionality is planned for future development.
Contributions are welcome! Feel free to submit issues or pull requests.