Check the CHANGELOG file for a global overview of the latest modifications! 😋
├── architectures : utilities for model architectures
│ ├── layers : custom layer implementations
│ ├── transformers : transformer architecture implementations
│ ├── common_blocks.py : defines common blocks (e.g., Conv + BN + ReLU)
│ ├── crnn_arch.py : CRNN architecture
│ ├── east_arch.py : EAST architecture
│ ├── generation_utils.py : utilities for text and sequence generation
│ ├── hparams.py : hyperparameter management
│ ├── simple_models.py : defines classical models such as CNN / RNN / MLP and siamese
│ └── yolo_arch.py : YOLOv2 architecture
├── custom_train_objects : custom objects used in training / testing
├── loggers : logging utilities for tracking experiment progress
├── models : main directory for model classes
│ ├── detection : detector implementations
│ │ ├── base_detector.py : abstract base class for all detectors
│ │ ├── east.py : EAST implementation for text detection
│ │ └── yolo.py : YOLOv2 implementation for general object detection
│ ├── interfaces : directories for interface classes
│ ├── ocr : OCR implementations
│ │ ├── base_ocr.py : abstract base class for all OCR models
│ │ └── crnn.py : CRNN implementation for OCR
│ └── weights_converter.py : utilities to convert weights between different models
├── tests : unit and integration tests for model validation
├── utils : utility functions for data processing and visualization
├── LICENCE : project license file
├── ocr.ipynb : notebook demonstrating model creation + OCR features
├── README.md : this file
└── requirements.txt : required packages
Check the main project for more information about the unextended modules / structure / main classes.
Check the detection project for more information about the detection
module and the EAST Scene-Text Detection model.
- OCR (module models.ocr) :
Feature | Function / class | Description |
---|---|---|
OCR | ocr | Performs OCR on the given image(s) |
You can check the ocr notebook for a concrete demonstration.
Available architectures :
Classes | Dataset | Architecture | Trainer | Weights |
---|---|---|---|---|
Models must be unzipped in the pretrained_models/ directory!
The pretrained CRNN models come from the EasyOCR library. Weights are automatically downloaded given the language or the model name, and converted to keras! The easyocr library is therefore not required, but pytorch is required to load the original weights before conversion.
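Since the conversion step needs pytorch but not easyocr, it can help to verify the environment before requesting pretrained weights. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of this project.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names from `names` that are not installed."""
    # find_spec returns None when a top-level module cannot be located;
    # it does not import the module itself.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    # `torch` is only needed to load the original weights before conversion
    missing = missing_packages(['torch'])
    if missing:
        print('Install before converting weights: pip install ' + ' '.join(missing))
    else:
        print('torch is available: weights conversion should be possible')
```

The check only looks the package up without importing it, so it stays fast even when pytorch is installed.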
The pretrained version of EAST can be downloaded from this project. It should be placed in pretrained_models/pretrained_weights/east_vgg16.pth (torch is required to convert the weights: pip install torch).
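A quick way to confirm that the EAST checkpoint sits where it is expected is sketched below. It only relies on the path given above; the function name `east_weights_available` and its `root` parameter are assumptions made for illustration, not part of this project.

```python
from pathlib import Path

# Path taken from the instructions above, relative to the repository root
EAST_WEIGHTS = Path('pretrained_models') / 'pretrained_weights' / 'east_vgg16.pth'


def east_weights_available(root='.'):
    """Return True if the EAST checkpoint is at the expected location under `root`."""
    return (Path(root) / EAST_WEIGHTS).is_file()


if __name__ == '__main__':
    if east_weights_available():
        print(f'Found {EAST_WEIGHTS}')
    else:
        print(f'Missing {EAST_WEIGHTS}: download it and place it there')
```

Run it from the root of the repository so that the relative path resolves correctly.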
See the installation guide for a step-by-step installation 😄
Here is a summary of the installation procedure, if you have a working python environment :
- Clone this repository: git clone https://github.com/yui-mhcp/ocr.git
- Go to the root of this repository: cd ocr
- Install requirements: pip install -r requirements.txt
- Open the ocr notebook and follow the instructions!
- Make the TO-DO list
- Convert the CRNN architecture / weights from the easyocr library to tensorflow
- Convert the CRNN + attention architecture from this repo to tensorflow
- Add examples to initialize pretrained models (both EAST and CRNN)
- Add an example to perform OCR on image (with text detection)
- Add an example to perform OCR on camera
- Allow combining texts into lines / paragraphs (as EAST detects individual words)
- Take into account the text rotation in the combination procedure
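The line-combination item in the list above can be illustrated with a simple heuristic: process word boxes from top to bottom, attach a word to the current line when its vertical centre falls within that line's vertical extent, then read each line from left to right. This is a minimal sketch for axis-aligned boxes only (it ignores text rotation) and is not this project's implementation; the `(text, x, y, w, h)` tuple format and the function name `group_words_into_lines` are assumptions.

```python
def group_words_into_lines(words):
    """Group (text, x, y, w, h) word boxes into text lines.

    A word joins the current line if its vertical centre lies inside the
    vertical range spanned by that line so far; otherwise it starts a new line.
    """
    lines = []  # each entry: {'top': ..., 'bottom': ..., 'words': [...]}
    # Process words from top to bottom, ordered by vertical centre
    for word in sorted(words, key=lambda box: box[2] + box[4] / 2):
        _, x, y, w, h = word
        center = y + h / 2
        if lines and lines[-1]['top'] <= center <= lines[-1]['bottom']:
            line = lines[-1]
            line['words'].append(word)
            # Extend the line's vertical range to cover the new word
            line['top'] = min(line['top'], y)
            line['bottom'] = max(line['bottom'], y + h)
        else:
            lines.append({'top': y, 'bottom': y + h, 'words': [word]})
    # Read each line from left to right
    return [
        ' '.join(item[0] for item in sorted(line['words'], key=lambda box: box[1]))
        for line in lines
    ]


words = [
    ('world', 60, 12, 50, 20),
    ('Hello', 5, 10, 50, 20),
    ('line', 70, 48, 40, 20),
    ('Second', 5, 50, 60, 20),
]
print(group_words_into_lines(words))  # ['Hello world', 'Second line']
```

Handling rotated text would require comparing boxes along the text direction rather than the image axes, which this sketch does not attempt.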
The code for the CRNN architecture is heavily inspired by the easyocr repo:
- EasyOCR library: official repo of the easyocr library
The code for the EAST part of this project is heavily inspired by this repo:
- SakuraRiven pytorch implementation: pytorch implementation of the EAST paper.
- Awesome-OCR : A curated list of OCR resources
- Tesseract OCR : The official Tesseract repository
- Deep Text Recognition Benchmark : A comprehensive benchmark of Scene Text Recognition models
- CRAFT-pytorch : Character Region Awareness for Text Detection
- mmocr : OpenMMLab Text Detection, Recognition and Understanding Toolbox
- An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition : the original CRNN paper
- What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis : a great benchmark of OCR models + an open-source repository with pretrained models and datasets
- U-Net: Convolutional Networks for Biomedical Image Segmentation : U-net original paper
- EAST: An Efficient and Accurate Scene Text Detector : text detection (with possibly rotated bounding-boxes) with a segmentation model (U-Net).
- COCO Text: an extension of COCO for text detection
- ICDAR 2015: a standard dataset for text detection and recognition
- Synthetic Word Dataset: synthetic word dataset for OCR training
- A Comprehensive Guide to OCR with Tesseract, OpenCV and Python : A great introduction to classical OCR approaches
- Scene Text Detection with OpenCV : Tutorial on implementing EAST text detector
- Attention Mechanisms in OCR : How attention mechanisms improve OCR accuracy
Contacts:
- Mail: [email protected]
- Discord: yui0732
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). See the LICENSE file for details.
This license allows you to use, modify, and distribute the code, as long as you include the original copyright and license notice in any copy of the software/source. Additionally, if you modify the code and distribute it, or run it on a server as a service, you must make your modified version available under the same license.
For more information about the AGPL-3.0 license, please visit the official website.
If you find this project useful in your work, please add this citation to give it more visibility! 😋
@misc{yui-mhcp,
author = {yui},
title = {A Deep Learning projects centralization},
year = {2021},
publisher = {GitHub},
howpublished = {\url{https://github.com/yui-mhcp}}
}