Kaggle Kuzushiji Recognition: Code for the 8th place solution.
The kuzushiji recognition pipeline consists of two models: a CenterNet character detection model and a MobileNetV3 per-character classification model.
Python version:
- 3.7.3
Libraries:
- chainer (6.2.0)
- chainercv (0.13.1)
- cupy-cuda92 (6.2.0)
- albumentations (0.3.1)
- opencv-python (4.1.0.25)
- Pillow (6.1.0)
- pandas (0.25.0)
- numpy (1.17.0)
- matplotlib (3.1.1)
- japanize-matplotlib (1.0.4)
For unit tests:
- pytest (4.4.1)
Please download the competition dataset from the Kaggle competition page and unzip it to <repo root>/data/kuzushiji-recognition.
The expected directory structure is as follows:
kuzushiji-recognition/
    data/
        kuzushiji-recognition/
            train.csv
            train_images
            test_images
            unicode_translation.csv
            sample_submission.csv
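The layout above can be checked before training with a small script. This is a minimal sketch, not part of the repository: the helper name `missing_dataset_entries` is hypothetical, and the entry names are taken directly from the directory listing.

```python
from pathlib import Path

# Entries listed in the expected directory structure.
EXPECTED_ENTRIES = [
    "train.csv",
    "train_images",
    "test_images",
    "unicode_translation.csv",
    "sample_submission.csv",
]


def missing_dataset_entries(dataset_dir):
    """Return the expected entries that are absent from dataset_dir.

    Hypothetical helper for illustration; it only checks existence,
    not file contents.
    """
    root = Path(dataset_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    # Run from the repository root.
    missing = missing_dataset_entries("data/kuzushiji-recognition")
    print("missing entries:", missing if missing else "none")
```

An empty list means every expected file and directory was found.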
Please follow the steps below to train kuzushiji recognition models.
- Set environment variable:
cd <path to this repo>
export PYTHONPATH=`pwd`
- Split all annotated samples in train.csv into train and validation splits:
python scripts/prepare_train_val_split.py
- Prepare per-character cropped image set for character classifier training:
python scripts/prepare_char_crop_dataset.py
- Train character detection model:
python scripts/train_detector.py --gpu 0 --out ./results/detector --full-data
- Train character classification model:
python scripts/train_classifier.py --gpu 0 --out ./results/classifier --full-data
- Prepare pseudo labels using the trained detector and classifier:
python scripts/prepare_pseudo_labels.py --gpu 0 \
    ./results/detector/model_700.npz \
    ./results/classifier/model_900.npz \
    --out data/kuzushiji-recognition-pseudo
- Fine-tune the classifier using the pseudo labels and the original training data:
python scripts/finetune_classifier.py --gpu 0 \
    --pseudo-labels-dir data/kuzushiji-recognition-pseudo \
    --out ./results/classifier-finetune \
    ./results/classifier/model_900.npz
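For readers who want to inspect the annotations that the preparation steps consume, the sketch below parses one `labels` entry from train.csv. It assumes the format given in the competition's data description, a space-separated sequence of `unicode x y width height` groups. It is not the repository's own parser, and `parse_labels` is a hypothetical name. Boxes are converted to the (xmin, ymin, xmax, ymax) convention.

```python
def parse_labels(labels):
    """Parse a train.csv labels string into (unicode, bbox) pairs.

    Assumption: labels is a space-separated sequence of
    "unicode x y width height" groups. Each bbox is returned as
    (xmin, ymin, xmax, ymax).
    """
    tokens = labels.split()
    if len(tokens) % 5 != 0:
        raise ValueError("expected groups of 5 tokens, got %d tokens" % len(tokens))

    annotations = []
    for i in range(0, len(tokens), 5):
        code = tokens[i]
        x, y, w, h = (int(t) for t in tokens[i + 1:i + 5])
        annotations.append((code, (x, y, x + w, y + h)))
    return annotations


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    print(parse_labels("U+306F 10 20 30 40 U+304C 5 6 7 8"))
```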
To generate a CSV for submission, run the following command:
python scripts/prepare_submission.py --gpu 0 \
    ./results/detector/model_700.npz \
    ./results/classifier-finetune/model_100.npz
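To illustrate what one row of such a CSV contains, here is a minimal sketch that turns predicted characters and boxes into a `labels` string. It assumes the competition's submission format of space-separated `unicode x y` triples, where (x, y) is a point inside the character, here taken as the box center. The function name is hypothetical, and this is not how `prepare_submission.py` is necessarily implemented.

```python
def format_submission_labels(unicodes, bboxes):
    """Build a submission labels string from predictions.

    Assumptions: each bbox is (xmin, ymin, xmax, ymax), and each
    character is reported as "unicode x y" using the box center.
    """
    parts = []
    for code, (xmin, ymin, xmax, ymax) in zip(unicodes, bboxes):
        center_x = int(round((xmin + xmax) / 2))
        center_y = int(round((ymin + ymax) / 2))
        parts.append("{} {} {}".format(code, center_x, center_y))
    return " ".join(parts)


if __name__ == "__main__":
    # Made-up predictions, for illustration only.
    print(format_submission_labels(
        ["U+306F", "U+304C"],
        [(10, 20, 40, 60), (4, 6, 12, 14)],
    ))
```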
The detector and classifier classes provide an easy-to-use interface for inference. The following is an example of inference code. Note that the bounding box format is (xmin, ymin, xmax, ymax).
import chainer
from PIL import Image
from kr.detector.centernet.resnet import Res18UnetCenterNet
from kr.classifier.softmax.mobilenetv3 import MobileNetV3
from kr.datasets import KuzushijiUnicodeMapping
# unicode <-> unicode index mapping
mapping = KuzushijiUnicodeMapping()
# load trained detector
detector = Res18UnetCenterNet()
chainer.serializers.load_npz('./results/detector/model_700.npz', detector)
# load trained classifier
classifier = MobileNetV3(out_ch=len(mapping))
chainer.serializers.load_npz('./results/classifier/model_900.npz', classifier)
# load image
image = Image.open('path/to/image.jpg')
# character detection
bboxes, bbox_scores = detector.detect(image)
# character classification
unicode_indices, scores = classifier.classify(image, bboxes)
unicodes = [mapping.index_to_unicode(idx) for idx in unicode_indices]
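The per-character outputs of the example above can be gathered into records for later processing. The sketch below is one possible way to do that and makes several assumptions: `unicodes`, `bboxes`, and `scores` are equal-length sequences, each box holds four numbers in (xmin, ymin, xmax, ymax) order, and the `min_score` threshold is a hypothetical parameter that is not part of the repository's API.

```python
def collect_predictions(unicodes, bboxes, scores, min_score=0.0):
    """Combine inference outputs into a list of per-character records.

    Assumptions: inputs are equal-length sequences, and each bbox is
    (xmin, ymin, xmax, ymax). Characters whose score is below
    min_score (a hypothetical threshold) are dropped.
    """
    records = []
    for code, bbox, score in zip(unicodes, bboxes, scores):
        score = float(score)
        if score < min_score:
            continue
        xmin, ymin, xmax, ymax = (float(v) for v in bbox)
        records.append({
            "unicode": code,
            "bbox": (xmin, ymin, xmax, ymax),
            "score": score,
        })
    return records


if __name__ == "__main__":
    # Made-up values standing in for real detector/classifier output.
    demo = collect_predictions(
        ["U+306F", "U+304C"],
        [(10, 20, 40, 60), (4, 6, 12, 14)],
        [0.9, 0.2],
        min_score=0.5,
    )
    print(demo)
```

Converting every value with `float` keeps the records independent of whether the models return Python lists or array types.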
Released under the MIT license.