Kaggle Kuzushiji Recognition: Code for the 8th place solution.
The kuzushiji recognition pipeline consists of two models: a CenterNet character detection model and a MobileNetV3 per-character classification model.
Python version:
- 3.7.3
Libraries:
- chainer (6.2.0)
- chainercv (0.13.1)
- cupy-cuda92 (6.2.0)
- albumentations (0.3.1)
- opencv-python (
- Pillow (6.1.0)
- pandas (0.25.0)
- numpy (1.17.0)
- matplotlib (3.1.1)
- japanize-matplotlib (1.0.4)
For unittest:
- pytest (4.4.1)
Please download the competition dataset from here and unzip it to <repo root>/data/kuzushiji-recognition.
The expected directory structure is as follows:
kuzushiji-recognition/
    data/
        kuzushiji-recognition/
            train.csv
            train_images
            test_images
            unicode_translation.csv
            sample_submission.csv
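Before training, it can help to confirm that the dataset was unzipped into the expected place. The following is a minimal sketch, not part of this repository: the function name `find_missing_entries` is hypothetical, and the entry names are taken from the directory listing above.

```python
from pathlib import Path

# Entries taken from the expected directory structure above.
EXPECTED_ENTRIES = [
    "train.csv",
    "train_images",
    "test_images",
    "unicode_translation.csv",
    "sample_submission.csv",
]


def find_missing_entries(repo_root):
    """Return the expected dataset entries that are absent under repo_root.

    Hypothetical helper for illustration; it only checks for existence.
    """
    dataset_dir = Path(repo_root) / "data" / "kuzushiji-recognition"
    return [name for name in EXPECTED_ENTRIES if not (dataset_dir / name).exists()]


if __name__ == "__main__":
    missing = find_missing_entries(".")
    if missing:
        print("Missing dataset entries:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

Run it from the repository root; an empty result means every listed file and directory is present.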
Please follow the steps below to train kuzushiji recognition models.
- Set environment variable:
cd <path to this repo>
export PYTHONPATH=`pwd`
- Split all annotated samples into train and validation splits:
python scripts/prepare_train_val_split.py
- Prepare per-character cropped image set for character classifier training:
python scripts/prepare_char_crop_dataset.py
- Train character detection model:
python scripts/train_detector.py --gpu 0 --out ./results/detector --full-data
- Train character classification model:
python scripts/train_classifier.py --gpu 0 --out ./results/classifier --full-data
- Prepare pseudo labels using the trained detector and classifier:
python scripts/prepare_pseudo_labels.py --gpu 0 \
    ./results/detector/model_700.npz \
    ./results/classifier/model_900.npz \
    --out data/kuzushiji-recognition-pseudo
- Fine-tune the classifier using the pseudo labels and the original training data:
python scripts/finetune_classifier.py --gpu 0 \
    --pseudo-labels-dir data/kuzushiji-recognition-pseudo \
    --out ./results/classifier-finetune \
    ./results/classifier/model_900.npz
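The preparation steps above work from the competition annotations. As a rough illustration of what those annotations look like, the sketch below parses one `labels` string into per-character boxes. It assumes the competition's annotation format, in which each character is written as `unicode x y width height` separated by spaces; that format comes from the competition data description, not from this README, and `parse_labels` is a hypothetical helper rather than a function of this repository.

```python
def parse_labels(labels):
    """Parse 'unicode x y width height ...' into (unicode, bbox) pairs.

    Assumption: x and y are the top-left corner of the box. Each bbox is
    returned as (xmin, ymin, xmax, ymax).
    """
    tokens = labels.split()
    if len(tokens) % 5 != 0:
        raise ValueError("expected groups of 5 tokens: unicode x y width height")

    parsed = []
    for i in range(0, len(tokens), 5):
        code = tokens[i]
        x, y, w, h = (int(t) for t in tokens[i + 1:i + 5])
        parsed.append((code, (x, y, x + w, y + h)))
    return parsed


if __name__ == "__main__":
    for code, bbox in parse_labels("U+306F 10 20 30 40 U+304C 5 6 7 8"):
        print(code, bbox)
```

Converting to corner coordinates up front keeps the boxes in one consistent (xmin, ymin, xmax, ymax) layout for cropping.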
To generate a CSV for submission, execute the following command:
python scripts/prepare_submission.py --gpu 0 \
    ./results/detector/model_700.npz \
    ./results/classifier-finetune/model_100.npz
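For reference, each row of the submission CSV pairs an image ID with a `labels` string. The sketch below shows one way to build such a string from predictions. It assumes the competition's submission format of space-separated `unicode x y` triples, where (x, y) is a point expected to fall inside the character's box; here the box center is used. `format_submission_labels` is a hypothetical helper, and this README does not show how `prepare_submission.py` itself formats its output.

```python
def format_submission_labels(unicodes, bboxes):
    """Build a 'unicode x y ...' string using the center of each bbox.

    Assumption: bboxes are (xmin, ymin, xmax, ymax). Centers are truncated
    to integers.
    """
    parts = []
    for code, (xmin, ymin, xmax, ymax) in zip(unicodes, bboxes):
        cx = int((xmin + xmax) / 2)
        cy = int((ymin + ymax) / 2)
        parts.append(f"{code} {cx} {cy}")
    return " ".join(parts)


if __name__ == "__main__":
    print(format_submission_labels(["U+306F"], [(10, 20, 40, 60)]))
```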
The detector class and the classifier class provide an easy-to-use interface for inference. The following is an example of inference code. Note that the bounding box format is (xmin, ymin, xmax, ymax).
import chainer
from PIL import Image
from kr.detector.centernet.resnet import Res18UnetCenterNet
from kr.classifier.softmax.mobilenetv3 import MobileNetV3
from kr.datasets import KuzushijiUnicodeMapping
# unicode <-> unicode index mapping
mapping = KuzushijiUnicodeMapping()
# load trained detector
detector = Res18UnetCenterNet()
chainer.serializers.load_npz('./results/detector/model_700.npz', detector)
# load trained classifier
classifier = MobileNetV3(out_ch=len(mapping))
chainer.serializers.load_npz('./results/classifier/model_900.npz', classifier)
# load image
image = Image.open('path/to/image.jpg')
# character detection
bboxes, bbox_scores = detector.detect(image)
# character classification
unicode_indices, scores = classifier.classify(image, bboxes)
unicodes = [mapping.index_to_unicode(idx) for idx in unicode_indices]
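When working with the returned boxes, for example to crop a detected character for inspection, it may be useful to clamp them to the image bounds first. The helpers below are a minimal sketch under the (xmin, ymin, xmax, ymax) convention stated above; `clamp_bbox` and `bbox_size` are hypothetical names and are not part of the `kr` package.

```python
def clamp_bbox(bbox, image_width, image_height):
    """Clamp an (xmin, ymin, xmax, ymax) box to the image bounds."""
    xmin, ymin, xmax, ymax = bbox
    xmin = min(max(xmin, 0), image_width)
    xmax = min(max(xmax, 0), image_width)
    ymin = min(max(ymin, 0), image_height)
    ymax = min(max(ymax, 0), image_height)
    return (xmin, ymin, xmax, ymax)


def bbox_size(bbox):
    """Return (width, height) of an (xmin, ymin, xmax, ymax) box."""
    xmin, ymin, xmax, ymax = bbox
    return (xmax - xmin, ymax - ymin)


if __name__ == "__main__":
    box = clamp_bbox((-5, 10, 120, 90), image_width=100, image_height=80)
    print(box, bbox_size(box))
```

A clamped box is in the same (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so it can be passed to `image.crop(box)` directly.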
Released under the MIT license.