An OCR System for Baybayin Scripts using SVM

Latest

rbp0803 released this 28 Oct 05:40

e04c28e

In this paper, we aim to distinguish Baybayin, a pre-colonial writing system of the Philippines, from Latin script at the character level. The proposed algorithm uses four main Support Vector Machine (SVM) classifiers, one for each of the following tasks: separating Baybayin from Latin script, recognizing Baybayin characters, recognizing Latin characters, and recognizing Baybayin diacritical marks. Because the method emphasizes the recognition of Baybayin characters, we tested the algorithm on the images in (1) that satisfy our system assumptions. We also include the code used to generate these classifiers from the datasets in (2), (3), and (4) for Baybayin, Latin, and Baybayin diacritic characters, respectively. Finally, we discuss the system's strengths and limitations, its experimental results, and recommendations for further research.
Dataset links:
(1) https://www.kaggle.com/jamesnogra/baybayn-baybayin-handwritten-images
(2) https://www.kaggle.com/rodneypino/baybayin-and-latin-binary-images-in-mat-format?select=Baybayin
(3) https://www.kaggle.com/rodneypino/baybayin-and-latin-binary-images-in-mat-format?select=Latin
(4) https://www.kaggle.com/rodneypino/baybayin-and-latin-binary-images-in-mat-format?select=Baybayin+Diacritics
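
As a rough illustration of how four classifiers of this kind could be chained, here is a minimal Python sketch. It is not the code shipped with this release: the `ScriptCascade` class, the routing order (script first, then the per-script character classifier, with the diacritic classifier applied only to Baybayin glyphs), the label strings, and the toy lambda classifiers are all assumptions made for illustration. In practice each callable would wrap a trained SVM.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], str]


@dataclass
class ScriptCascade:
    """Hypothetical wrapper that routes a glyph through four classifiers."""
    script_clf: Classifier     # assumed to return "baybayin" or "latin"
    baybayin_clf: Classifier   # Baybayin character label
    latin_clf: Classifier      # Latin character label
    diacritic_clf: Classifier  # Baybayin diacritic label

    def recognize(self, glyph: Features,
                  diacritic: Optional[Features] = None) -> dict:
        # Step 1: decide which script the main glyph belongs to.
        script = self.script_clf(glyph)
        if script == "baybayin":
            # Step 2a: classify the Baybayin character itself.
            result = {"script": script, "character": self.baybayin_clf(glyph)}
            # Step 3: classify the diacritic component, if one was supplied.
            if diacritic is not None:
                result["diacritic"] = self.diacritic_clf(diacritic)
            return result
        # Step 2b: otherwise fall through to the Latin character classifier.
        return {"script": script, "character": self.latin_clf(glyph)}


# Toy stand-ins (assumptions, not trained models) so the sketch runs as-is.
cascade = ScriptCascade(
    script_clf=lambda f: "baybayin" if sum(f) > 1.0 else "latin",
    baybayin_clf=lambda f: "ba",
    latin_clf=lambda f: "b",
    diacritic_clf=lambda f: "mark_above",
)

print(cascade.recognize([1.0, 0.5], diacritic=[0.2]))
print(cascade.recognize([0.1, 0.2]))
```

Keeping the classifiers as plain callables means each stage can be swapped for a real model without changing the routing logic.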

You can check the full paper here: https://peerj.com/articles/cs-360/.
