Multi-font Telugu text recognition using hidden Markov models and Akshara Bi-grams

Devarapalli, Koteswara Rao; Negi, Atul


Date

2016-01-01

Authors

Devarapalli, Koteswara Rao

Negi, Atul

Abstract

Recent advances in information technology have made it possible to introduce many Unicode Telugu fonts for the documentation needs of present-day society. However, recognizing documents printed in a variety of fonts poses new challenges for building Telugu OCR systems. In this paper, we demonstrate multi-font recognition of printed Telugu words using an implicit segmentation approach, which yields segmentation as a by-product of recognition. Our word recognition approach relies on Hidden Markov Models (HMMs) and an akshara bi-gram language model to recognize word images in terms of aksharas (characters). The training set of word images is prepared from document images of popular books and from synthetic document images generated using 8 different Unicode fonts. Testing involves matching the feature vector sequence of a word image against sequences of akshara HMMs based on bi-grams. The character error rate (CER) and word error rate (WER) of this system are 21% and 37%, respectively, which we find encouraging.
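
The abstract reports CER and WER without describing the scoring protocol. As a minimal sketch of how such rates are commonly computed, the Python below assumes that errors are counted per akshara using Levenshtein edit distance, and that a word counts as wrong if its recognized akshara sequence differs at all from the reference. The function names `edit_distance` and `error_rates` and the sample words are illustrative assumptions, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of aksharas."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def error_rates(pairs):
    """Return (CER, WER) for a list of (reference, hypothesis) words.

    Each word is a list of akshara strings, because one akshara may
    span several Unicode code points. Assumed scoring, not the paper's:
    CER = total akshara edits / total reference aksharas,
    WER = fraction of words not recognized exactly.
    """
    akshara_errors = sum(edit_distance(r, h) for r, h in pairs)
    akshara_total = sum(len(r) for r, _ in pairs)
    word_errors = sum(1 for r, h in pairs if r != h)
    return akshara_errors / akshara_total, word_errors / len(pairs)


# Hypothetical example: one word recognized correctly, one with a
# single akshara substitution.
pairs = [
    (["తె", "లు", "గు"], ["తె", "లు", "గు"]),
    (["పు", "స్త", "కం"], ["పు", "స్త", "క"]),
]
cer, wer = error_rates(pairs)
```

Scoring at the akshara level rather than the code-point level keeps a wrong vowel sign from being counted separately from its base consonant; whether the paper scores this way is not stated in the abstract.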

Keywords

Akshara, Bi-gram, DCT, HMM, Telugu OCR, Word recognition

Citation

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). v.10481 LNCS

URI

10.1007/978-3-319-68124-5_21
http://link.springer.com/10.1007/978-3-319-68124-5_21
https://dspace.uohyd.ac.in/handle/1/8580

Collections

Computer and Information Sciences - Publications
