Multi-font Telugu Text Recognition Using Hidden Markov Models and Akshara Bi-grams
Date
2016-01-01
Authors
Devarapalli, Koteswara Rao
Negi, Atul
Abstract
Recent advances in information technology have made it possible to introduce many Unicode Telugu fonts to meet present-day documentation needs. However, recognizing documents printed in a variety of fonts poses new challenges for building Telugu OCR systems. In this paper, we demonstrate multi-font Telugu printed word recognition using an implicit segmentation approach, in which segmentation is obtained as a by-product of recognition. Our word recognition approach relies on Hidden Markov Models (HMMs) and an akshara bi-gram language model to recognize word images in terms of aksharas (characters). The training set of word images is prepared from document images of popular books and from synthetic document images generated using 8 different Unicode fonts. Testing involves matching the feature vector sequence against sequences of akshara HMMs based on bi-grams. The character error rate (CER) and word error rate (WER) of this system are 21% and 37%, respectively. The performance of our system is very encouraging.
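For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how CER and WER are commonly computed for isolated word recognition. It is not taken from the paper: it assumes each word is given as a list of akshara tokens, that CER is the total akshara-level edit distance divided by the total number of reference aksharas, and that WER is the fraction of words not recognized exactly. The function names and the toy tokens are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences.

    Substitutions, insertions, and deletions each cost 1.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer_wer(ref_words, hyp_words):
    """Return (CER, WER) for paired lists of words.

    Each word is a list of akshara tokens (an assumption of this sketch).
    """
    total_edits = sum(edit_distance(r, h) for r, h in zip(ref_words, hyp_words))
    total_aksharas = sum(len(r) for r in ref_words)
    wrong_words = sum(1 for r, h in zip(ref_words, hyp_words) if r != h)
    return total_edits / total_aksharas, wrong_words / len(ref_words)


# Toy example with placeholder tokens standing in for aksharas.
refs = [["a", "b", "c"], ["d", "e"]]
hyps = [["a", "x", "c"], ["d", "e"]]
cer, wer = cer_wer(refs, hyps)
print(cer, wer)  # 0.2 0.5
```

In the toy example, one substitution among five reference tokens gives a CER of 0.2, and one of two words being wrong gives a WER of 0.5. Because a single akshara error makes the whole word wrong, WER is typically higher than CER, as in the figures reported above.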
Keywords
Akshara,
Bi-gram,
DCT,
HMM,
Telugu OCR,
Word recognition
Citation
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). v.10481 LNCS