Orthographic Properties Based Telugu Text Recognition Using Hidden Markov Models

Date
2018-01-25
Authors
Rao, Devarapalli Koteswara
Negi, Atul
Abstract
The Telugu script has glyphs for vowels, consonants and modifiers that combine to form orthographic units called aksharas (characters). In this paper, we present a Telugu printed text recognition system based on the orthographic properties of the script and on Hidden Markov Models (HMMs). One such orthographic property is that consonant modifiers always appear spatially in the middle and lower zones. We use the concept of peak fringe numbers (PFNs) to define a first-level classifier, whose purpose is to classify Telugu word images according to the data in the lower zone so that they can be modeled better. Conventional Telugu OCR applications face segmentation difficulties at various levels, such as the akshara and connected-component levels. Our approach aims to overcome these problems with a segmentation-free method for Telugu printed text recognition: we use HMMs to model Telugu akshara shapes and apply akshara bi-grams at the recognition stage. Because HMMs are well suited to modeling data with variations, our data set also includes document images at different resolutions (200, 250 and 300 DPI). We measure the character error rate (CER) to evaluate system performance. The recognition results are encouraging, with a CER of 15 percent.
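The abstract reports a CER but does not state how it is computed. A common definition, assumed here, is the Levenshtein edit distance (substitutions + deletions + insertions) between the recognized output and the ground truth, divided by the number of units in the ground truth. The sketch below is illustrative only; the function names `edit_distance` and `character_error_rate` are hypothetical, and whether the paper counts Unicode code points or whole aksharas as "characters" is not stated. Passing strings compares code points, while passing lists of akshara tokens compares aksharas.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)]


def character_error_rate(ref, hyp):
    """Assumed CER = (S + D + I) / N, with N the number of reference units."""
    if len(ref) == 0:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Three edits over a six-unit reference give a CER of 0.5.
    print(character_error_rate("kitten", "sitting"))
```

Under this definition, the reported CER of 15 percent would correspond to roughly 15 edits per 100 reference characters.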
Keywords
Akshara recognition, HMM, Telugu OCR, Word recognition
Citation
Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. v.5