Orthographic Properties Based Telugu Text Recognition Using Hidden Markov Models

Rao, Devarapalli Koteswara; Negi, Atul

Date

2018-01-25

Authors

Rao, Devarapalli Koteswara

Negi, Atul

Abstract

Telugu script has glyphs for vowels, consonants and modifiers, which combine to form orthographic units called aksharas (characters). In this paper, we present a method for printed Telugu text recognition based on orthographic properties of the script and Hidden Markov Models (HMMs). One such property is that consonant modifiers always appear in the middle and lower zones. The concept of peak fringe numbers (PFNs) is used to define a first-level classifier, which groups Telugu word images according to the data in the lower zone so that they can be modeled better. Conventional Telugu OCR applications face difficulties at various levels of segmentation, such as the akshara and connected-component levels. We use HMMs to model Telugu akshara shapes and also apply bi-grams of aksharas at the recognition stage. Our approach aims to overcome the segmentation problems by attempting a segmentation-free method for printed Telugu text recognition. Because HMMs are well suited to modeling data with variations, our data set also includes document images at different resolutions, such as 200, 250 and 300 DPI. We measure the character error rate (CER) to assess system performance. The recognition capability of the system is encouraging: its CER is 15 percent.
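The abstract reports a character error rate (CER) but does not give the formula used. As a minimal sketch, assuming the common definition (edit distance between the recognized and reference sequences, divided by the reference length), CER could be computed as follows. The function names and the romanized akshara labels are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute = 1)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER under the assumed definition: edit distance / reference length."""
    if len(ref) == 0:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical akshara label sequences; each list item stands for one akshara.
reference = ["ka", "mma", "la", "mu"]
recognized = ["ka", "ma", "la", "mu"]
print(character_error_rate(reference, recognized))
```

Treating each akshara as one item means a whole orthographic unit counts as a single error; scoring at the level of Unicode code points instead would give a different figure, so the unit of comparison should be stated alongside any CER.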

Keywords

Akshara recognition, HMM, Telugu OCR, Word recognition

Citation

Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. v.5

URI

10.1109/ICDAR.2017.327
http://ieeexplore.ieee.org/document/8270273/
https://dspace.uohyd.ac.in/handle/1/8564

Collections

Computer and Information Sciences - Publications
