Orthographic Properties Based Telugu Text Recognition Using Hidden Markov Models

Rao, Devarapalli Koteswara; Negi, Atul

dc.contributor.author	Rao, Devarapalli Koteswara
dc.contributor.author	Negi, Atul
dc.date.accessioned	2022-03-27T05:52:47Z
dc.date.available	2022-03-27T05:52:47Z
dc.date.issued	2018-01-25
dc.description.abstract	Telugu script has glyphs for vowels, consonants and modifiers, which combine to form orthographic units called aksharas (characters). In this paper, we present printed Telugu text recognition based on orthographic properties of the script and Hidden Markov Models (HMMs). One such orthographic property is that consonant modifiers always appear spatially in the middle and lower zones. The concept of peak fringe numbers (PFNs) is used to define a first-level classifier, whose purpose is to classify Telugu word images based on the data in the lower zone for better modeling. Conventional Telugu OCR applications face segmentation difficulties at various levels of segmentation, such as akshara and connected-component. We therefore use HMMs to model Telugu akshara shapes and also apply bi-grams of aksharas at the recognition stage. Our approach aims to overcome these segmentation problems by attempting a segmentation-free method for printed Telugu text recognition. Given the suitability of HMMs for modeling data with variations, our data set also includes document images at different resolutions, such as 200, 250 and 300 DPI. We measure character error rate (CER) to observe system performance. The recognition capability of the system is encouraging: its CER is 15 percent.
dc.identifier.citation	Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. v.5
dc.identifier.issn	15205363
dc.identifier.uri	10.1109/ICDAR.2017.327
dc.identifier.uri	http://ieeexplore.ieee.org/document/8270273/
dc.identifier.uri	https://dspace.uohyd.ac.in/handle/1/8564
dc.subject	Akshara recognition
dc.subject	HMM
dc.subject	Telugu OCR
dc.subject	Word recognition
dc.title	Orthographic Properties Based Telugu Text Recognition Using Hidden Markov Models
dc.type	Conference Proceeding. Conference Paper
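
The abstract reports a character error rate (CER) of 15 percent but does not say how CER is computed. The sketch below assumes the common definition: the Levenshtein edit distance between the recognized sequence and the reference sequence, divided by the reference length. The function names `edit_distance` and `character_error_rate` are illustrative and do not come from the paper. The inputs are treated as sequences of units, so a list of aksharas can be passed instead of a plain string.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER under the assumed definition: edit distance / reference length."""
    if len(ref) == 0:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substituted unit in a four-unit reference.
reference = ["a", "b", "c", "d"]
recognized = ["a", "b", "x", "d"]
cer = character_error_rate(reference, recognized)
```

Under this definition, a CER of 15 percent corresponds to about 15 edit operations per 100 reference units. Whether the paper counts units as aksharas or as individual glyphs is not stated in the abstract.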

Collections

Computer and Information Sciences - Publications