Multi-font Telugu text recognition using hidden Markov models and Akshara Bi-grams

dc.contributor.author Devarapalli, Koteswara Rao
dc.contributor.author Negi, Atul
dc.date.accessioned 2022-03-27T05:52:56Z
dc.date.available 2022-03-27T05:52:56Z
dc.date.issued 2016-01-01
dc.description.abstract Recent advances in information technology have made it possible to introduce many Unicode Telugu fonts for the documentation needs of present-day society. However, recognizing documents printed in a variety of fonts poses new challenges in building Telugu OCR systems. In this paper, we demonstrate multi-font Telugu printed word recognition using an implicit segmentation approach, which provides segmentation as a by-product of recognition. Our word recognition approach relies on Hidden Markov Models (HMMs) and an akshara bi-gram language model to recognize word images in terms of aksharas (characters). The training set of word images is prepared from document images of popular books and from synthetic document images generated using 8 different Unicode fonts. Testing involves matching the feature vector sequence of a word image against sequences of akshara HMMs constrained by the bi-grams. The system achieves a character error rate (CER) of 21% and a word error rate (WER) of 37%, which we find very encouraging.
dc.identifier.citation Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). v.10481 LNCS
dc.identifier.issn 03029743
dc.identifier.uri 10.1007/978-3-319-68124-5_21
dc.identifier.uri http://link.springer.com/10.1007/978-3-319-68124-5_21
dc.identifier.uri https://dspace.uohyd.ac.in/handle/1/8580
dc.subject Akshara
dc.subject Bi-gram
dc.subject DCT
dc.subject HMM
dc.subject Telugu OCR
dc.subject Word recognition
dc.title Multi-font Telugu text recognition using hidden Markov models and Akshara Bi-grams
dc.type Book Series. Conference Paper
dspace.entity.type
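
The abstract reports a CER and a WER but this record does not say how they were computed. The following is a minimal sketch of one common way to compute both for isolated word recognition, assuming each word is already split into a list of akshara tokens, that CER is the total edit distance divided by the total number of reference aksharas, and that WER is the fraction of words not recognized exactly. The function names and the placeholder tokens are illustrative only and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(pairs: List[Tuple[Sequence[str], Sequence[str]]]) -> Tuple[float, float]:
    """Return (CER, WER) for (reference, hypothesis) akshara sequences.

    Assumption: one pair per word image; WER counts a word as wrong
    if its recognized akshara sequence differs from the reference at all.
    """
    total_units = sum(len(ref) for ref, _ in pairs)
    if not pairs or total_units == 0:
        raise ValueError("need at least one non-empty reference word")
    unit_errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    word_errors = sum(1 for ref, hyp in pairs if list(ref) != list(hyp))
    return unit_errors / total_units, word_errors / len(pairs)


# Placeholder tokens stand in for aksharas; real input would come from
# an akshara-level segmentation of the ground truth and the OCR output.
sample = [
    (["a1", "a2", "a3"], ["a1", "a2", "a3"]),  # correct word
    (["a1", "a2"], ["a1", "a9"]),              # one substitution
    (["a4"], ["a4", "a5"]),                    # one insertion
]
cer, wer = error_rates(sample)
```

Counting errors over akshara tokens rather than Unicode code points matters for Telugu, because a single akshara can span several code points; the sketch sidesteps that by taking pre-tokenized input.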