An OCR system for Telugu

dc.contributor.author Negi, Atul
dc.contributor.author Bhagvati, Chakravarthy
dc.contributor.author Krishna, B.
dc.date.accessioned 2022-03-27T05:53:59Z
dc.date.available 2022-03-27T05:53:59Z
dc.date.issued 2001-01-01
dc.description.abstract Telugu is a language spoken by more than 100 million people in South India. It has a complex orthography with a large number of distinct character shapes (estimated to be of the order of 10,000), composed of simple and compound characters formed from 16 vowels (called achchus) and 36 consonants (called hallus). Here we present an efficient and practical approach to Telugu OCR that limits the number of templates to be recognized to just 370, avoiding both classifier design for thousands of shapes and very complex glyph segmentation. A compositional approach using connected components and fringe distance template matching gave a raw OCR accuracy of about 92% in testing. Several experiments across varying fonts and resolutions showed the approach to be satisfactory.
dc.identifier.citation Proceedings of the International Conference on Document Analysis and Recognition, ICDAR. v.2001-January
dc.identifier.issn 15205363
dc.identifier.uri 10.1109/ICDAR.2001.953958
dc.identifier.uri http://ieeexplore.ieee.org/document/953958/
dc.identifier.uri https://dspace.uohyd.ac.in/handle/1/8676
dc.title An OCR system for Telugu
dc.type Conference Proceeding. Conference Paper