Towards improving the accuracy of Telugu OCR systems

dc.contributor.author Kumar, P. Pavan
dc.contributor.author Bhagvati, Chakravarthy
dc.contributor.author Negi, Atul
dc.contributor.author Agarwal, Arun
dc.contributor.author Deekshatulu, B. L.
dc.date.accessioned 2022-03-27T05:54:55Z
dc.date.available 2022-03-27T05:54:55Z
dc.date.issued 2011-12-02
dc.description.abstract Design of a high-accuracy OCR system is a challenging task, as system performance is affected by its component modules. Each module has its own impact on the overall accuracy of the OCR system, so an improvement in one module is reflected in overall system performance. In the present work, we have developed an OCR system for Telugu. Our experiments on a corpus of about 1000 images have shown that system performance is degraded by broken characters caused by the binarization module, as well as by improper character segmentation. Therefore, we address the issues of handling broken characters and poor segmentation. A novel approach based on feedback from the distance measure used by the classifier is proposed to handle broken characters. For character segmentation, our proposed approach exploits the orthographic properties of Telugu script. As a result, a significant improvement is obtained in the performance of the system. These algorithms are generic and may be applicable to other Indian scripts, especially south Indian scripts. In our experiments, end-to-end system performance is evaluated, which has not been reported in the literature. © 2011 IEEE.
dc.identifier.citation Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
dc.identifier.issn 15205363
dc.identifier.uri 10.1109/ICDAR.2011.185
dc.identifier.uri http://ieeexplore.ieee.org/document/6065443/
dc.identifier.uri https://dspace.uohyd.ac.in/handle/1/8751
dc.subject Indian scripts
dc.subject OCR system
dc.subject system performance
dc.subject Telugu script
dc.title Towards improving the accuracy of Telugu OCR systems
dc.type Conference Proceeding. Conference Paper