Fringe map based text line segmentation of printed Telugu document images
Date
2011-12-02
Authors
Koppula, Vijaya Kumar
Negi, Atul
Abstract
Text line segmentation is a crucial step that can greatly influence the accuracy of an OCR system. One of the major obstacles to building high-accuracy OCR systems for Indic scripts has been the text line segmentation problem. For Telugu script in particular, this problem has yet to be adequately addressed by research. Common methods developed for Roman script are not applicable because of the inherent complexity of Telugu script. Previous approaches to Telugu OCR in the literature take a simplified view of the problem, leading to errors in line segmentation. The problem is compounded in old documents that were typeset manually and have non-uniform print quality. In this work we propose a new method based on the fringe map concept. In a fringe map, each pixel of the binary image is associated with a fringe number that denotes its distance to the nearest black pixel. We use this fringe information to segment text lines. First, we locate peak fringe numbers (PFNs). PFNs that do not lie between lines are filtered out. The PFNs between adjacent lines are used to construct a region, and the segmenting path between the adjacent lines is found by joining the filtered PFNs of that region. © 2011 IEEE.
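The abstract describes the pipeline only at a high level, so the Python sketch below is one possible reading of it, not the authors' implementation. Several details are assumptions: the image is a list of rows with 1 for ink; fringe numbers use city-block (4-neighbour) distance computed by a multi-source BFS; a PFN is taken to be a vertical local maximum of the fringe map within a column; the filter (ink above and below the peak, plus a hypothetical `min_fringe` threshold) and the joining step (nearest PFN inside a caller-supplied row band, falling back to the band rows farthest from ink) stand in for the paper's own filtering and region construction. The function names `fringe_map`, `peak_fringe_numbers` and `segmenting_path` are hypothetical.

```python
from collections import deque
from typing import List

Image = List[List[int]]  # 1 = black (ink), 0 = white background


def fringe_map(img: Image) -> List[List[int]]:
    """Fringe number of every pixel: city-block distance to the nearest
    black pixel, computed with a multi-source BFS (black pixels get 0)."""
    h, w = len(img), len(img[0])
    inf = h + w  # exceeds any city-block distance inside the image
    dist = [[inf] * w for _ in range(h)]
    queue = deque()
    for r in range(h):
        for c in range(w):
            if img[r][c] == 1:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and dist[nr][nc] > dist[r][c] + 1:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def peak_fringe_numbers(img: Image, fmap: List[List[int]],
                        min_fringe: int = 2) -> List[List[int]]:
    """Per column, rows whose fringe value is a vertical local maximum.
    Hypothetical filter: keep only peaks that have ink above and below them
    in the same column and a fringe value of at least min_fringe."""
    h, w = len(fmap), len(fmap[0])
    peaks: List[List[int]] = [[] for _ in range(w)]
    for c in range(w):
        ink_rows = [r for r in range(h) if img[r][c] == 1]
        if not ink_rows:
            continue
        top, bottom = ink_rows[0], ink_rows[-1]
        for r in range(top + 1, bottom):  # strictly between first/last ink
            f = fmap[r][c]
            # '>=' above and '>' below: a plateau resolves to its bottom row
            if f >= min_fringe and f >= fmap[r - 1][c] and f > fmap[r + 1][c]:
                peaks[c].append(r)
    return peaks


def segmenting_path(fmap: List[List[int]], peaks: List[List[int]],
                    top: int, bottom: int) -> List[int]:
    """Join the PFNs lying strictly between rows top and bottom (a
    hypothetical region between two adjacent lines; needs bottom - top >= 2)
    into one row index per column, staying close to the previous column."""
    path: List[int] = []
    prev = (top + bottom) // 2
    for c, col_peaks in enumerate(peaks):
        cands = [r for r in col_peaks if top < r < bottom]
        if not cands:
            # no PFN in this column: use the band rows farthest from ink
            best = max(fmap[r][c] for r in range(top + 1, bottom))
            cands = [r for r in range(top + 1, bottom) if fmap[r][c] == best]
        prev = min(cands, key=lambda r, p=prev: abs(r - p))
        path.append(prev)
    return path
```

On a toy 10 x 6 image with full-width ink rows at 0, 4 and 9 and a short "descender" at rows 1 to 2 of column 3, the path for the band between rows 0 and 4 bends down to row 3 around the descender (`[2, 2, 2, 3, 3, 2]`), while the band between rows 4 and 9 yields a straight path at row 7. The fallback matters here: joining PFNs greedily across the whole column, without restricting them to a region, would jump into the wrong inter-line gap wherever a descender suppresses the upper peak.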
Keywords
Fringe Maps,
Indic scripts,
Telugu OCR,
Text line segmentation
Citation
Proceedings of the International Conference on Document Analysis and Recognition, ICDAR