Localization and extraction of text in Telugu document images
| dc.contributor.author | Negi, Atul | |
| dc.contributor.author | Kasinadhuni, Nikhil | |
| dc.date.accessioned | 2022-03-27T05:53:56Z | |
| dc.date.available | 2022-03-27T05:53:56Z | |
| dc.date.issued | 2003-12-01 | |
| dc.description.abstract | Segmentation of document images into text and non-text regions is an important step in document image processing, so that optical character recognition (OCR) can be performed on the textual portions. Although this problem is approached in a script-independent manner in the literature, here we present a system that locates and extracts regions of Telugu text by exploiting the circular nature of the script. The process starts by computing the Sobel gradient magnitude of the gray-level image. The Hough Transform for circles is then performed to locate the circular features of Telugu text. A region growing process on the located circles yields text regions consisting of connected blocks of text. This is followed by Recursive XY Cuts to segment the regions into paragraphs, lines and word regions. A bottom-up region merging process is then applied to envelop individual words. Local binarization of the word minimum bounding rectangles (MBRs) yields connected components containing glyphs for recognition. The segmentation process succeeds in extracting text from images with complex non-Manhattan layouts, which are commonly found in various Telugu magazines. | |
| dc.identifier.citation | IEEE Region 10 Annual International Conference, Proceedings/TENCON. v.3 | |
| dc.identifier.uri | https://dspace.uohyd.ac.in/handle/1/8672 | |
| dc.title | Localization and extraction of text in Telugu document images | |
| dc.type | Conference Proceeding. Conference Paper | |
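The first two stages described in the abstract (Sobel gradient magnitude, then a Hough Transform for circles) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: it assumes a single known circle radius, a fixed threshold on the gradient magnitude to select edge pixels, and 64 sampled angles per edge pixel. The names `sobel_magnitude` and `hough_circles` and all parameters are hypothetical. Peaks in the returned accumulator are candidate centres of circular strokes.

```python
import math

# Standard 3x3 Sobel kernels for the x (horizontal) and y (vertical) derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Sobel gradient magnitude of a gray-level image given as a list of rows.
    Border pixels are left at 0 for simplicity."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            mag[y][x] = math.hypot(gx, gy)
    return mag


def hough_circles(mag, radius, edge_thresh, n_angles=64):
    """Accumulate votes for circle centres of ONE fixed radius (an assumption).
    Every edge pixel (magnitude > edge_thresh) votes once for each centre lying
    at distance `radius` from it, sampled at n_angles directions."""
    h, w = len(mag), len(mag[0])
    acc = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mag[y][x] <= edge_thresh:
                continue
            voted = set()  # avoid double votes caused by rounding
            for k in range(n_angles):
                t = 2 * math.pi * k / n_angles
                cx = round(x - radius * math.cos(t))
                cy = round(y - radius * math.sin(t))
                if 0 <= cx < w and 0 <= cy < h and (cx, cy) not in voted:
                    voted.add((cx, cy))
                    acc[cy][cx] += 1
    return acc
```

A real page would need a range of radii (one accumulator per radius) because stroke sizes vary with font size; the single-radius version only keeps the voting idea visible.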
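The Recursive XY Cut step can be illustrated on a binary image (1 = ink) as below. The sketch assumes a simple greedy variant: at each node the box is trimmed to the ink it contains, split at runs of blank rows if possible, otherwise at runs of blank columns, and recursion stops when neither cut applies. The names `xy_cut`, `min_gap` and the `(top, left, bottom, right)` box convention (bottom and right exclusive) are illustrative assumptions; the thresholds used in the paper are not stated in the abstract.

```python
def _trim(img, top, left, bottom, right):
    """Shrink the box to the bounding box of its ink; None if it has no ink."""
    rows = [y for y in range(top, bottom) if any(img[y][left:right])]
    if not rows:
        return None
    cols = [x for x in range(left, right) if any(img[y][x] for y in range(top, bottom))]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def _segments(profile, min_gap):
    """Split a projection profile into ink runs separated by >= min_gap empty entries.
    Returns (start, end) pairs with exclusive ends, relative to the profile."""
    segs, start, gap = [], None, 0
    for i, v in enumerate(profile):
        if v:
            if start is None:
                start = i
            elif gap >= min_gap:
                segs.append((start, i - gap))
                start = i
            gap = 0
        else:
            gap += 1
    if start is not None:
        segs.append((start, len(profile) - gap))
    return segs


def xy_cut(img, min_gap=1, box=None):
    """Recursive XY cut; returns leaf blocks as (top, left, bottom, right)."""
    if box is None:
        box = (0, 0, len(img), len(img[0]))
    box = _trim(img, *box)
    if box is None:
        return []
    top, left, bottom, right = box
    # Try a horizontal cut first: split on runs of blank rows.
    row_profile = [sum(img[y][left:right]) for y in range(top, bottom)]
    rows = _segments(row_profile, min_gap)
    if len(rows) > 1:
        return [b for s, e in rows for b in xy_cut(img, min_gap, (top + s, left, top + e, right))]
    # Otherwise try a vertical cut: split on runs of blank columns.
    col_profile = [sum(img[y][x] for y in range(top, bottom)) for x in range(left, right)]
    cols = _segments(col_profile, min_gap)
    if len(cols) > 1:
        return [b for s, e in cols for b in xy_cut(img, min_gap, (top, left + s, bottom, left + e))]
    return [box]  # no cut possible: this is a leaf block
```

Raising `min_gap` keeps blocks separated by narrow white space together, which is one plausible way to make the cut stop at the paragraph, line or word level.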
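The abstract does not say which local binarization method is applied to each word MBR. As an assumption, the sketch below computes Otsu's threshold separately inside each MBR (so the threshold is local to that word), treats dark pixels as ink, and then labels 8-connected ink components as glyph candidates. The functions `otsu_threshold`, `binarize_mbr` and `connected_components` are hypothetical names, and images are 8-bit gray values stored as lists of rows.

```python
from collections import deque


def otsu_threshold(pixels):
    """Otsu's threshold for 8-bit values: maximise the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * n for i, n in enumerate(hist))
    w_low = sum_low = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue
        w_high = total - w_low
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var = w_low * w_high * (mean_low - mean_high) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize_mbr(gray, box):
    """Binarize one word MBR (top, left, bottom, right) with its own Otsu threshold.
    Assumes dark text on a light background: pixels <= threshold become ink (1)."""
    top, left, bottom, right = box
    patch = [row[left:right] for row in gray[top:bottom]]
    t = otsu_threshold([p for row in patch for p in row])
    return [[1 if p <= t else 0 for p in row] for row in patch]


def connected_components(binary):
    """Label 8-connected ink components by BFS; returns a list of pixel lists."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, comp = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(comp)
    return comps
```

Computing the threshold per MBR lets each word adapt to its own background shade, which matters on magazine pages where text sits on tinted or textured regions.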