Localization and extraction of text in Telugu document images

Negi, Atul; Kasinadhuni, Nikhil

dc.contributor.author	Negi, Atul
dc.contributor.author	Kasinadhuni, Nikhil
dc.date.accessioned	2022-03-27T05:53:56Z
dc.date.available	2022-03-27T05:53:56Z
dc.date.issued	2003-12-01
dc.description.abstract	Segmentation of document images into text and non-text regions is an important step in document image processing, because it allows optical character recognition to be performed on the textual portions. Although the literature usually approaches this problem in a script-independent manner, here we present a system that locates and extracts regions of Telugu text based on the circular nature of the script. The process starts by computing the Sobel gradient magnitude of the gray-level image. The Hough transform for circles is then applied to locate the circular features of Telugu text. A region-growing process on the located circles yields text regions consisting of connected blocks of text. This is followed by recursive XY cuts, which segment the regions into paragraph, line, and word regions. A bottom-up region-merging process then envelops individual words. Local binarization of the word MBRs (minimum bounding rectangles) yields connected components containing glyphs for recognition. The segmentation process succeeds in extracting text from images with complex non-Manhattan layouts, which are common in various Telugu magazines.
dc.identifier.citation	IEEE Region 10 Annual International Conference, Proceedings/TENCON. v.3
dc.identifier.uri	https://dspace.uohyd.ac.in/handle/1/8672
dc.title	Localization and extraction of text in Telugu document images
dc.type	Conference Proceeding. Conference Paper
dspace.entity.type
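The abstract describes a first stage that computes the Sobel gradient magnitude and then runs a Hough transform for circles to find the round strokes typical of Telugu script. The block below is a minimal pure-Python sketch of that idea, not the authors' implementation. It assumes a single, known circle radius, a fixed edge threshold, and 72 sampled angles; the function names `sobel_magnitude`, `edge_map`, `hough_circles`, and `strongest_center` are hypothetical and do not come from the paper.

```python
import math


def sobel_magnitude(img):
    """Sobel gradient magnitude of a gray-level image given as a list of rows.

    Assumption: border pixels are left at 0.0 instead of being padded.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal derivative: right column minus left column, weights 1-2-1.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical derivative: bottom row minus top row, weights 1-2-1.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            mag[y][x] = math.hypot(gx, gy)
    return mag


def edge_map(mag, threshold):
    """Mark pixels whose gradient magnitude reaches a (hypothetical) threshold."""
    return [[m >= threshold for m in row] for row in mag]


def hough_circles(edges, radius, n_angles=72):
    """Vote for centers of circles with one fixed radius (an assumption).

    Each edge pixel casts at most one vote for every candidate center that
    lies `radius` pixels away from it, sampled at `n_angles` directions.
    """
    h, w = len(edges), len(edges[0])
    acc = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not edges[y][x]:
                continue
            voted = set()  # avoid double-counting a center hit by several angles
            for k in range(n_angles):
                t = 2 * math.pi * k / n_angles
                cx = round(x - radius * math.cos(t))
                cy = round(y - radius * math.sin(t))
                if 0 <= cx < w and 0 <= cy < h and (cx, cy) not in voted:
                    voted.add((cx, cy))
                    acc[cy][cx] += 1
    return acc


def strongest_center(acc):
    """Return (x, y, votes) for the accumulator cell with the most votes."""
    best = (0, 0, -1)
    for y, row in enumerate(acc):
        for x, votes in enumerate(row):
            if votes > best[2]:
                best = (x, y, votes)
    return best
```

On a gray-level page the stages would be chained as `hough_circles(edge_map(sobel_magnitude(gray), threshold), radius)`, and accumulator peaks would then seed the region-growing step. Because circular strokes in Telugu glyphs vary in size, a practical version would probably scan a range of radii and keep several peaks rather than one; this sketch omits both for brevity.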

Files

License bundle

Name:: license.txt
Size:: 1.71 KB
Format:: Plain Text

Collections

Computer and Information Sciences - Publications