Document processing methods for Telugu and other South East Asian scripts

dc.contributor.author Negi, Atul
dc.contributor.author Sowri, V. S.R.
dc.contributor.author Rao, K. Mohan
dc.date.accessioned 2022-03-27T05:53:53Z
dc.date.available 2022-03-27T05:53:53Z
dc.date.issued 2004-12-01
dc.description.abstract It is observed that in several South East Asian scripts, a single character consists of two or more connected components. In these scripts the complex arrangement of connected components leads to problems such as touching characters and difficulty in identifying word and text line boundaries. In the present work we propose a method to extract text lines by clustering connected components based upon their spatial properties. Components that have abnormal properties and are not identified by an OCR are sent for character segmentation. For character segmentation we describe the "Drop Fall" and "Whitestream" methods. The methods presented here are applicable to any language (script) that requires connected component based processing. © 2004 IEEE.
dc.identifier.citation IEEE Region 10 Annual International Conference, Proceedings/TENCON. v.B
dc.identifier.uri https://dspace.uohyd.ac.in/handle/1/8668
dc.title Document processing methods for Telugu and other South East Asian scripts
dc.type Conference Proceeding. Conference Paper
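
The abstract mentions extracting text lines by clustering connected components on their spatial properties, but this record does not say which properties or which clustering rule the paper uses. The following is only a minimal sketch of the general idea, not the authors' method. It assumes each component is already available as a bounding box `(x, y, width, height)` and groups boxes greedily by vertical overlap; the name `group_into_lines` and the `min_overlap` threshold are hypothetical.

```python
from typing import List, Tuple

# Hypothetical input format: bounding box of one connected component.
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def group_into_lines(boxes: List[Box], min_overlap: float = 0.5) -> List[List[Box]]:
    """Greedily cluster component boxes into text lines by vertical overlap.

    Illustrative assumption: a component joins the line whose vertical span
    covers at least `min_overlap` of the component's own height.
    """
    lines: List[List[Box]] = []
    spans: List[Tuple[int, int]] = []  # (top, bottom) of each line's seed box

    # Tall components go first so that small marks attach to an existing line.
    for box in sorted(boxes, key=lambda b: b[3], reverse=True):
        _, y, _, h = box
        top, bottom = y, y + h

        best, best_ratio = -1, 0.0
        for i, (line_top, line_bottom) in enumerate(spans):
            overlap = min(bottom, line_bottom) - max(top, line_top)
            ratio = overlap / h if h > 0 else 0.0
            if ratio > best_ratio:
                best, best_ratio = i, ratio

        if best >= 0 and best_ratio >= min_overlap:
            lines[best].append(box)
        else:
            # No sufficiently overlapping line: start a new one.
            lines.append([box])
            spans.append((top, bottom))

    # Reading order: lines top to bottom, components left to right.
    order = sorted(range(len(lines)), key=lambda i: spans[i][0])
    return [sorted(lines[i], key=lambda b: b[0]) for i in order]
```

In this sketch, a small mark that lies entirely above or below its base character has no vertical overlap with it and ends up in a line of its own. Components of this kind are one example of what a real system would need to treat separately.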