Named entity recognition for Telugu

Srikanth, P.; Murthy, Kavi Narayana

dc.contributor.author	Srikanth, P.
dc.contributor.author	Murthy, Kavi Narayana
dc.date.accessioned	2022-03-27T05:58:25Z
dc.date.available	2022-03-27T05:58:25Z
dc.date.issued	2008-01-01
dc.description.abstract	This paper addresses Named Entity Recognition (NER) for Telugu. Little work has been done on NER for Indian languages in general and for Telugu in particular, and adequate annotated corpora are not yet available for Telugu. Because named entities are usually nouns, we begin with experiments in building a Conditional Random Fields (CRF) based Noun Tagger. Trained on 13,425 words of manually tagged data and tested on a data set of 6,223 words, this Noun Tagger achieved an F-measure of about 92%. We then develop a rule-based NER system for Telugu, focusing mainly on identifying person, place, and organization names. Using this rule-based tagger, we developed a manually checked Named Entity tagged corpus of 72,157 words through bootstrapping. Finally, we developed a CRF-based NER system for Telugu and tested it on several data sets drawn from the Eenaadu and Andhra Prabha newspaper corpora that we developed. Good performance was obtained using the majority tag concept, with overall F-measures between 80% and 97% across the various experiments.
dc.identifier.citation	IJCNLP 2008 Workshop on NER for South and South East Asian Languages, Proceedings of the Workshop
dc.identifier.uri	https://dspace.uohyd.ac.in/handle/1/8975
dc.subject	CRF
dc.subject	Majority Tag
dc.subject	NER for Telugu
dc.subject	Noun Tagger
dc.title	Named entity recognition for Telugu
dc.type	Conference Proceeding. Conference Paper

Collections

Computer and Information Sciences - Publications