Named entity recognition for Telugu

dc.contributor.author Srikanth, P.
dc.contributor.author Murthy, Kavi Narayana
dc.date.accessioned 2022-03-27T05:58:25Z
dc.date.available 2022-03-27T05:58:25Z
dc.date.issued 2008-01-01
dc.description.abstract This paper is about Named Entity Recognition (NER) for Telugu. Not much work has been done in NER for Indian languages in general and Telugu in particular. Adequate annotated corpora are not yet available in Telugu. We recognize that named entities are usually nouns. In this paper we therefore start with our experiments in building a CRF (Conditional Random Fields) based Noun Tagger. Trained on manually tagged data of 13,425 words and tested on a test set of 6,223 words, this Noun Tagger has given an F-measure of about 92%. We then develop a rule-based NER system for Telugu. Our focus is mainly on identifying person, place and organization names. A manually checked Named Entity tagged corpus of 72,157 words has been developed using this rule-based tagger through bootstrapping. We have then developed a CRF-based NER system for Telugu and tested it on several data sets from the Eenaadu and Andhra Prabha newspaper corpora developed by us here. Good performance has been obtained using the majority tag concept. We have obtained overall F-measures between 80% and 97% in various experiments.
dc.identifier.citation IJCNLP 2008 Workshop on NER for South and South East Asian Languages, Proceedings of the Workshop
dc.identifier.uri https://dspace.uohyd.ac.in/handle/1/8975
dc.subject CRF
dc.subject Majority Tag
dc.subject NER for Telugu
dc.subject Noun Tagger
dc.title Named entity recognition for Telugu
dc.type Conference Proceeding. Conference Paper