UHTelPCC: A Dataset for Telugu Printed Character Recognition

dc.contributor.author Kummari, Rakesh
dc.contributor.author Bhagvati, Chakravarthy
dc.date.accessioned 2022-03-27T05:54:18Z
dc.date.available 2022-03-27T05:54:18Z
dc.date.issued 2019-01-01
dc.description.abstract This paper describes the creation and characteristics of UHTelPCC, a dataset for Telugu printed character recognition. The dataset is built from characters extracted from images of Telugu texts printed between 1950 and 1990. Because the samples come from real printed documents, it is hoped that the dataset provides a basis for developing practical Telugu OCR systems. UHTelPCC is intended to serve as a standard benchmark for comparing Telugu OCR algorithms and to support research and development of Telugu OCR systems. UHTelPCC contains 70K samples from 325 classes, divided into training, validation, and test sets of 50K, 10K, and 10K samples, respectively. It is hoped that UHTelPCC will serve Telugu printed character recognition the way MNIST serves handwritten digit recognition. The baseline results on the test set for KNN, MLP, and CNN classifiers are 98.85%, 99.52%, and 99.68%, respectively. UHTelPCC is available at http://scis.uohyd.ac.in/~chakcs/UHTelPCC.html.
dc.identifier.citation Communications in Computer and Information Science. v.1037
dc.identifier.issn 1865-0929
dc.identifier.uri 10.1007/978-981-13-9187-3_3
dc.identifier.uri http://link.springer.com/10.1007/978-981-13-9187-3_3
dc.identifier.uri https://dspace.uohyd.ac.in/handle/1/8703
dc.subject OCR
dc.subject OCR dataset
dc.subject Optical Character Recognition
dc.subject Printed Telugu OCR
dc.subject Telugu character dataset
dc.subject Telugu dataset
dc.subject UHTelPCC
dc.title UHTelPCC: A Dataset for Telugu Printed Character Recognition
dc.type Book Series. Conference Paper