Word Representations for Gender Classification Using Deep Learning

Ritesh, Ritesh; Bhagvati, Chakravarthy

Date

2018-01-01

Authors

Ritesh, Ritesh

Bhagvati, Chakravarthy

Abstract

This paper studies the effect of word representations on gender classification using deep learning. It has two main objectives: to assess how well popular deep learning architectures, namely LSTMs and CNNs, perform on the gender classification task, and to investigate how the choice of word representation affects performance. Three networks, LSTM, CNN and LeNet-5, were trained on a dataset of about 18,000 names from India, Western countries, Sri Lanka and Japan. The names were encoded using the popular One-Hot representation and word embeddings, as well as an Integer representation and an Enhanced Integer representation (proposed in this paper), and given as input; performance was evaluated on accuracy, training time and size of the input layer. Experimental results show that LSTM in combination with word embeddings derived from the proposed Enhanced Integer representation gives the best performance, at about 85%. The One-Hot representation is superior to the Integer and Enhanced Integer representations but appears to perform worse than word embeddings.
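
To illustrate why the choice of representation changes the size of the input layer, the following minimal sketch encodes a single name in two of the ways named in the abstract. The character-level encoding, the 26-letter lowercase alphabet, the zero-padding scheme, the example name and the function names are all assumptions made for illustration and are not taken from the paper. The Enhanced Integer representation is not shown because the abstract does not define it.

```python
import string

# Assumed alphabet: lowercase ASCII letters. Code 0 is reserved for padding.
ALPHABET = string.ascii_lowercase
CHAR_TO_INT = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def integer_encode(name, max_len):
    """Map each letter to an integer code, then pad or truncate to max_len."""
    codes = [CHAR_TO_INT[ch] for ch in name.lower() if ch in CHAR_TO_INT]
    codes = codes[:max_len]
    return codes + [0] * (max_len - len(codes))


def one_hot_encode(name, max_len):
    """Expand each integer code into a 26-element indicator row.

    Padding positions become all-zero rows.
    """
    rows = []
    for code in integer_encode(name, max_len):
        row = [0] * len(ALPHABET)
        if code > 0:
            row[code - 1] = 1
        rows.append(row)
    return rows


ints = integer_encode("Asha", max_len=6)
hot = one_hot_encode("Asha", max_len=6)

print(ints)                        # one value per character position
print(len(ints))                   # input size for the integer encoding
print(len(hot) * len(hot[0]))      # input size for the one-hot encoding
```

Under these assumptions the integer encoding needs one input per character position, while the one-hot encoding needs 26 inputs per position. A learned embedding would sit between the two, mapping each integer code to a dense vector of a chosen dimension.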

Keywords

CNN, Gender prediction, LSTM, One-Hot, Word embeddings, Word representations

Citation

Procedia Computer Science. v.132

URI

10.1016/j.procs.2018.05.015
https://www.sciencedirect.com/science/article/abs/pii/S1877050918307476
https://dspace.uohyd.ac.in/handle/1/8709

Collections

Computer and Information Sciences - Publications
