Word Representations for Gender Classification Using Deep Learning

Date
2018-01-01
Authors
Ritesh, Ritesh
Bhagvati, Chakravarthy
Abstract
This paper studies the effect of word representations on gender classification using deep learning. It has two main objectives: to assess how well popular deep learning architectures, namely LSTMs and CNNs, perform on the gender classification task, and to investigate how the choice of word representation affects performance. Three networks, LSTM, CNN and LeNet-5, were trained on a dataset of about 18,000 names from India, Western countries, Sri Lanka and Japan. The names were given as input in four encodings: the popular One-Hot representation and Word Embeddings, plus an Integer representation and an Enhanced Integer representation (proposed in this paper). Performance was evaluated on accuracy, training time and size of the input layer. Experimental results show that LSTM combined with a word embedding derived from the proposed Enhanced Integer representation gives the best performance, about 85%. The One-Hot representation is superior to the Integer and Enhanced Integer representations but appears to perform worse than word embeddings.
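As a rough illustration of how two of the encodings named in the abstract differ in input size, the sketch below encodes a name as integers and as one-hot vectors. It is not the paper's implementation: the character-level encoding, the lowercase a-z alphabet, the padding code 0, the fixed length of 10 and the function names are all assumptions made for this example. The Enhanced Integer representation is not described in the abstract, so it is not shown.

```python
import string

# Assumption: names are encoded character by character over a lowercase
# a-z alphabet, with code 0 reserved for padding. The record does not
# state the alphabet or the padding scheme used in the paper.
ALPHABET = string.ascii_lowercase
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 10  # hypothetical fixed input length


def integer_encode(name, max_len=MAX_LEN):
    """Map each letter to an integer code and pad with zeros to max_len."""
    codes = [CHAR_TO_INDEX[ch] for ch in name.lower() if ch in CHAR_TO_INDEX]
    codes = codes[:max_len]
    return codes + [0] * (max_len - len(codes))


def one_hot_encode(name, max_len=MAX_LEN):
    """Expand each integer code into a 27-dimensional indicator vector."""
    size = len(ALPHABET) + 1  # 26 letters plus the padding code
    rows = []
    for code in integer_encode(name, max_len):
        row = [0] * size
        row[code] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    ints = integer_encode("Ravi")
    hot = one_hot_encode("Ravi")
    print("integer input size:", len(ints))
    print("one-hot input size:", len(hot) * len(hot[0]))
```

Under these assumptions the integer encoding needs 10 input values per name while the one-hot encoding needs 270, which is the kind of input-layer size difference the abstract lists as an evaluation criterion.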
Keywords
CNN, Gender prediction, LSTM, One-Hot, Word embeddings, Word representations
Citation
Procedia Computer Science. v.132