A survey of distance/similarity measures for categorical data
Date
2014-09-03
Authors
Alamuri, Madhavi
Surampudi, Bapi Raju
Negi, Atul
Abstract
Similarity or distance between two objects plays a fundamental role in many data mining tasks such as classification and clustering. Categorical data, unlike numeric data, has no inherent ordering of its attribute values. This makes it harder to devise similarity or distance measures for categorical data, and in turn harder to classify and cluster such data. In this paper we formulate a taxonomy of distance and similarity measures used with data whose attributes are categorical. We divide the existing measures into two broad classes: context-free and context-sensitive measures. In addition, we suggest a taxonomy of clustering approaches for categorical data and propose a hybrid approach for measuring similarity between objects. We compare the relative strengths and weaknesses of some of the similarity measures and point out future research directions.
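As a rough illustration of what a similarity measure on unordered attribute values can look like, the sketch below implements the simple matching (overlap) measure: the fraction of attributes on which two objects agree. It is not taken from the paper. Treating overlap as an example of a "context-free" measure is our assumption, since the abstract does not define the term, and the function name and sample records are hypothetical.

```python
def overlap_similarity(x, y):
    """Fraction of attribute positions on which x and y hold the same value.

    Only equality tests are used, so no ordering of the categorical
    values is required. (Hypothetical helper, not the paper's method.)
    """
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    if len(x) == 0:
        raise ValueError("objects must have at least one attribute")
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


# Two made-up records described by (colour, size, shape).
a = ("red", "small", "round")
b = ("red", "large", "round")

similarity = overlap_similarity(a, b)  # agree on 2 of 3 attributes
distance = 1.0 - similarity            # a matching dissimilarity
```

The measure looks only at the two objects being compared. A measure that also consulted the rest of the data set, for example how frequent each value is, would need additional inputs that this sketch deliberately leaves out.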
Keywords
Categorical data, Clustering, Similarity, Supervised, Unsupervised
Citation
Proceedings of the International Joint Conference on Neural Networks