A hybrid approach to classification of categorical data based on information-theoretic context selection
Date
2015-01-01
Authors
Alamuri, Madhavi
Surampudi, Bapi Raju
Negi, Atul
Abstract
Clustering or classification of data described by categorical attributes is a challenging task in data mining, because it is difficult to define a distance measure between pairs of values of a categorical attribute. The difficulty arises from the lack of ordering information among the values of a categorical attribute. Context-based approaches to this problem have been introduced recently in the literature. In this paper we introduce a hybrid approach that combines set-based context selection with distance computation using the Kullback-Leibler (KL) divergence. Whereas current approaches examine categorical attributes individually, our approach proposes a novel scheme inspired by information theory: we use the interdependence redundancy measure to select the significant attributes that form the context. With a k-nearest neighbor classifier based on the proposed measure, the approach gives encouraging results on low-dimensional benchmark UCI datasets. On these datasets, the proposed measure compared well with other distance measures when various classifiers such as SVM, Naive Bayes and C4.5 were used.
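The pipeline outlined in the abstract can be sketched in Python as below. The steps are: select a context for each attribute with the interdependence redundancy measure, compare two values of an attribute through the KL divergence between their conditional distributions over that context, then classify with k-nearest neighbors. This is a minimal sketch under stated assumptions, not the authors' implementation. The interdependence redundancy is taken in its usual normalized form R(X; Y) = I(X; Y) / H(X, Y). The abstract does not specify the selection rule or how the KL terms are combined, so the following are all hypothetical choices: the threshold on R, the additive smoothing parameter `alpha`, the symmetric KL form, the summation over context attributes, and the overlap fallback.

```python
import math
from collections import Counter, defaultdict


def entropy(counts, n):
    """Shannon entropy (bits) of an empirical distribution given as counts."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c)


def interdependence_redundancy(rows, i, j):
    """R(X_i; X_j) = I(X_i; X_j) / H(X_i, X_j), a value in [0, 1]."""
    n = len(rows)
    h_i = entropy(Counter(r[i] for r in rows), n)
    h_j = entropy(Counter(r[j] for r in rows), n)
    h_ij = entropy(Counter((r[i], r[j]) for r in rows), n)
    return 0.0 if h_ij == 0 else (h_i + h_j - h_ij) / h_ij


def select_context(rows, i, threshold=0.1):
    """Hypothetical rule: keep every other attribute whose R with X_i >= threshold."""
    return [j for j in range(len(rows[0]))
            if j != i and interdependence_redundancy(rows, i, j) >= threshold]


def conditional(rows, i, j, alpha=1.0):
    """P(X_j = y | X_i = a) for each value a, with additive smoothing alpha."""
    y_vals = {r[j] for r in rows}
    counts = defaultdict(Counter)
    for r in rows:
        counts[r[i]][r[j]] += 1
    return {a: {y: (c[y] + alpha) / (sum(c.values()) + alpha * len(y_vals))
                for y in y_vals}
            for a, c in counts.items()}


def sym_kl(p, q):
    """Symmetric KL divergence KL(p||q) + KL(q||p) over a shared support."""
    return sum(p[y] * math.log2(p[y] / q[y]) + q[y] * math.log2(q[y] / p[y])
               for y in p)


def value_distances(rows, i, context):
    """d(a, b) for values of X_i: symmetric KL summed over the context attributes."""
    values = {r[i] for r in rows}
    if not context:  # hypothetical fallback: overlap distance
        return {(a, b): float(a != b) for a in values for b in values}
    tables = [conditional(rows, i, j) for j in context]
    return {(a, b): sum(sym_kl(t[a], t[b]) for t in tables)
            for a in values for b in values}


def fit(rows, threshold=0.1):
    """Build one value-distance table per attribute from the training rows."""
    return [value_distances(rows, i, select_context(rows, i, threshold))
            for i in range(len(rows[0]))]


def knn_predict(train, labels, tables, x, k=3):
    """Majority vote among the k training rows closest to x."""
    def dist(r):
        # Value pairs not seen in training fall back to overlap (0 if equal, else 1).
        return sum(tables[i].get((x[i], r[i]), float(x[i] != r[i]))
                   for i in range(len(x)))
    nearest = sorted(range(len(train)), key=lambda n: dist(train[n]))[:k]
    return Counter(labels[n] for n in nearest).most_common(1)[0][0]


# Toy usage (made-up data, not from the paper's UCI experiments).
rows = [("red", "small", "round"), ("red", "small", "round"), ("red", "small", "oval"),
        ("blue", "large", "square"), ("blue", "large", "square"), ("blue", "large", "oval")]
labels = ["A", "A", "A", "B", "B", "B"]
tables = fit(rows)
print(knn_predict(rows, labels, tables, ("red", "small", "round")))  # prints: A
```

Smoothing is needed because the KL divergence is undefined when a conditional probability in the denominator is zero. Summing the per-attribute value distances keeps the k-NN distance additive across attributes, so any classifier that accepts a precomputed or custom distance could reuse the same tables.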
Keywords
Categorical data,
Classification,
Context,
Similarity
Citation
Advances in Intelligent Systems and Computing. v.415