Clustering High-Dimensional Data: A Reduction-Level Fusion of PCA and Random Projection
Date
2019-01-01
Authors
Pasunuri, Raghunadh
Venkaiah, Vadlamudi China
Srivastava, Amit
Abstract
Principal Component Analysis (PCA) is a widely used statistical tool for representing data in a lower-dimensional embedding. K-means is a prototype (centroid)-based clustering technique used in unsupervised learning tasks. Random Projection (RP) is another widely used dimensionality reduction technique; it uses a projection matrix to project the data into a feature space. Here, we demonstrate the effectiveness of these methods by combining them to cluster both low- and high-dimensional data efficiently. Our proposed algorithms combine PCA with RP to project the data into a feature space and then perform K-means clustering in that reduced space. We compare the proposed algorithms' performance with simple K-means and PCA-K-means on 12 benchmark datasets, of which 4 are low-dimensional and 8 are high-dimensional. Our proposed algorithms outperform the other methods.
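The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not say in which order PCA and RP are applied or which target dimensions are used, so the order shown here (PCA first, then a Gaussian random projection) and the parameters pca_dim and rp_dim are assumptions. All function names are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project centered data onto its top principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def random_project(X, n_components, rng):
    """Gaussian random projection, scaled by 1/sqrt(n_components)."""
    R = rng.normal(size=(X.shape[1], n_components)) / np.sqrt(n_components)
    return X @ R

def assign(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for each row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's algorithm; initial centroids are k distinct rows of X."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Keep the old centroid if a cluster happens to become empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(X, centroids), centroids

def pca_rp_kmeans(X, k, pca_dim, rp_dim, seed=0):
    """Assumed order: PCA, then random projection, then K-means."""
    rng = np.random.default_rng(seed)
    Z = random_project(pca_reduce(X, pca_dim), rp_dim, rng)
    return kmeans(Z, k, rng)
```

A call such as pca_rp_kmeans(X, k=2, pca_dim=10, rp_dim=5) returns one cluster label per row of X together with the centroids in the reduced space. Swapping the two reduction steps, or using a sparse projection matrix instead of a Gaussian one, would be equally plausible readings of the abstract.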
Keywords
Clustering
High-dimensional data
K-means
Principal component analysis
Random projection
Citation
Advances in Intelligent Systems and Computing. v.740