Speeding-up the prototype based kernel k-means clustering method for large data sets

Hitendra Sarma, T.; Viswanath, P.; Negi, Atul

Date

2016-10-31

Authors

Hitendra Sarma, T.

Viswanath, P.

Negi, Atul

Abstract

Kernel k-means is a non-linear extension of the k-means clustering method that performs well at identifying non-isotropic and linearly inseparable clusters. However, its space and time requirements are expensive, with O(n^2) complexity. The large in-memory computations it requires make this method unsuitable for large data sets. Recently, a simple prototype-based hybrid approach was proposed to speed up the kernel k-means method for large data sets [1]. The time complexity of this method is O(n + p^2), where p is the number of prototypes. Each prototype is a representative pattern of a group-let of size (threshold) τ. The time complexity therefore depends on p, which in turn depends on the clustering threshold. Increasing the threshold can decrease the number of prototypes p, but the quality of the clustering result might suffer. Hence, fixing an appropriate value of the threshold is the major challenge in this approach. This paper presents a solution to this problem by allowing τ to vary depending on the location of the group-let in the space. Intuitively, if a group-let is close to a cluster center (and away from the others), then its size can be large, but if it lies somewhere between two cluster centers, then its size should be small. It is experimentally shown that this reduces the clustering time and also increases the clustering accuracy. The presented method is suitable for large data sets such as those found in data mining.
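The intuition about a location-dependent τ can be illustrated with a minimal Python sketch. It is not the authors' algorithm, whose details are not given in this abstract. It assumes a leader-style single pass over the data, rough estimates of the cluster centers that are already available, and a hypothetical margin rule that maps the distances to the two nearest centers onto a threshold between `tau_min` and `tau_max`. All function and parameter names are illustrative.

```python
import math


def adaptive_threshold(point, centers, tau_min, tau_max):
    """Hypothetical rule: large tau near one center, small tau between two.

    Assumes at least two center estimates are supplied.
    """
    distances = sorted(math.dist(point, c) for c in centers)
    d1, d2 = distances[0], distances[1]  # nearest and second-nearest centers
    if d1 + d2 == 0:
        return tau_max  # degenerate case: both centers coincide with the point
    # margin is 1 when the point sits on a center and 0 when it is
    # equidistant from its two nearest centers
    margin = (d2 - d1) / (d2 + d1)
    return tau_min + margin * (tau_max - tau_min)


def build_grouplets(points, centers, tau_min, tau_max):
    """Leader-style single pass (an assumption, not the paper's procedure).

    Each point joins the first group-let whose leader lies within that
    group-let's own threshold; otherwise it starts a new group-let.
    """
    grouplets = []
    for x in points:
        for g in grouplets:
            if math.dist(x, g["leader"]) <= g["tau"]:
                g["members"].append(x)
                break
        else:
            tau = adaptive_threshold(x, centers, tau_min, tau_max)
            grouplets.append({"leader": x, "tau": tau, "members": [x]})
    return grouplets
```

Under these assumptions, the leaders of the resulting group-lets would play the role of the p prototypes, so larger group-lets near cluster centers reduce p, while smaller ones in the regions between centers keep the boundary resolved.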

Keywords

Data mining, Kernel k-means clustering method

Citation

Proceedings of the International Joint Conference on Neural Networks. v.2016-October

URI

10.1109/IJCNN.2016.7727432
http://ieeexplore.ieee.org/document/7727432/
https://dspace.uohyd.ac.in/handle/1/8575

Collections

Computer and Information Sciences - Publications