Text Document Clustering Using Community Discovery Approach

Beniwal, Anu; Roy, Gourav; Durga Bhavani, S.

dc.contributor.author	Beniwal, Anu
dc.contributor.author	Roy, Gourav
dc.contributor.author	Durga Bhavani, S.
dc.date.accessioned	2022-03-27T05:55:21Z
dc.date.available	2022-03-27T05:55:21Z
dc.date.issued	2020-01-01
dc.description.abstract	Document clustering is the automatic grouping of text documents into groups of similar documents. In a supervised setting the problem yields good results, whereas for unannotated data unsupervised machine learning approaches do not always do so. Algorithms such as K-Means clustering are the most popular choice when class labels are not known. The objective of this work is to apply community discovery algorithms from the social network analysis literature to detect the underlying groups in text data. We model the corpus of documents as a graph whose nodes are the distinct non-trivial words of the whole corpus; an edge is added between two word nodes if the corresponding words occur together in at least one document. The weight of an edge between two word nodes is the number of documents in which the two words co-occur. We apply the fast Louvain community discovery algorithm to detect communities. The challenge is to interpret the communities as classes. If the number of communities obtained is greater than the required number of classes, a technique for merging communities is proposed. Each document is assigned to the community with which it has the largest number of words in common. The main thrust of the paper is to show a novel approach to document clustering using community discovery algorithms. The proposed algorithm is evaluated on a few benchmark data sets, and it gives competitive results on the majority of them when compared with standard clustering algorithms.
dc.identifier.citation	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). v.11969 LNCS
dc.identifier.issn	03029743
dc.identifier.uri	10.1007/978-3-030-36987-3_22
dc.identifier.uri	http://link.springer.com/10.1007/978-3-030-36987-3_22
dc.identifier.uri	https://dspace.uohyd.ac.in/handle/1/8783
dc.subject	Clustering
dc.subject	Louvain community discovery algorithm
dc.subject	Social networks
dc.title	Text Document Clustering Using Community Discovery Approach
dc.type	Book Series. Conference Paper
dspace.entity.type
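
The graph construction and document assignment steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stop-word list, the whitespace tokenizer, the toy corpus, and the function names are all assumptions, and the communities are hard-coded where the paper would obtain them from the Louvain algorithm. The merging technique for surplus communities is not described in the abstract and is not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; the abstract only says "non-trivial words".
STOP_WORDS = {"the", "a", "of", "and", "in", "is", "to"}

def tokenize(text):
    """Return the set of distinct lowercase alphabetic words, minus stop words."""
    return {w for w in text.lower().split() if w.isalpha() and w not in STOP_WORDS}

def build_cooccurrence_edges(docs):
    """Map each word pair to the number of documents in which both words occur."""
    weights = Counter()
    for text in docs:
        # Sorting gives every pair a canonical (u, v) order with u < v.
        for u, v in combinations(sorted(tokenize(text)), 2):
            weights[(u, v)] += 1
    return weights

def assign_documents(docs, communities):
    """Label each document with the index of the community sharing the most words.

    Ties go to the lowest index; the abstract does not say how ties are broken.
    """
    labels = []
    for text in docs:
        words = tokenize(text)
        overlaps = [len(words & community) for community in communities]
        labels.append(max(range(len(communities)), key=overlaps.__getitem__))
    return labels

# Toy corpus (illustrative only).
docs = [
    "stock market shares trading",
    "market shares fall",
    "football match goal",
    "goal scored football league",
]
edges = build_cooccurrence_edges(docs)

# Stand-in for the community discovery step: in the paper these word sets
# would come from running Louvain on the weighted graph.
communities = [
    {"stock", "market", "shares", "trading", "fall"},
    {"football", "match", "goal", "scored", "league"},
]
labels = assign_documents(docs, communities)
```

In practice the `edges` mapping could be loaded into a graph library and the communities computed there, for example with `louvain_communities` in NetworkX, before calling `assign_documents`.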

Collections

Computer and Information Sciences - Publications