Text Document Clustering Using Community Discovery Approach

dc.contributor.author Beniwal, Anu
dc.contributor.author Roy, Gourav
dc.contributor.author Durga Bhavani, S.
dc.date.accessioned 2022-03-27T05:55:21Z
dc.date.available 2022-03-27T05:55:21Z
dc.date.issued 2020-01-01
dc.description.abstract The problem of document clustering is the automatic grouping of text documents into groups of similar documents. Under the supervised setting, good results are obtained for this problem, whereas for unannotated data the unsupervised machine learning approach does not always yield good results. Algorithms like K-Means clustering are the most popular when the class labels are not known. The objective of this work is to apply community discovery algorithms from the social network analysis literature to detect the underlying groups in text data. We model the corpus of documents as a graph whose nodes are the distinct non-trivial words of the whole corpus; an edge is added between two word nodes if the corresponding words occur together in at least one common document. The weight of the edge between two word nodes is the number of documents in which those two words co-occur. We apply the fast Louvain community discovery algorithm to detect communities. The challenge is to interpret the communities as classes. If the number of communities obtained is greater than the required number of classes, a merging technique is proposed. Each document is assigned to the community with which it shares the maximum number of words. The main thrust of the paper is to show a novel approach to document clustering using community discovery algorithms. The proposed algorithm is evaluated on a few benchmark data sets, and we find that it gives competitive results on the majority of the data sets when compared to standard clustering algorithms.
dc.identifier.citation Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). v.11969 LNCS
dc.identifier.issn 03029743
dc.identifier.uri 10.1007/978-3-030-36987-3_22
dc.identifier.uri http://link.springer.com/10.1007/978-3-030-36987-3_22
dc.identifier.uri https://dspace.uohyd.ac.in/handle/1/8783
dc.subject Clustering
dc.subject Louvain community discovery algorithm
dc.subject Social networks
dc.title Text Document Clustering Using Community Discovery Approach
dc.type Book Series. Conference Paper
dspace.entity.type
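The pipeline summarized in the abstract (word co-occurrence graph, Louvain community detection, assignment of each document to the community sharing the most words with it) can be sketched in pure Python as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes documents are already tokenized and filtered to "non-trivial" words, uses a minimal hand-written Louvain (greedy local moving plus community aggregation, without the resolution parameter or randomized node order found in library versions), and omits the paper's community-merging step, which the abstract does not describe. The function names `build_word_graph`, `louvain` and `assign_documents` and the toy corpus are hypothetical.

```python
# Hypothetical sketch of the abstract's pipeline, not the authors' code.
# Assumes pre-tokenized documents already filtered to "non-trivial" words;
# the paper's community-merging step is NOT implemented here.
from collections import defaultdict
from itertools import combinations


def build_word_graph(docs):
    """Nodes: distinct corpus words. Edge weight (u, v): number of documents
    in which u and v co-occur."""
    vocab = sorted({w for doc in docs for w in doc})
    weights = defaultdict(int)
    for doc in docs:
        for u, v in combinations(sorted(set(doc)), 2):  # each pair once per doc
            weights[(u, v)] += 1
    return vocab, dict(weights)


def _one_level(adj, loops):
    """Louvain phase 1: move each node to the neighbouring community with the
    largest gain k_{i,C} - tot_C * k_i / 2m until no move improves modularity."""
    deg = {u: sum(adj[u].values()) + 2 * loops[u] for u in adj}
    m2 = sum(deg.values())  # 2m: twice the total edge weight
    comm = {u: u for u in adj}
    if m2 == 0:
        return comm, False
    tot = dict(deg)  # summed degree of each community
    moved_any, improved = False, True
    while improved:
        improved = False
        for u in adj:
            cu = comm[u]
            links = defaultdict(float)  # weight from u into each community
            for v, w in adj[u].items():
                links[comm[v]] += w
            tot[cu] -= deg[u]  # take u out of its current community
            best, best_gain = cu, links[cu] - tot[cu] * deg[u] / m2
            for c, k_in in links.items():
                gain = k_in - tot[c] * deg[u] / m2
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            tot[best] += deg[u]
            comm[u] = best
            if best != cu:
                improved = moved_any = True
    return comm, moved_any


def _aggregate(adj, loops, comm):
    """Louvain phase 2: collapse each community into a single node."""
    labels = {c: i for i, c in enumerate(sorted(set(comm.values())))}
    new_adj = {i: defaultdict(float) for i in labels.values()}
    new_loops = {i: 0.0 for i in labels.values()}
    for u in adj:
        cu = labels[comm[u]]
        new_loops[cu] += loops[u]
        for v, w in adj[u].items():
            cv = labels[comm[v]]
            if cu == cv:
                new_loops[cu] += w / 2  # internal edge is visited from both ends
            else:
                new_adj[cu][cv] += w
    return new_adj, new_loops, labels


def louvain(vocab, weights):
    """Run Louvain levels until no node moves; return communities as word sets."""
    index = {w: i for i, w in enumerate(vocab)}
    adj = {i: defaultdict(float) for i in range(len(vocab))}
    for (u, v), w in weights.items():
        adj[index[u]][index[v]] += w
        adj[index[v]][index[u]] += w
    loops = {i: 0.0 for i in adj}
    member = list(range(len(vocab)))  # current community node of each word
    while True:
        comm, moved = _one_level(adj, loops)
        if not moved:
            break
        adj, loops, labels = _aggregate(adj, loops, comm)
        member = [labels[comm[c]] for c in member]
    groups = defaultdict(set)
    for word, c in zip(vocab, member):
        groups[c].add(word)
    return list(groups.values())


def assign_documents(docs, communities):
    """Label each document with the community sharing the most words with it."""
    return [max(range(len(communities)), key=lambda k: len(set(doc) & communities[k]))
            for doc in docs]


docs = [["ball", "goal", "team"], ["goal", "team", "match"], ["ball", "match", "team"],
        ["stock", "market", "price"], ["market", "price", "trade"], ["stock", "trade", "price"]]
vocab, weights = build_word_graph(docs)
communities = louvain(vocab, weights)
print(assign_documents(docs, communities))  # [0, 0, 0, 1, 1, 1]
```

On this toy corpus the two topics form disconnected word groups, so Louvain returns two communities and each document falls into the one it overlaps. Ties in `assign_documents` go to the lower community index, which may differ from the paper's tie-breaking. For real corpora, a maintained implementation such as NetworkX's `louvain_communities` (with `weight="weight"`) would be a more robust substitute for the hand-written version.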