Authorship Attribution using Filtered N-grams as Features

dc.contributor.author Singh, Manan
dc.contributor.author Murthy, Kavi Narayana
dc.date.accessioned 2022-03-27T05:58:21Z
dc.date.available 2022-03-27T05:58:21Z
dc.date.issued 2021-01-01
dc.description.abstract Authorship attribution is the problem of assigning an author to a document of unknown authorship, given a candidate set of authors and their sample documents. As a text classification task, it requires features that capture the writing styles of authors. In this work, we compare filtered n-grams with traditional (unfiltered) n-grams as features for authorship attribution. Filtered n-grams are n-grams formed after certain kinds of tokens have been filtered out of the text. We explore filtered n-grams formed after removing noun groups and verb groups. We hypothesize that the remaining text is still sufficient to capture writing style. Moreover, this removal makes it possible to construct new n-grams that would otherwise have been missed. In our experiments, we find that filtered n-grams improve attribution performance. A feature ablation study confirms that this improvement is due to the new n-grams that become possible only after filtering.
dc.identifier.citation Lecture Notes on Data Engineering and Communications Technologies. v.63
dc.identifier.issn 2367-4512
dc.identifier.uri 10.1007/978-981-16-0081-4_38
dc.identifier.uri https://link.springer.com/10.1007/978-981-16-0081-4_38
dc.identifier.uri https://dspace.uohyd.ac.in/handle/1/8971
dc.subject authorship attribution
dc.subject filtered n-grams
dc.subject N-grams
dc.subject text classification features
dc.subject writing style
dc.title Authorship Attribution using Filtered N-grams as Features
dc.type Book Series. Book Chapter
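The abstract does not say how noun and verb groups are detected, or whether word or character n-grams are used. The following minimal Python sketch assumes word n-grams and hand-assigned chunk labels; the labels `NP`, `VP`, `O` and the functions `filter_tokens` and `new_ngrams` are hypothetical illustrations, not the chapter's implementation. It shows the core idea: removing noun and verb groups brings previously separated tokens together, producing n-grams that never occur in the unfiltered text.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Tuple

# Hypothetical chunk labels: "NP" = noun group, "VP" = verb group, "O" = other.
# In practice a chunker would assign these; here they are written by hand.
Tagged = List[Tuple[str, str]]


def word_ngrams(tokens: List[str], n: int) -> Counter:
    """Count contiguous word n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def filter_tokens(tagged: Tagged, drop: FrozenSet[str] = frozenset({"NP", "VP"})) -> List[str]:
    """Drop tokens whose chunk label is in `drop`; keep the rest in order."""
    return [tok for tok, label in tagged if label not in drop]


def new_ngrams(tagged: Tagged, n: int) -> Set[Tuple[str, ...]]:
    """N-grams that appear only after filtering, i.e. absent from the raw text."""
    raw = word_ngrams([tok for tok, _ in tagged], n)
    filtered = word_ngrams(filter_tokens(tagged), n)
    return set(filtered) - set(raw)


# Hand-labelled example sentence (assumed chunking, for illustration only).
sentence: Tagged = [
    ("however", "O"), (",", "O"),
    ("the", "NP"), ("committee", "NP"),
    ("has", "VP"), ("decided", "VP"),
    ("that", "O"),
    ("the", "NP"), ("proposal", "NP"),
    ("is", "VP"),
    ("not", "O"), ("acceptable", "O"), (".", "O"),
]

if __name__ == "__main__":
    print(filter_tokens(sentence))
    # ['however', ',', 'that', 'not', 'acceptable', '.']
    print(sorted(new_ngrams(sentence, 2)))
    # [(',', 'that'), ('that', 'not')]
```

The bigrams `(',', 'that')` and `('that', 'not')` exist only because the intervening noun and verb groups were removed. Isolating such n-grams with a set difference is, in spirit, what the abstract's ablation study examines; how the chapter actually weights and vectorizes these features is not stated here.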