Authorship Attribution using Filtered N-grams as Features
Date
2021-01-01
Authors
Singh, Manan
Murthy, Kavi Narayana
Abstract
Authorship attribution is the problem of assigning an author to a document of unknown authorship, given a set of candidate authors and sample documents by each. Treated as a text classification task, it requires features that capture the authors' writing styles. In this work, we compare filtered n-grams with traditional (unfiltered) n-grams as features for authorship attribution. Filtered n-grams are n-grams formed after certain kinds of tokens have been removed from the text. We explore filtered n-grams formed after removing noun groups and verb groups, hypothesizing that the remaining text is still sufficient to capture writing style. Moreover, this removal makes it possible to construct new n-grams that would otherwise have been missed, because tokens separated by a removed group become adjacent. In our experiments, filtered n-grams improve attribution performance over unfiltered n-grams. A feature ablation study confirms that this improvement is due to the new n-grams that become possible only after filtering.
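The idea of filtered n-grams described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the example sentence, the chunk labels ("NG" for noun group, "VG" for verb group, "O" for everything else), and the helper names `ngrams` and `filtered_words` are all assumptions made for illustration, and the labels are assigned by hand rather than produced by a real chunker.

```python
from typing import FrozenSet, List, Sequence, Tuple

# (word, chunk label). Labels are hand-assigned for this illustration;
# a real pipeline would obtain them from a chunker (an assumption here).
Token = Tuple[str, str]


def ngrams(words: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams over a sequence of words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def filtered_words(
    tagged: Sequence[Token],
    drop: FrozenSet[str] = frozenset({"NG", "VG"}),
) -> List[str]:
    """Keep only the words whose chunk label is not in `drop`."""
    return [word for word, label in tagged if label not in drop]


# Hypothetical sentence with hypothetical noun-group / verb-group labels.
sentence: List[Token] = [
    ("the", "NG"), ("old", "NG"), ("man", "NG"),
    ("was", "VG"), ("walking", "VG"),
    ("slowly", "O"), ("towards", "O"),
    ("the", "NG"), ("river", "NG"),
    ("and", "O"), ("then", "O"),
    ("he", "NG"), ("stopped", "VG"),
]

all_words = [word for word, _ in sentence]
unfiltered_bigrams = ngrams(all_words, 2)
filtered_bigrams = ngrams(filtered_words(sentence), 2)

# Bigrams that exist only because filtering made distant tokens adjacent.
new_bigrams = set(filtered_bigrams) - set(unfiltered_bigrams)

print("filtered words:  ", filtered_words(sentence))
print("filtered bigrams:", filtered_bigrams)
print("new bigrams:     ", new_bigrams)
```

In this toy sentence, removing "the river" brings "towards" and "and" together, so the bigram ("towards", "and") appears only in the filtered set. Bigrams of this kind correspond to the "new n-grams" that the abstract credits for the improvement.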
Keywords
authorship attribution,
filtered n-grams,
N-grams,
text classification features,
writing style
Citation
Lecture Notes on Data Engineering and Communications Technologies. v.63