Quartiles based UnderSampling(QUS): A Simple and Novel Method to increase the Classification rate of positives in Imbalanced Datasets
Date
2018-12-27
Authors
Veni, C. V. Krishna
Rani, T. Sobha
Abstract
The main challenge in learning from imbalanced datasets is that a large set of training examples is available for the negatives (majority class instances) but very few for the positives (minority class instances). As a result, a classifier may show good overall performance even though the classification rate of positives is greatly reduced. The Quartiles based UnderSampling (QUS) method proposed in this paper addresses this problem in a simple way: it balances the dataset by selecting negatives based on their similarity to five quartile points, namely the minimum, quartile 1 (Q1), median, quartile 3 (Q3), and maximum. The intention is to reduce the influence of excessive negatives on the classifier, which may otherwise bias it towards better classification of negatives. An advantage of this undersampling method is that it is parameter independent, and it gives better results than state-of-the-art methods. The proposed method is tested with the kNN (k Nearest Neighbour) classifier, and the empirical results show an improved classification rate of positives compared with the original unprocessed imbalanced dataset.
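The abstract does not spell out how similarity to the five quartile points is measured, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that the quartile points are computed per feature over the negatives, that similarity is Euclidean distance, and that negatives are picked round-robin from the five nearest-first rankings until the negative count matches the positive count. The function name `quartile_undersample` and the synthetic data are hypothetical.

```python
import numpy as np

def quartile_undersample(X_neg, n_keep):
    """Hypothetical helper: indices of n_keep negatives nearest the quartile points."""
    X_neg = np.asarray(X_neg, dtype=float)
    n_keep = min(int(n_keep), len(X_neg))
    if n_keep <= 0:
        return np.array([], dtype=int)
    # Five per-feature reference vectors: min, Q1, median, Q3, max (assumption).
    refs = np.percentile(X_neg, [0, 25, 50, 75, 100], axis=0)
    # Euclidean distance from each reference vector to each negative: shape (5, n_neg).
    dists = np.linalg.norm(refs[:, None, :] - X_neg[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)  # for each reference, negatives nearest first
    chosen, seen = [], set()
    # Round-robin over the five references so each contributes roughly equally.
    for rank in range(order.shape[1]):
        for r in range(order.shape[0]):
            idx = int(order[r, rank])
            if idx not in seen:
                seen.add(idx)
                chosen.append(idx)
                if len(chosen) == n_keep:
                    return np.array(chosen)
    return np.array(chosen, dtype=int)

# Synthetic imbalanced data, for illustration only.
rng = np.random.default_rng(0)
X_neg = rng.normal(0.0, 1.0, size=(200, 3))   # majority class
X_pos = rng.normal(2.0, 1.0, size=(20, 3))    # minority class

keep = quartile_undersample(X_neg, n_keep=len(X_pos))
X_bal = np.vstack([X_neg[keep], X_pos])
y_bal = np.concatenate([np.zeros(len(keep), dtype=int), np.ones(len(X_pos), dtype=int)])
```

The balanced arrays `X_bal` and `y_bal` could then be passed to any kNN implementation. No parameter beyond the target count is tuned here, which is in the spirit of the parameter independence claimed in the abstract.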
Keywords
Classification,
Clustering,
Imbalance,
Stratified Sampling,
UnderSampling
Citation
2017 9th International Conference on Advances in Pattern Recognition, ICAPR 2017