Reduction Strategies to Tackle Class Imbalance in Datasets
Date
2021-07-28
Authors
Krishnaveni, C.V.
Publisher
University of Hyderabad
Abstract
Banking, retail, finance, science, telecommunications, and many other sectors use data
mining technologies to process massive amounts of data, measured in zettabytes. While this
volume of data is useful, datasets must be processed effectively to support predictive and
inferential forecasts for a target population. Class imbalance, where one class has fewer
instances than the other class or classes in a dataset, poses challenges for traditional
classifiers. Traditional classifiers fail to handle imbalanced datasets because of
assumptions inherent in their design. The distribution of classes within the dataset has a
direct impact on classifier performance. One proven practice for addressing this problem is
to balance the classes in the training dataset. The main goals of balancing are to increase
sensitivity, to select representative samples from the majority class, and to maintain a
trade-off between the majority-class and minority-class prediction rates.
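
As a rough illustration of what balancing the training data means, the sketch below reduces the majority class by random undersampling until every class has as many instances as the smallest one. It is only a baseline for comparison and not the method developed in the thesis: the abstract calls for selecting representative majority-class samples, whereas this sketch picks them at random. The function name `random_undersample` and the toy data are assumptions made for the example.

```python
import random
from collections import Counter


def random_undersample(samples, labels, seed=0):
    """Randomly drop rows so every class matches the size of the smallest class.

    Hypothetical helper for illustration; random selection stands in for
    the representative-sample selection described in the abstract.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is reduced to the size of the smallest (minority) class.
    target = min(len(rows) for rows in by_class.values())

    balanced = []
    for y, rows in by_class.items():
        kept = rng.sample(rows, target)  # sampling without replacement
        balanced.extend((x, y) for x in kept)

    rng.shuffle(balanced)  # avoid leaving the rows ordered by class
    xs = [x for x, _ in balanced]
    ys = [y for _, y in balanced]
    return xs, ys


# Toy imbalanced dataset: 90 majority-class (0) and 10 minority-class (1) rows.
samples = list(range(100))
labels = [0] * 90 + [1] * 10

xs, ys = random_undersample(samples, labels)
print(Counter(ys))  # class counts after balancing
```

All minority-class rows are kept, and only the majority class shrinks. In practice the balancing would be applied to the training split only, so that the test data keeps the original class distribution.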