Reduction Strategies to Tackle Class Imbalance in Datasets

Date
2021-07-28
Authors
Krishnaveni, C.V.
Publisher
University of Hyderabad
Abstract
Banking, retail, finance, science, telecommunications, and many other sectors use data mining technologies to process massive amounts of data, measured in zettabytes. While this volume of data is useful, datasets must be processed effectively to make predictive and inferential forecasts for a target population. Class imbalance, where one class has fewer instances than the other class or classes in a dataset, poses challenges for traditional classifiers. Traditional classifiers fail to handle imbalanced datasets because of assumptions inherent in their design. The distribution of classes within a dataset has a direct impact on classifier performance. One proven practice for addressing this problem is to balance the classes in the training datasets. The main goals of balancing are to increase sensitivity, to select representative samples from the majority class, and to maintain a trade-off between majority-class and minority-class prediction rates.