Instance Ranking Using Data Complexity Measures for Training Set Selection

dc.contributor.author Alam, Junaid
dc.contributor.author Sobha Rani, T.
dc.date.accessioned 2022-03-27T05:50:47Z
dc.date.available 2022-03-27T05:50:47Z
dc.date.issued 2019-01-01
dc.description.abstract A classifier’s performance depends on the training set it is given, so training set selection plays an important part in the classification task: it can improve the classifier's performance and reduce training time. Training set selection can be approached in many ways, including algorithmic methods, data-handling techniques, cost-sensitive methods and ensembles. In this work, one of the data complexity measures, the Maximum Fisher’s discriminant ratio (F1), is used to identify good training instances. For a given feature, this measure discriminates between two classes by comparing their means and variances; in particular, it quantifies the overlap between the classes. First, F1 is calculated for the whole data set. Next, F1 is computed with the leave-one-out method to rank each instance. Finally, all instances that lower the F1 value are removed from the data set as a single batch. A small F1 value indicates strong overlap between the classes, so removing the instances that cause the most overlap reduces it. The efficacy of the proposed reduction algorithm (DRF1) is demonstrated empirically using 4 classifiers (Random Forest, Decision Tree C5.0, SVM and kNN) and 6 data sets (Pima, Musk, Sonar, Winequality (R and W) and Wisconsin). The results confirm that training set selection with DRF1 yields a promising improvement in kappa statistics and classification accuracy. The training set is reduced by approximately 18–50%, and training time is also greatly reduced.
dc.identifier.citation Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). v.11941 LNCS
dc.identifier.issn 03029743
dc.identifier.uri 10.1007/978-3-030-34869-4_20
dc.identifier.uri http://link.springer.com/10.1007/978-3-030-34869-4_20
dc.identifier.uri https://dspace.uohyd.ac.in/handle/1/8246
dc.subject Batch removal
dc.subject Classification
dc.subject Instance ranking
dc.subject Kappa statistics
dc.subject Maximum Fisher’s discriminant ratio
dc.title Instance Ranking Using Data Complexity Measures for Training Set Selection
dc.type Book Series. Conference Paper
dspace.entity.type
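The three steps in the abstract (F1 on the full set, leave-one-out ranking, batch removal) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' DRF1 implementation. It uses the common two-class form of F1, so multi-class sets such as Winequality would need a multi-class extension that is not shown. It reads "instances that lower the F1 value" as instances whose removal raises F1 above the full-set value. It also assumes that each class keeps at least two instances. The function names `fisher_f1`, `loo_f1` and `drf1_select` are hypothetical.

```python
import numpy as np

# Hypothetical sketch of the DRF1 steps described in the abstract; not the authors' code.


def fisher_f1(X: np.ndarray, y: np.ndarray) -> float:
    """Maximum Fisher's discriminant ratio (F1) for a two-class data set.

    For each feature j: f_j = (mu_a - mu_b)^2 / (var_a + var_b).
    F1 = max_j f_j; a small value signals strong class overlap.
    """
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch only handles two-class data")
    a, b = X[y == classes[0]], X[y == classes[1]]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)  # population variances (ddof=0)
    # Zero-variance features: infinite ratio if the means differ, else 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = np.where(den > 0, num / den, np.where(num > 0, np.inf, 0.0))
    return float(ratios.max())


def loo_f1(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """F1 of the data set with each instance left out in turn."""
    keep = np.ones(len(y), dtype=bool)
    scores = np.empty(len(y))
    for i in range(len(y)):
        keep[i] = False
        scores[i] = fisher_f1(X[keep], y[keep])
        keep[i] = True
    return scores


def drf1_select(X: np.ndarray, y: np.ndarray):
    """Rank instances by leave-one-out F1, then drop in one batch every
    instance whose removal raises F1 (i.e. whose presence lowers it)."""
    f1_full = fisher_f1(X, y)
    scores = loo_f1(X, y)
    ranking = np.argsort(-scores, kind="stable")  # most overlap-causing first
    kept = np.flatnonzero(scores <= f1_full)      # assumed removal criterion
    return kept, ranking, f1_full


# Toy 1-D example: the class-0 point at 9 sits right next to class 1.
X = np.array([[0], [1], [2], [9], [10], [11], [12]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1])
kept, ranking, f1_full = drf1_select(X, y)
print(kept.tolist())         # [0, 1, 2, 5, 6]
print(ranking[:2].tolist())  # [3, 4]
```

In the toy data, leaving out the overlapping point at 9 raises F1 from about 4.86 to 75, so it ranks first and is removed together with the point at 10, whose removal also raises F1. The leave-one-out loop recomputes F1 n times, which costs O(n²d). Incremental updates of the per-class means and variances would make each step O(d), but the direct loop keeps the correspondence with the described procedure clear.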