Instance Ranking Using Data Complexity Measures for Training Set Selection

Alam, Junaid; Sobha Rani, T.


dc.contributor.author	Alam, Junaid
dc.contributor.author	Sobha Rani, T.
dc.date.accessioned	2022-03-27T05:50:47Z
dc.date.available	2022-03-27T05:50:47Z
dc.date.issued	2019-01-01
dc.description.abstract	A classifier’s performance depends on the training set provided for training, so training set selection holds an important place in the classification task: it can improve the performance of the classifier and reduce the time taken for training. This can be approached through various methods, such as algorithmic methods, data-handling techniques, cost-sensitive methods and ensembles. In this work, one of the data complexity measures, Maximum Fisher’s discriminant ratio (F1), is used to identify good training instances. This measure discriminates between any two classes using a single feature by comparing the class means and variances; in particular, it quantifies the overlap between the classes. In the first phase, F1 of the whole data set is calculated. Next, F1 is computed with the leave-one-out method to rank each of the instances. Finally, the instances that lower the F1 value (that is, those whose removal raises it) are all removed from the data set as a batch. According to F1, a small value represents a strong overlap between the classes; therefore, removing the instances that cause the most overlap reduces the overlap further. The efficacy of the proposed reduction algorithm (DRF1) is demonstrated empirically using 4 different classifiers (Random Forest, Decision Tree-C5.0, SVM and kNN) and 6 data sets (Pima, Musk, Sonar, Winequality (R and W) and Wisconsin). The results confirm that DRF1, training set selection using a data complexity measure, leads to a promising improvement in kappa statistics and classification accuracy. A reduction of approximately 18–50% in the training set is achieved, along with a large reduction in training time.
dc.identifier.citation	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). v.11941 LNCS
dc.identifier.issn	03029743
dc.identifier.uri	10.1007/978-3-030-34869-4_20
dc.identifier.uri	http://link.springer.com/10.1007/978-3-030-34869-4_20
dc.identifier.uri	https://dspace.uohyd.ac.in/handle/1/8246
dc.subject	Batch removal
dc.subject	Classification
dc.subject	Instance ranking
dc.subject	Kappa statistics
dc.subject	Maximum Fisher’s discriminant ratio
dc.title	Instance Ranking Using Data Complexity Measures for Training Set Selection
dc.type	Book Series. Conference Paper
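
The three phases described in the abstract (F1 of the whole data set, a leave-one-out F1 for each instance, then batch removal) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' DRF1 implementation. It assumes a two-class data set. It uses the classic per-feature ratio (mu_1 - mu_2)^2 / (sigma_1^2 + sigma_2^2) with population variances and takes F1 as the maximum over features. It ignores features with zero pooled variance. It treats an instance as one that "lowers F1" when leaving it out yields a higher F1 than the whole data set does. The function names fisher_f1, drf1_rank and drf1_select are hypothetical.

```python
import numpy as np


def fisher_f1(X, y):
    """Maximum Fisher's discriminant ratio (F1) for a two-class data set.

    Per feature f: r_f = (mu_1 - mu_2)^2 / (var_1 + var_2).
    F1 = max_f r_f; a small F1 means strong class overlap.
    """
    classes = np.unique(y)
    if len(classes) != 2:
        # Assumption: binary data only; multi-class variants are not shown.
        raise ValueError("this sketch assumes exactly two classes")
    a, b = X[y == classes[0]], X[y == classes[1]]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)  # population variances (ddof=0)
    # Assumption: features with zero pooled variance get ratio 0 (ignored).
    ratios = np.divide(num, den, out=np.zeros_like(num, dtype=float), where=den > 0)
    return float(ratios.max())


def drf1_rank(X, y):
    """Leave-one-out ranking: gains[i] = F1(data without i) - F1(all data).

    A positive gain means instance i lowers F1, i.e. it adds overlap.
    """
    base = fisher_f1(X, y)
    n = len(y)
    gains = np.empty(n)
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        mask[i] = False  # leave instance i out
        gains[i] = fisher_f1(X[mask], y[mask]) - base
    return base, gains


def drf1_select(X, y):
    """Remove, in one batch, every instance whose removal raises F1."""
    _, gains = drf1_rank(X, y)
    keep = gains <= 0
    return X[keep], y[keep]
```

For example, on a one-feature toy set with class 0 at 0, 1 and 5 and class 1 at 4, 6 and 8, the sketch removes the two overlapping points (5 and 4) in a single batch and keeps 0, 1, 6 and 8. Sorting by np.argsort(-gains) gives the leave-one-out ranking of the instances. The loop recomputes F1 once per instance, so it costs O(n^2 d); it is written for clarity rather than speed.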

Collections

Computer and Information Sciences - Publications