Dataset Modification To Improve Machine Learning Algorithm Performance And Speed




We propose two pre-processing steps for classification that apply convex hull-based algorithms to the training set to improve classification performance and speed. The first, Class Reconstruction, combines a clustering algorithm with a convex hull-based approach to re-label the dataset with a new, expanded class structure. We show that this algorithm improves the accuracy of Naïve Bayes on some, but not all, of the real-world datasets tested. The second, Class Size Reduction, also starts with a clustering algorithm and then collects the convex hulls of all the clusters to create a new, smaller dataset, on which a Support Vector Machine can be trained much faster. We demonstrate the resulting improvement in classification speed on several real-world datasets. The improvement here is more significant and more consistent than for Class Reconstruction, with accuracy dropping in only a few cases. Both approaches are especially applicable to datasets characterized by a high number of clusters.
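The Class Size Reduction idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes two-dimensional points, and it assumes cluster assignments have already been produced by some clustering algorithm. The names `convex_hull` and `reduce_by_hulls` are hypothetical, and the hull routine is the standard monotone chain method, used here only for illustration. Each (class, cluster) group is replaced by the vertices of its convex hull, and the resulting smaller set is what a classifier such as an SVM would then be trained on.

```python
from collections import defaultdict


def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the 2D convex hull vertices (monotone chain, counter-clockwise)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it does not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def reduce_by_hulls(points, labels, cluster_ids):
    """Keep only the hull vertices of every (class label, cluster id) group.

    Assumption: cluster_ids come from a clustering step that is not shown here.
    Returns a list of (point, label) pairs forming the reduced training set.
    """
    groups = defaultdict(list)
    for p, y, c in zip(points, labels, cluster_ids):
        groups[(y, c)].append(tuple(p))
    reduced = []
    for (y, _cluster), members in groups.items():
        for vertex in convex_hull(members):
            reduced.append((vertex, y))
    return reduced


# Toy data: a square with one interior point (class "a") and a two-point cluster (class "b").
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (5, 5), (6, 5)]
labels = ["a", "a", "a", "a", "a", "b", "b"]
cluster_ids = [0, 0, 0, 0, 0, 1, 1]
reduced = reduce_by_hulls(points, labels, cluster_ids)
```

In this toy case the interior point of the square is discarded, while every point on a cluster boundary is kept. Real datasets are usually higher-dimensional, where the hull computation is considerably more expensive and would need a general-purpose convex hull routine.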



Convex hull, Data decomposition, Class relabeling, SVM optimization, Naïve Bayes, Performance improvement