Dataset Modification To Improve Machine Learning Algorithm Performance And Speed

dc.contributor.advisor: Vilalta, Ricardo
dc.contributor.committeeMember: Shi, Weidong
dc.contributor.committeeMember: Josić, Krešimir
dc.creator: Ahmed, Owais 1985- 2014
dc.description.abstract: We propose two pre-processing steps for classification that apply convex hull-based algorithms to the training set to improve classification performance and speed. The Class Reconstruction algorithm combines a clustering algorithm with a convex hull-based approach to re-label the dataset with a new, expanded class structure. We demonstrate that this algorithm improves the accuracy of Naïve Bayes in some, but not all, cases on real-world datasets. The Class Size Reduction approach also uses a clustering algorithm, then collects the convex hulls of all clusters to create a new, smaller dataset. Training a Support Vector Machine on this smaller dataset is much faster. We also demonstrate the improvement in classification speed using this algorithm on several real-world datasets. The improvement in this case is more significant and consistent, with only a few cases where the accuracy dropped. Both approaches are especially applicable to datasets characterized by a high number of clusters.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Convex hull
dc.subject: Data decomposition
dc.subject: Class relabeling
dc.subject: SVM optimization
dc.subject: Naïve Bayes
dc.subject: Performance improvement
dc.subject.lcsh: Computer science
dc.title: Dataset Modification To Improve Machine Learning Algorithm Performance And Speed
dc.type.genre: Thesis of Natural Sciences and Mathematics Science, Department of Science of Houston of Science
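
The Class Size Reduction idea described in the abstract (cluster each class, keep only the convex hull points of every cluster, train on the smaller set) can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not name the clustering algorithm, so a plain k-means is assumed here, the data is restricted to 2-D points to keep the hull computation short, and the names `convex_hull`, `kmeans`, and `reduce_dataset` are hypothetical. The SVM training step itself is left out; the sketch only builds the reduced training set.

```python
import random


def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices of 2-D points (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Pop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means (an assumed stand-in for the unspecified clustering step)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [points]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            (sum(q[0] for q in c) / len(c), sum(q[1] for q in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return [c for c in clusters if c]


def reduce_dataset(samples, k=2):
    """Keep only the hull vertices of each per-class cluster.

    samples: list of ((x, y), label) pairs. Returns a list in the same format.
    """
    by_class = {}
    for point, label in samples:
        by_class.setdefault(label, []).append(point)
    reduced = []
    for label, pts in by_class.items():
        for cluster in kmeans(pts, min(k, len(pts))):
            reduced.extend((vertex, label) for vertex in convex_hull(cluster))
    return reduced


# Two well-separated 5x5 grids, one per class.
samples = [((x, y), "a") for x in range(5) for y in range(5)]
samples += [((x + 10, y + 10), "b") for x in range(5) for y in range(5)]
reduced = reduce_dataset(samples, k=1)
print(len(samples), len(reduced))  # prints: 50 8
```

With one cluster per class, each grid collapses to its four corner points, so the reduced set keeps the outline of every class while dropping interior points. A classifier would then be trained on `reduced` instead of `samples`; whether accuracy is preserved depends on the data, as the abstract notes.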

