Dataset Modification To Improve Machine Learning Algorithm Performance And Speed
dc.contributor.advisor | Vilalta, Ricardo | |
dc.contributor.committeeMember | Shi, Weidong | |
dc.contributor.committeeMember | Josić, Krešimir | |
dc.creator | Ahmed, Owais 1985- | |
dc.date.accessioned | 2015-02-11T18:35:34Z | |
dc.date.available | 2015-02-11T18:35:34Z | |
dc.date.created | May 2014 | |
dc.date.issued | 2014-05 | |
dc.date.updated | 2015-02-11T18:35:34Z | |
dc.description.abstract | We propose two pre-processing steps for classification that apply convex hull-based algorithms to the training set to improve the performance and speed of classification. The Class Reconstruction algorithm combines a clustering algorithm with a convex hull-based approach to re-label the dataset with a new, expanded class structure. We demonstrate that this performance-improvement algorithm boosts the accuracy of Naive Bayes in some, but not all, cases on real-world datasets. The Class Size Reduction approach also uses a clustering algorithm, then collects the convex hulls of all the clusters to create a new, smaller dataset. Training a Support Vector Machine on this dataset is much faster. We also demonstrate the improvement in classification speed using this algorithm on several real-world datasets. The improvement in this case is more significant and consistent, with only a few cases where the accuracy dropped. Both approaches are especially applicable to datasets that are characterized by a high number of clusters. | |
dc.description.department | Computer Science, Department of | |
dc.format.digitalOrigin | born digital | |
dc.format.mimetype | application/pdf | |
dc.identifier.uri | http://hdl.handle.net/10657/901 | |
dc.language.iso | eng | |
dc.rights | The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s). | |
dc.subject | Convex hull | |
dc.subject | Data decomposition | |
dc.subject | Class relabeling | |
dc.subject | SVM optimization | |
dc.subject | Naïve Bayes | |
dc.subject | Performance improvement | |
dc.subject.lcsh | Computer science | |
dc.title | Dataset Modification To Improve Machine Learning Algorithm Performance And Speed | |
dc.type.dcmi | Text | |
dc.type.genre | Thesis | |
thesis.degree.college | College of Natural Sciences and Mathematics | |
thesis.degree.department | Computer Science, Department of | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | University of Houston | |
thesis.degree.level | Masters | |
thesis.degree.name | Master of Science |
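
The Class Size Reduction idea described in the abstract (cluster the training set, then keep only the points on each cluster's convex hull) can be illustrated with a minimal sketch. This is not the thesis's implementation: the record does not name the clustering algorithm, so plain k-means is used here as a stand-in, and clustering within each class, the restriction to 2-D points, and the names `kmeans`, `convex_hull`, and `reduce_dataset` are all assumptions made for illustration. The SVM training step is omitted.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on 2-D tuples (assumed stand-in clusterer)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Recompute each center as its cluster mean; keep the old one if empty.
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return [c for c in clusters if c]

def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices of 2-D points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

def reduce_dataset(points, labels, k=2):
    """Keep only the hull vertices of every per-class cluster (hypothetical helper)."""
    reduced = []
    for cls in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == cls]
        for cluster in kmeans(members, min(k, len(members))):
            reduced.extend((p, cls) for p in convex_hull(cluster))
    return reduced

# Toy data: two well-separated 5x5 grids, one per class.
points = [(x, y) for x in range(5) for y in range(5)]
points += [(x + 10, y) for x in range(5) for y in range(5)]
labels = [0] * 25 + [1] * 25
reduced = reduce_dataset(points, labels, k=2)
```

The reduced list of `(point, label)` pairs would then be passed to the classifier in place of the full training set. Interior points are discarded, which is where any training speed-up would come from, and also why accuracy can drop when the discarded points carried useful information.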