Dataset Modification To Improve Machine Learning Algorithm Performance And Speed

dc.contributor.advisor: Vilalta, Ricardo
dc.contributor.committeeMember: Shi, Weidong
dc.contributor.committeeMember: Josić, Krešimir
dc.creator: Ahmed, Owais 1985-
dc.date.accessioned: 2015-02-11T18:35:34Z
dc.date.available: 2015-02-11T18:35:34Z
dc.date.created: May 2014
dc.date.issued: 2014-05
dc.date.updated: 2015-02-11T18:35:34Z
dc.description.abstract: We propose two pre-processing steps for classification that apply convex hull-based algorithms to the training set to improve the performance and speed of classification. The Class Reconstruction algorithm combines a clustering algorithm with a convex hull-based approach to re-label the dataset with a new, expanded class structure. We demonstrate that this performance-improvement algorithm boosts the accuracy of Naive Bayes in some, but not all, cases on real-world datasets. The Class Size Reduction approach also uses a clustering algorithm, then collects the convex hulls of all the clusters to create a new, smaller dataset. This dataset allows a Support Vector Machine to be trained much faster. We also demonstrate the improvement in classification speed using this algorithm on several real-world datasets. The improvement in this case is more significant and consistent, with only a few cases where the accuracy dropped. Both approaches are especially applicable to datasets characterized by a high number of clusters.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10657/901
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Convex hull
dc.subject: Data decomposition
dc.subject: Class relabeling
dc.subject: SVM optimization
dc.subject: Naïve Bayes
dc.subject: Performance improvement
dc.subject.lcsh: Computer science
dc.title: Dataset Modification To Improve Machine Learning Algorithm Performance And Speed
dc.type.dcmi: Text
dc.type.genre: Thesis
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Masters
thesis.degree.name: Master of Science
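
Illustrative sketch (not taken from the thesis): the abstract describes Class Size Reduction as clustering the training set and keeping only the clusters' convex hulls. The code below is one possible reading of that idea, under several assumptions the record does not confirm: points are 2-D tuples, each class is clustered separately, the clustering algorithm is a plain k-means, and hulls are computed with Andrew's monotone chain. The names convex_hull, kmeans, and reduce_dataset are hypothetical.

```python
import random


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a left (counter-clockwise) turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            # Drop points that would create a right turn or a collinear run.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # the last point starts the other half

    return half(pts) + half(reversed(pts))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on 2-D tuples; returns the non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [list(points)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centers[i][0]) ** 2
                + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Recompute each center; an empty cluster keeps its old center.
        centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return [c for c in clusters if c]


def reduce_dataset(X, y, k=2):
    """Keep only the convex-hull vertices of each per-class cluster.

    Assumption: clustering is done per class so every kept point
    retains its original label.
    """
    reduced_X, reduced_y = [], []
    for label in sorted(set(y)):
        members = [p for p, lab in zip(X, y) if lab == label]
        for cluster in kmeans(members, min(k, len(members))):
            for vertex in convex_hull(cluster):
                reduced_X.append(vertex)
                reduced_y.append(label)
    return reduced_X, reduced_y
```

The reduced lists returned by reduce_dataset would then be handed to an SVM trainer in place of the full training set; that step is omitted here. Interior points of each cluster are discarded, which is where any speed-up would come from, and also why accuracy can drop when the hulls do not capture the class boundary well.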

Files

Original bundle

Name: AHMED-THESIS-2014.pdf
Size: 1006.51 KB
Format: Adobe Portable Document Format

License bundle

Name: LICENSE.txt
Size: 1.84 KB
Format: Plain Text
Description: