Feature Selection in Classification Tasks

dc.contributor.advisor: Vilalta, Ricardo
dc.contributor.committeeMember: Eick, Christoph F.
dc.contributor.committeeMember: Huang, Stephen
dc.contributor.committeeMember: Guo, Hongyu
dc.creator: Pisheh, Farinaz Zahra 1983-
dc.creator.orcid: 0000-0002-7350-0036
dc.date.accessioned: 2019-12-17T20:19:56Z
dc.date.created: December 2019
dc.date.issued: 2019-12
dc.date.submitted: December 2019
dc.date.updated: 2019-12-17T20:19:56Z
dc.description.abstract: Feature subset selection is an essential pre-processing task in machine learning and pattern recognition. In supervised learning, the goal of feature selection is to select the smallest subset of features that can predict the target class with high generalization performance, e.g., with high accuracy. Feature selection can also help to avoid over-fitting, reduce computational costs, and shorten training time. In this study, we introduce a new filter-based feature selection algorithm named EBFS, which is inspired by ReliefF, a well-known distance-based method. Like ReliefF, we use the nearest neighbors of each instance to find the local weight of each feature. Unlike ReliefF -- and other filter-based methods that focus on mutual information, conditional mutual information, or interaction information -- this study looks at the entropy of feature values in local neighborhoods, effectively capturing information not used in previous distance-based methods. We also introduce a new heterogeneous ensemble method based on ReliefF that varies the size of the neighborhood and uses statistical hypothesis tests to find the number of relevant features. Both algorithms were tested on multiple datasets; results show the effectiveness of our approach when compared to other well-known methods.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.citation: Portions of this document appear in: Pisheh, F., and Vilalta, R. Filter-Based Information-Theoretic Feature Selection. In Proceedings of 3rd International Conference on Advances in Artificial Intelligence (2019).
dc.identifier.uri: https://hdl.handle.net/10657/5598
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. UH Libraries has secured permission to reproduce any and all previously published materials contained in the work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Machine learning
dc.subject: Feature selection
dc.title: Feature Selection in Classification Tasks
dc.type.dcmi: Text
dc.type.genre: Thesis
local.embargo.lift: 2021-12-01
local.embargo.terms: 2021-12-01
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: PISHEH-DISSERTATION-2019.pdf
Size: 1005.97 KB
Format: Adobe Portable Document Format
