Feature Selection in Classification Tasks
Abstract
Feature subset selection is an essential pre-processing task in machine learning and pattern recognition. In supervised learning, the goal of feature selection is to find the smallest subset of features that can predict the target class with high generalization performance (e.g., high accuracy). Feature selection can also help avoid over-fitting, reduce computational cost, and shorten training time. In this study, we introduce a new filter-based feature selection algorithm, EBFS, which is inspired by ReliefF, a well-known distance-based method. Like ReliefF, EBFS uses the nearest neighborhood of each instance to compute a local weight for each feature. Unlike ReliefF and other filter-based methods that focus on mutual information, conditional mutual information, or interaction information, this study examines the entropy of feature values in local neighborhoods, thereby capturing information not used by previous distance-based methods. We also introduce a new heterogeneous ensemble method based on ReliefF that varies the size of the neighborhood and uses statistical hypothesis tests to determine the number of relevant features. Both algorithms were tested on multiple datasets, and the results show the effectiveness of our approach compared with other well-known methods.
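Both proposed methods build on ReliefF's idea of scoring each feature from an instance's nearest neighborhood. For reference, here is a minimal pure-Python sketch of standard ReliefF. It is not EBFS or the proposed ensemble, whose details are not given in this abstract. It assumes numeric features, Manhattan distance over range-normalized differences, and that every instance is used as a query instead of a random sample. The function name `relieff` and the toy data are illustrative only.

```python
from collections import Counter


def relieff(X, y, k=3):
    """Simplified ReliefF weights for numeric features (illustrative sketch).

    X: list of rows (lists of floats); y: list of class labels.
    Assumption: every instance is used as a query (m = n), no random sampling.
    """
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale diff() into [0, 1].
    span = []
    for a in range(d):
        col = [row[a] for row in X]
        span.append((max(col) - min(col)) or 1.0)

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / span[a]

    def dist(r1, r2):
        # Manhattan distance over normalized feature differences.
        return sum(diff(a, r1, r2) for a in range(d))

    priors = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):
        # Group all other instances by class, sorted by distance to X[i].
        by_class = {}
        for j in range(n):
            if j != i:
                by_class.setdefault(y[j], []).append((dist(X[i], X[j]), j))
        for c, items in by_class.items():
            items.sort()
            neighbors = [j for _, j in items[:k]]
            if c == y[i]:
                scale = -1.0  # nearest hits: differences lower the weight
            else:
                # Nearest misses raise the weight, weighted by class prior.
                scale = priors[c] / (1.0 - priors[y[i]])
            for a in range(d):
                contrib = sum(diff(a, X[i], X[j]) for j in neighbors)
                # Normalize by n queries and the neighbors actually found.
                w[a] += scale * contrib / (n * len(neighbors))
    return w


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [0.9, 2.0]]
y = [0, 0, 1, 1]
print([round(v, 3) for v in relieff(X, y, k=1)])  # [0.8, -0.5]
```

Features are then ranked by weight, so the class-separating feature 0 scores highest. The parameter `k` is the neighborhood size, the quantity that the abstract's ensemble method varies across its members.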