Feature Selection in Classification Tasks






Feature subset selection is an essential pre-processing task in machine learning and pattern recognition. In supervised learning, the goal of feature selection is to find the smallest subset of features that predicts the target class with high generalization performance (for example, high accuracy). Feature selection can also help avoid overfitting, reduce computational cost, and shorten training time. In this study, we introduce a new filter-based feature selection algorithm named EBFS, which is inspired by ReliefF, a well-known distance-based method. Like ReliefF, EBFS uses the nearest neighbors of each instance to compute a local weight for each feature. Unlike ReliefF and other filter-based methods that rely on mutual information, conditional mutual information, or interaction information, EBFS examines the entropy of feature values within local neighborhoods, capturing information that previous distance-based methods do not use. We also introduce a new heterogeneous ensemble method based on ReliefF that varies the neighborhood size and uses statistical hypothesis tests to determine the number of relevant features. Both algorithms were evaluated on multiple datasets, and the results show their effectiveness compared with other well-known methods.
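The quantity the abstract highlights, the entropy of feature values within local neighborhoods, can be illustrated with a minimal sketch. This is not the EBFS algorithm: the function name `local_feature_entropy`, the neighborhood size `k`, the use of Euclidean distance, and the assumption that features are already discretized are all illustrative choices. The sketch also ignores class labels, which a supervised method such as ReliefF would use, and it stops at the per-feature local entropy because the abstract does not say how EBFS turns these entropies into weights.

```python
import numpy as np


def local_feature_entropy(X, k=5):
    """Hypothetical illustration (not EBFS): average Shannon entropy, in bits,
    of each feature's values within the k-nearest neighborhood of every
    instance. Assumes X holds discretized (integer-coded) feature values."""
    X = np.asarray(X)
    n, d = X.shape
    # Pairwise squared Euclidean distances between all instances.
    diff = X[:, None, :].astype(float) - X[None, :, :].astype(float)
    dist = (diff ** 2).sum(axis=2)
    # Exclude each instance from its own neighborhood.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k nearest neighbors of each instance.
    neighbors = np.argsort(dist, axis=1)[:, :k]

    entropy = np.zeros(d)
    for i in range(n):
        block = X[neighbors[i]]  # k x d feature values of i's neighbors
        for j in range(d):
            _, counts = np.unique(block[:, j], return_counts=True)
            p = counts / counts.sum()
            entropy[j] += -(p * np.log2(p)).sum()
    # Average the local entropies over all instances.
    return entropy / n
```

On a toy matrix such as `[[0, 0], [0, 1], [0, 0], [0, 1]]` with `k=3`, the constant first feature has a local entropy of 0, while the second feature has a local entropy of about 0.918 bits. A low local entropy of this kind is one possible signal a neighborhood-based weighting scheme could build on.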



Machine learning, Feature selection


Portions of this document appear in: Pisheh, F., and Vilalta, R. Filter-Based Information-Theoretic Feature Selection. In Proceedings of the 3rd International Conference on Advances in Artificial Intelligence (2019).