Feature Selection in Classification Tasks

Date

2019-12

Abstract

Feature subset selection is an essential pre-processing task in machine learning and pattern recognition. In supervised learning, the goal of feature selection is to find the smallest subset of features that predicts the target class with high generalization performance, e.g., high accuracy. Feature selection can also help avoid over-fitting, reduce computational cost, and shorten training time. In this study, we introduce a new filter-based feature selection algorithm named EBFS, which is inspired by ReliefF, a well-known distance-based method. Like ReliefF, our method uses the nearest neighbors of each instance to compute a local weight for each feature. Unlike ReliefF, and unlike other filter-based methods that focus on mutual information, conditional mutual information, or interaction information, it looks at the entropy of feature values in local neighborhoods, capturing information not used by previous distance-based methods. We also introduce a new heterogeneous ensemble method based on ReliefF that varies the size of the neighborhood and uses statistical hypothesis tests to determine the number of relevant features. Both algorithms were tested on multiple datasets; the results show the effectiveness of our approach compared to other well-known methods.
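
For readers unfamiliar with the distance-based family that the abstract builds on, the following is a minimal sketch of a simplified Relief-style weighting. It is not EBFS and not the full ReliefF algorithm: it uses a single nearest hit and a single nearest miss per instance, Manhattan distance on min-max scaled features, and assumes every class has at least two instances. The function name `relief_weights` is hypothetical and chosen only for illustration.

```python
import numpy as np

def relief_weights(X, y):
    """Simplified Relief-style feature weights (illustrative only, not EBFS)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # Scale each feature to [0, 1] so per-feature differences are comparable.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features: avoid division by zero
    Xs = (X - lo) / span

    w = np.zeros(d)
    for i in range(n):
        # Manhattan distance from instance i to every instance.
        dist = np.abs(Xs - Xs[i]).sum(axis=1)
        dist[i] = np.inf  # exclude the instance itself

        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class instance
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class instance

        # Reward features that differ on the miss, penalize those that differ on the hit.
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / n
```

Features with larger weights separate the classes better within local neighborhoods; a feature that varies as much within a class as between classes ends up with a weight near or below zero.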

Keywords

Machine learning, Feature selection

Citation

Portions of this document appear in: Pisheh, F., and Vilalta, R. Filter-Based Information-Theoretic Feature Selection. In Proceedings of the 3rd International Conference on Advances in Artificial Intelligence (2019).