Choosing the Right Kernel: A Meta-Learning Approach to Kernel Selection in Support Vector Machines
Valerio Molina, Roberto 1983-
In recent years, Support Vector Machines (SVMs) have gained popularity over other classification algorithms because they can produce a flexible decision boundary on datasets that are not linearly separable. This ability comes from the kernel trick, which lets an SVM implicitly map the original input space into a higher-dimensional feature space where a linear separation of the dataset can be found. Because the transformation is implicit, we gain efficiency in terms of time complexity. The cost is a loss of information: we do not know what the feature space looks like, and can only obtain the relative positions of points in it through the kernel function that performs the transformation. Since different kernel functions yield different feature spaces, a mechanism is needed to select the best kernel function for a specific problem. Previous work has focused on generating metrics from the kernel matrix (a pairwise matrix that stores the relative positions of all pairs of points). Three metrics have been used to extract information from the kernel matrix: Fisher's discriminant, Bregman divergence, and homoscedasticity analysis. Even when combined, these do not provide enough predictive power to perform kernel selection. By introducing two new meta-features, Distance Ratio (capturing inter-class and intra-class distances in the feature space) and Class Similarity (computing inter-class and intra-class similarity in the feature space), we obtain substantial improvements to the kernel selection process.
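The idea that a kernel matrix encodes relative positions in the feature space can be illustrated with a short sketch. It relies on the standard identity ||phi(x) - phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y), which recovers feature-space distances from kernel values alone. The abstract does not give the exact definition of Distance Ratio, so the `distance_ratio` function below (mean intra-class distance divided by mean inter-class distance) is an assumed, illustrative stand-in rather than the thesis's formula; all function names and the `gamma` parameter are likewise hypothetical.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """Pairwise RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] - 2.0 * (X @ X.T) + sq_norms[None, :]
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def feature_space_sq_distances(K):
    """Squared feature-space distances computed from the kernel matrix only."""
    diag = np.diag(K)
    return np.maximum(diag[:, None] - 2.0 * K + diag[None, :], 0.0)

def distance_ratio(K, y):
    """Assumed illustration (not the thesis's definition):
    mean intra-class distance / mean inter-class distance.
    Smaller values suggest classes are tighter and further apart."""
    D = np.sqrt(feature_space_sq_distances(K))
    same_class = y[:, None] == y[None, :]
    off_diagonal = ~np.eye(len(y), dtype=bool)
    intra = D[same_class & off_diagonal].mean()
    inter = D[~same_class].mean()
    return intra / inter

# Toy data: two tight, well-separated clusters.
X = np.array([[0.0, 0.0], [0.0, 0.1], [3.0, 3.0], [3.0, 3.1]])
y = np.array([0, 0, 1, 1])

K_rbf = rbf_kernel_matrix(X, gamma=1.0)
K_lin = X @ X.T  # linear kernel: feature space equals input space

print("RBF distance ratio:   ", distance_ratio(K_rbf, y))
print("Linear distance ratio:", distance_ratio(K_lin, y))
```

Computing such a statistic for several candidate kernels on the same dataset gives one number per kernel, which is the kind of meta-feature a meta-learner could use to rank kernels; how the thesis actually combines its meta-features is not stated in the abstract.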