Robust Domain Adaptation Using Active Learning

Date

2016-08

Abstract

Traditional machine learning algorithms assume that the training and test datasets are generated from the same underlying distribution, an assumption that does not hold for many real-world datasets. As a result, a model trained on the training dataset may yield poor classification accuracy on the test dataset. One way to mitigate this problem is domain adaptation. These techniques build a new model for the unlabeled test dataset (the target dataset) by transferring information from a related but labeled training dataset (the source dataset), even when their underlying distributions differ. A further limitation of domain adaptation is that it makes no allowance for obtaining class labels on the test dataset during the training phase. Active learning addresses this issue by assuming a budget that can be spent on labeling instances in the target domain: it selects the most informative instances of the target dataset for an expert to label, so as to improve classification accuracy on the unlabeled test data.
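
To illustrate how an active learner might pick "the most informative instances" under a labeling budget, the following is a minimal sketch of entropy-based uncertainty sampling, one common query strategy. It is not necessarily the selection criterion used in this thesis; the names `select_queries`, `entropy`, and the `predict_proba` callback are hypothetical, and the model is assumed to return class probabilities for a single instance.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical model interface: maps one instance to class probabilities.
ProbaFn = Callable[[Sequence[float]], Sequence[float]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(unlabeled: Sequence[Sequence[float]],
                   predict_proba: ProbaFn,
                   budget: int) -> List[int]:
    """Return the indices of the `budget` target instances whose predicted
    class distribution has the highest entropy, i.e. the instances the
    current model is least certain about."""
    scored = [(entropy(predict_proba(x)), i) for i, x in enumerate(unlabeled)]
    # Highest entropy first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda s: s[0], reverse=True)
    return [i for _, i in scored[:max(budget, 0)]]


if __name__ == "__main__":
    def sigmoid(z: float) -> float:
        return 1.0 / (1.0 + math.exp(-z))

    # Toy one-feature model: P(class 0) = sigmoid(x[0]).
    def model(x: Sequence[float]) -> Sequence[float]:
        p = sigmoid(x[0])
        return (p, 1.0 - p)

    target = [[-3.0], [0.1], [2.0], [-0.2]]
    print(select_queries(target, model, budget=2))  # [1, 3]
```

The two queried instances are the ones closest to the toy model's decision boundary at 0; in an active learning loop, their expert-provided labels would be added to the training data and the model refit before the next query round.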

The goal of this research is to build an optimal classifier on the target dataset by using information related to model complexity. We propose a novel domain adaptation technique that uses active learning to find the optimal value of a parameter of a class of models, that is, the value that yields the best classifier on the target dataset. Unlike other domain adaptation methods, it does not assume that the class-conditional probabilities are equal across domains. This research also proposes a novel data-alignment technique that allows the source model to be applied directly to the target domain when the two distributions differ by a linear shift, thus avoiding the need to build a completely new classifier on the target domain. Empirical results show that our methods yield better classification accuracy than state-of-the-art methods.
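
To make the idea of alignment under a linear shift concrete, here is a minimal one-feature sketch. It assumes each target feature is an affine transform of the corresponding source feature, x_t = a * x_s + b with a > 0, and estimates a and b by matching the means and standard deviations of the two unlabeled samples (moment matching). This is a simple stand-in, not the alignment technique proposed in this thesis; the titles of the publications listed under Citation point to linear regression and maximum likelihood instead. The function names `fit_linear_shift` and `align_to_source` are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_linear_shift(source: Sequence[float],
                     target: Sequence[float]) -> Tuple[float, float]:
    """Estimate (a, b) in target ~ a * source + b for one feature by matching
    the mean and the population standard deviation of the two samples.
    Assumes a > 0: moment matching cannot detect a sign flip."""
    sd_s = statistics.pstdev(source)
    sd_t = statistics.pstdev(target)
    if sd_s == 0 or sd_t == 0:
        raise ValueError("both samples need non-zero variance")
    a = sd_t / sd_s
    b = statistics.mean(target) - a * statistics.mean(source)
    return a, b


def align_to_source(target: Sequence[float], a: float, b: float) -> List[float]:
    """Map target values back into the source feature space."""
    return [(x - b) / a for x in target]


if __name__ == "__main__":
    source = [1.0, 2.0, 3.0, 4.0, 5.0]
    target = [2.0 * x + 10.0 for x in source]  # synthetic linear shift
    a, b = fit_linear_shift(source, target)     # a ~ 2.0, b ~ 10.0

    # Stand-in for a classifier trained on the source domain.
    def source_model(x: float) -> int:
        return int(x > 3.5)

    aligned = align_to_source(target, a, b)
    print([source_model(x) for x in aligned])   # [0, 0, 0, 1, 1]
```

For multivariate data the same fit would be applied feature by feature. Because only unlabeled target values are used, no target labels are spent on the alignment itself, and the source classifier is reused unchanged on the aligned data.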

Keywords

Domain adaptation, Active learning, Machine learning, Model complexity

Citation

Portions of this document appear in:

Vilalta, Ricardo, Kinjal Dhar Gupta, and Lucas Macri. "A machine learning approach to Cepheid variable star classification using data alignment and maximum likelihood." Astronomy and Computing 2 (2013): 46-53.

Vilalta, Ricardo, Kinjal Dhar Gupta, and Lucas Macri. "Domain adaptation under data misalignment: An application to Cepheid variable star classification." In 2014 22nd International Conference on Pattern Recognition (ICPR), pp. 3660-3665. IEEE, 2014.

Vilalta, Ricardo, Kinjal Dhar Gupta, and Ashish Mahabal. "Star classification under data variability: an emerging challenge in astroinformatics." In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 241-244. Springer, Cham, 2015.

Gupta, Kinjal Dhar, Ricardo Vilalta, Vicken Asadourian, and Lucas Macri. "Adapting Predictive Models for Cepheid Variable Star Classification Using Linear Regression and Maximum Likelihood." Proceedings of the International Astronomical Union 10, no. S306 (2014): 319-321.