Robust Domain Adaptation Using Active Learning
Abstract
Traditional machine learning algorithms assume that the training and test datasets are generated from the same underlying distribution, an assumption that does not hold for most real-world datasets. As a result, a model trained on the training dataset fails to achieve good classification accuracy on the test dataset. One way to mitigate this problem is to use domain adaptation techniques, which build a new model for the unlabeled test dataset (the target dataset) by transferring information from a related but labeled training dataset (the source dataset), even when their underlying distributions differ. Another important issue is that domain adaptation makes no allowance for obtaining class labels on the test dataset during the training phase. This issue can be handled by active learning techniques, which assume the existence of a budget that can be spent to label instances in the target domain. Active learning finds the most informative instances of the test dataset for an expert to label, in order to obtain better classification accuracy on the unlabeled test dataset.
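To make the idea of "most informative instances" concrete, here is a minimal Python sketch of pool-based uncertainty sampling, one common active learning query strategy. It is an illustration under stated assumptions, not the method proposed in this work: it assumes binary labels, a current model that outputs an estimate of P(y=1 | x) for each unlabeled target instance, and a labeling budget equal to the number of queries in one round. The function name `select_queries` is hypothetical.

```python
from typing import List


def select_queries(probs: List[float], budget: int) -> List[int]:
    """Return indices of the `budget` most uncertain unlabeled target instances.

    Hypothetical helper for illustration only. `probs[i]` is assumed to be the
    current model's estimate of P(y=1 | x_i) for the i-th unlabeled target
    instance (binary classification assumed).
    """
    if budget <= 0:
        return []
    # Uncertainty = distance of the prediction from the decision boundary 0.5;
    # the smaller the distance, the less confident the model is.
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    # Spend the budget on the least confident instances; these would be
    # sent to the expert for labeling.
    return ranked[:budget]


if __name__ == "__main__":
    print(select_queries([0.9, 0.52, 0.1, 0.45, 0.7], budget=2))
```

For example, `select_queries([0.9, 0.52, 0.1, 0.45, 0.7], budget=2)` returns `[1, 3]`, the two predictions closest to 0.5. In practice the model would be retrained after each round of expert labels and the selection repeated until the budget is exhausted.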
The goal of this research is to build an optimal classifier on the target dataset by using information related to model complexity. We propose a novel domain adaptation technique that uses active learning to find the value of a parameter of a class of models that yields the best classifier on the target dataset. Unlike other domain adaptation methods, it does not assume that the class-conditional probabilities are equal across the domains. This research also proposes a novel data-alignment technique that allows the source model to be used directly on the target domain when the distributions differ by a linear shift, thus avoiding the need to build a completely new classifier on the target domain. Empirical results show that our methods yield better classification accuracy than state-of-the-art methods.
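As a rough illustration of alignment under a linear shift, the sketch below assumes the simplest case: each target feature is an affine transform of the corresponding source feature, x_t = a * x_s + b with a > 0, with no mixing between features. The hypothetical function `align_to_source` estimates each map by matching per-feature means and standard deviations, then inverts it so that a classifier trained on the source could be applied to the aligned target rows. This is an assumption-laden sketch, not the data-alignment technique proposed in this research; a shift that mixes features would require matching full covariances instead.

```python
from statistics import mean, pstdev
from typing import List

Matrix = List[List[float]]


def align_to_source(source: Matrix, target: Matrix) -> Matrix:
    """Map target rows back into the source feature space.

    Hypothetical helper. Assumes every feature j was shifted by an unknown
    affine map x_t = a_j * x_s + b_j with a_j > 0 (per-feature, no mixing).
    """
    n_features = len(source[0])
    aligned = [list(row) for row in target]  # copy so `target` is untouched
    for j in range(n_features):
        src_col = [row[j] for row in source]
        tgt_col = [row[j] for row in target]
        mu_s, sd_s = mean(src_col), pstdev(src_col)
        mu_t, sd_t = mean(tgt_col), pstdev(tgt_col)
        # Inverse of the estimated affine map; guard against constant columns.
        scale = sd_s / sd_t if sd_t > 0 else 1.0
        for row in aligned:
            row[j] = (row[j] - mu_t) * scale + mu_s
    return aligned


if __name__ == "__main__":
    src = [[0.0], [1.0], [2.0]]
    tgt = [[5.0], [7.0], [9.0]]  # generated with a = 2, b = 5
    print(align_to_source(src, tgt))
```

With source values 0, 1, 2 and target values 5, 7, 9 (that is, a = 2 and b = 5), the aligned target values are approximately 0, 1, 2, so a source model could be reused without retraining.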