Robust Domain Adaptation Using Active Learning

dc.contributor.advisor: Vilalta, Ricardo
dc.contributor.committeeMember: Eick, Christoph F.
dc.contributor.committeeMember: Chen, Guoning
dc.contributor.committeeMember: Mahabal, Ashish
dc.creator: Dhar Gupta, Kinjal 1982-
dc.creator.orcid: 0000-0001-9498-6365
dc.date.accessioned: 2018-11-30T19:20:05Z
dc.date.available: 2018-11-30T19:20:05Z
dc.date.created: August 2016
dc.date.issued: 2016-08
dc.date.submitted: August 2016
dc.date.updated: 2018-11-30T19:20:06Z
dc.description.abstract: Traditional machine learning algorithms assume that training and test datasets are generated from the same underlying distribution, an assumption that does not hold for most real-world datasets. As a result, a model trained on the training dataset fails to produce good classification accuracy on the test dataset. One way to mitigate this problem is to use domain adaptation techniques; these techniques build a new model on the unlabeled test dataset (the target dataset) by transferring information from a related, labeled training dataset (the source dataset), even when their underlying distributions are different. Another important issue is that domain adaptation makes no allowance for obtaining class labels on the test dataset during the training phase. This issue can be handled by active learning techniques, which assume the existence of a budget that can be used to label instances in the target domain. Active learning finds the most informative instances of the test dataset so that an expert can label them, yielding better classification accuracy on the unlabeled test dataset. The goal of this research is to build an optimal classifier on the target dataset by using information related to model complexity. We propose a novel domain adaptation technique that uses active learning to find the optimal value of a parameter of a class of models, yielding the best classifier on the target dataset without assuming, as other domain adaptation methods do, that the class-conditional probabilities are equivalent across the domains. This research also proposes a novel data-alignment technique that allows the source model to be used directly on the target if the distributions differ due to a linear shift, thus avoiding the need to build a completely new classifier on the target domain. Empirical results show that our methods yield better classification accuracy than state-of-the-art methods.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.citation: Portions of this document appear in: Vilalta, Ricardo, Kinjal Dhar Gupta, and Lucas Macri. "A machine learning approach to Cepheid variable star classification using data alignment and maximum likelihood." Astronomy and Computing 2 (2013): 46-53. And in: Vilalta, Ricardo, Kinjal Dhar Gupta, and Lucas Macri. "Domain adaptation under data misalignment: An application to cepheid variable star classification." In 2014 22nd International Conference on Pattern Recognition (ICPR), pp. 3660-3665. IEEE, 2014. And in: Vilalta, Ricardo, Kinjal Dhar Gupta, and Ashish Mahabal. "Star classification under data variability: an emerging challenge in astroinformatics." In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 241-244. Springer, Cham, 2015. And in: Gupta, Kinjal Dhar, Ricardo Vilalta, Vicken Asadourian, and Lucas Macri. "Adapting Predictive Models for Cepheid Variable Star Classification Using Linear Regression and Maximum Likelihood." Proceedings of the International Astronomical Union 10, no. S306 (2014): 319-321.
dc.identifier.uri: http://hdl.handle.net/10657/3520
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. UH Libraries has secured permission to reproduce any and all previously published materials contained in the work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Domain adaptation
dc.subject: Active learning
dc.subject: Machine learning
dc.subject: Model Complexity
dc.title: Robust Domain Adaptation Using Active Learning
dc.type.dcmi: Text
dc.type.genre: Thesis
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: DHARGUPTA-DISSERTATION-2016.pdf
Size: 814.34 KB
Format: Adobe Portable Document Format

License bundle

Name: LICENSE.txt
Size: 1.82 KB
Format: Plain Text
Description: