Adversarial and Non-Adversarial Approaches for Imbalanced Data Classification

Mansourifar, Hadi

dc.contributor.committeeMember	Shi, Weidong
dc.contributor.committeeMember	Vilalta, Ricardo
dc.contributor.committeeMember	Chen, Guoning
dc.contributor.committeeMember	Chen, Lin
dc.creator	Mansourifar, Hadi
dc.date.accessioned	2023-05-25T18:21:08Z
dc.date.created	May 2022
dc.date.issued	2022-03-22
dc.date.updated	2023-05-25T18:21:09Z
dc.description.abstract	Because imbalanced classification has a significant impact in many domains of computation, from security to medical science, any step that addresses its challenges, such as lost diversity, a high false positive rate, or uncertainty, is closely followed by the scientific community. With the surge of adversarial approaches such as GANs, non-adversarial methods such as the Synthetic Minority Over-Sampling Technique (SMOTE) have received steadily less attention. In this dissertation, we not only propose novel adversarial and non-adversarial approaches but also investigate how adversarial and non-adversarial approaches can be brought together. Beyond that, we propose a novel method, called Real-Fake Validation Loss (RFVL), that makes the evaluation of adversarial approaches explainable. Data-driven approaches to imbalanced data classification suffer from two major problems: (i) lack of diversity and (ii) uncertainty. In this research, we propose a set of novel approaches to address these problems. First, we propose Cross-Concatenation, the first projection-based method for the imbalanced data classification problem and the first projection method that can balance the sizes of the minority and majority classes. We prove that Cross-Concatenation can create larger margins with better class separation. Unlike SMOTE and its variants, Cross-Concatenation is not based on random procedures, so running it on fixed training and test data always yields the same results. This stability is one of the most important advantages of Cross-Concatenation over SMOTE. In addition, our experimental results show that Cross-Concatenation is competitive with SMOTE and its variants, the most popular over-sampling approaches, in terms of F1 score and AUC in the majority of test cases. Second, we introduce a new concept called virtual big data. Virtual big data is a high-dimensional version of the original training data, generated by concatenating c different original instances. This technique can increase the number of training instances from N to C(N, c). We prove that the curse of dimensionality of virtual big data can alleviate the vanishing generator gradient problem.
dc.description.department	Computer Science, Department of
dc.format.digitalOrigin	born digital
dc.format.mimetype	application/pdf
dc.identifier.citation	Portions of this document appear in: Mansourifar, Hadi, Lin Chen, and Weidong Shi. "Virtual big data for GAN based data augmentation." In 2019 IEEE International Conference on Big Data (Big Data), pp. 1478-1487. IEEE, 2019; and in: Mansourifar, Hadi, and Weidong Shi. "Cross-Concatenation: Tackling Uncertainty in Imbalanced Big Data Classification." In 2021 IEEE International Conference on Big Data (Big Data), pp. 867-875. IEEE, 2021; and in: Kasichainula, Keshav, Hadi Mansourifar, and Weidong Shi. "Poisoning Attacks via Generative Adversarial Text to Image Synthesis." In 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), pp. 158-165. IEEE, 2021.
dc.identifier.uri	https://hdl.handle.net/10657/14262
dc.language.iso	eng
dc.rights	The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. UH Libraries has secured permission to reproduce any and all previously published materials contained in the work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject	Imbalanced Classification
dc.subject	Over-Sampling
dc.subject	Generative Adversarial Neural Networks
dc.title	Adversarial and Non-Adversarial Approaches for Imbalanced Data Classification
dc.type.dcmi	Text
dc.type.genre	Thesis
dcterms.accessRights	The full text of this item is not available at this time because the student has placed this item under an embargo for a period of time. The Libraries are not authorized to provide a copy of this work during the embargo period.
local.embargo.lift	2024-05-01
local.embargo.terms	2024-05-01
thesis.degree.college	College of Natural Sciences and Mathematics
thesis.degree.department	Computer Science, Department of
thesis.degree.discipline	Computer Science
thesis.degree.grantor	University of Houston
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy
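
Illustrative note on the abstract: the virtual big data construction (concatenating c different original instances, growing the training set from N to C(N, c)) can be sketched as below. This is a minimal sketch based only on the abstract's description, not the dissertation's implementation; the function name `virtual_big_data`, the toy data, and the choice to enumerate unordered combinations in index order are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def virtual_big_data(instances, c):
    """Concatenate every unordered choice of c distinct instances.

    Hypothetical helper: each output row is the feature vectors of c
    original instances joined end to end, so N instances of dimension d
    become C(N, c) instances of dimension c * d.
    """
    return [
        [value for row in group for value in row]
        for group in combinations(instances, c)
    ]


# Toy data (assumed): N = 4 instances with d = 2 features each.
X = [[0, 1], [2, 3], [4, 5], [6, 7]]
V = virtual_big_data(X, 2)

print(len(V), comb(len(X), 2), len(V[0]))  # 6 6 4
```

With N = 4 and c = 2 the sketch yields C(4, 2) = 6 concatenated instances, each of dimension 4, which matches the growth from N to C(N, c) stated in the abstract.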

Collections

Published ETD Collection