Adversarial and Non-Adversarial Approaches for Imbalanced Data Classification

dc.contributor.committeeMember: Shi, Weidong
dc.contributor.committeeMember: Vilalta, Ricardo
dc.contributor.committeeMember: Chen, Guoning
dc.contributor.committeeMember: Chen, Lin
dc.creator: Mansourifar, Hadi
dc.date.accessioned: 2023-05-25T18:21:08Z
dc.date.created: May 2022
dc.date.issued: 2022-03-22
dc.date.updated: 2023-05-25T18:21:09Z
dc.description.abstract: Because imbalanced classification has a significant impact in many domains of computation, from security to medical science, any step toward addressing its challenges, such as lost diversity, high false positive rates, or uncertainty, is closely followed by the scientific community. With the surge of adversarial approaches such as GANs, non-adversarial methods such as the Synthetic Minority Over-Sampling Technique (SMOTE) have been receiving steadily less attention. In this dissertation, we not only propose novel adversarial and non-adversarial approaches but also investigate how adversarial and non-adversarial approaches can be brought together. Beyond that, we propose a novel method, called Real-Fake Validation Loss (RFVL), that makes the evaluation of adversarial approaches explainable. Data-driven approaches to imbalanced data classification suffer from two major problems: (i) lack of diversity and (ii) uncertainty. In this research, we propose a set of novel approaches to address these problems. First, we propose Cross-Concatenation, the first projection-based method for the imbalanced data classification problem and the first projection method that can balance the sizes of the minority and majority classes. We prove that Cross-Concatenation can create larger margins with better class separation. Unlike SMOTE and its variants, Cross-Concatenation is not based on random procedures, so running it on fixed training and test data always yields the same results. This stability is one of the most important advantages of Cross-Concatenation over SMOTE. In addition, our experimental results show that Cross-Concatenation is competitive with SMOTE and its variants, the most popular over-sampling approaches, in terms of F1 score and AUC in the majority of test cases. Second, we introduce a new concept called virtual big data. Virtual big data is a high-dimensional version of the original training data, generated by concatenating c different original instances. This technique can increase the number of training instances from N to C(N, c). We prove that the curse of dimensionality of virtual big data can alleviate the vanishing generator gradient problem.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.citation: Portions of this document appear in: Mansourifar, Hadi, Lin Chen, and Weidong Shi. "Virtual big data for GAN based data augmentation." In 2019 IEEE International Conference on Big Data (Big Data), pp. 1478-1487. IEEE, 2019; and in: Mansourifar, Hadi, and Weidong Shi. "Cross-Concatenation: Tackling Uncertainty in Imbalanced Big Data Classification." In 2021 IEEE International Conference on Big Data (Big Data), pp. 867-875. IEEE, 2021; and in: Kasichainula, Keshav, Hadi Mansourifar, and Weidong Shi. "Poisoning Attacks via Generative Adversarial Text to Image Synthesis." In 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), pp. 158-165. IEEE, 2021.
dc.identifier.uri: https://hdl.handle.net/10657/14262
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. UH Libraries has secured permission to reproduce any and all previously published materials contained in the work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Imbalanced Classification
dc.subject: Over-Sampling
dc.subject: Generative Adversarial Neural Networks
dc.title: Adversarial and Non-Adversarial Approaches for Imbalanced Data Classification
dc.type.dcmi: Text
dc.type.genre: Thesis
dcterms.accessRights: The full text of this item is not available at this time because the student has placed this item under an embargo for a period of time. The Libraries are not authorized to provide a copy of this work during the embargo period.
local.embargo.lift: 2024-05-01
local.embargo.terms: 2024-05-01
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy
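
The virtual big data construction mentioned in the abstract (concatenating c different original instances, growing N instances to C(N, c)) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the function name `virtual_big_data`, the toy data, and the choice to enumerate every unordered combination in index order are assumptions made for illustration. The abstract does not say how instances are ordered within a concatenation or whether only a subset of combinations is used.

```python
from itertools import chain, combinations
from math import comb


def virtual_big_data(instances, c):
    """Concatenate every unordered choice of c distinct instances.

    Hypothetical helper: turns N instances with d features each into
    C(N, c) instances with c * d features each.
    """
    n = len(instances)
    if not 1 <= c <= n:
        raise ValueError("c must be between 1 and the number of instances")
    # combinations() yields each group of c distinct rows exactly once.
    return [list(chain.from_iterable(group)) for group in combinations(instances, c)]


# Toy data (assumed): N = 4 instances, d = 3 features each.
X = [
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0],
]

V = virtual_big_data(X, c=2)
print(len(V), len(V[0]))  # 6 6, i.e. C(4, 2) instances with 2 * 3 features
```

Even for small c the count grows quickly (for example, C(100, 3) = 161700), so in practice one would likely sample combinations rather than enumerate them all; that, too, is an assumption rather than something the abstract states.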

Files

Original bundle

Name: MANSOURIFAR-DISSERTATION-2022.pdf
Size: 8.08 MB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 4.43 KB
Format: Plain Text

Name: LICENSE.txt
Size: 1.82 KB
Format: Plain Text