    Adversarial and Non-Adversarial Approaches for Imbalanced Data Classification

    View/Open
    MANSOURIFAR-DISSERTATION-2022.pdf (8.075 MB)
    Date
    2022-03-22
    Author
    Mansourifar, Hadi
    Abstract
    Because imbalanced classification has a significant impact on many domains of computation, from security to medical science, any step toward addressing its related challenges, such as lost diversity, high false positive rates, and uncertainty, is closely watched by the scientific community. With the surge of adversarial approaches such as GANs, non-adversarial methods like the Synthetic Minority Over-sampling Technique (SMOTE) receive less and less attention. In this dissertation, we not only propose novel adversarial and non-adversarial approaches, but also investigate how the two families of approaches can meet each other. Beyond that, we propose a novel method, called Real-Fake Validation Loss (RFVL), to make the evaluation of adversarial approaches explainable.

    Data-driven approaches to imbalanced data classification suffer from two major problems: (i) lack of diversity and (ii) uncertainty. In this research, we propose a set of novel approaches to address these problems. First, we propose Cross-Concatenation, the first projection-based method for the imbalanced data classification problem and the first projection method that can balance the sizes of both the minority and majority classes. We prove that Cross-Concatenation can create larger margins with better class separation. Unlike SMOTE and its variants, Cross-Concatenation is not based on random procedures; running it on fixed training and test data therefore yields identical results every time. This stability is one of its most important advantages over SMOTE. Moreover, our experimental results show that Cross-Concatenation is competitive with SMOTE and its variants, the most popular over-sampling approaches, in terms of F1 score and AUC in the majority of test cases.

    Second, we introduce a new concept called virtual big data. Virtual big data is a high-dimensional version of the original training data, generated by concatenating c different original instances. This technique can increase the number of training instances from N to C(N, c). We prove that the curse of dimensionality of virtual big data can alleviate the vanishing generator gradient problem.
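
    The abstract describes Cross-Concatenation only at a high level: a deterministic, projection-based scheme that balances both classes by concatenating instances. Below is a minimal sketch of one plausible reading, in which every minority instance is concatenated with every majority instance in both orders so that the two resulting classes have exactly the same size; the pairing rule, labeling, and the function name `cross_concatenate` are assumptions for illustration, not the dissertation's exact algorithm.

    ```python
    import numpy as np

    def cross_concatenate(X_min, X_maj):
        """Hypothetical Cross-Concatenation sketch (assumed pairing rule).

        Concatenates every (minority, majority) pair in both orders, so the
        two synthetic classes have exactly the same size: |min| * |maj| each.
        The result lives in a 2d-dimensional space, and the procedure is
        fully deterministic (no random sampling, unlike SMOTE).
        """
        pairs_min_first = [np.concatenate([a, b]) for a in X_min for b in X_maj]
        pairs_maj_first = [np.concatenate([b, a]) for a in X_min for b in X_maj]
        X_new = np.vstack([pairs_min_first, pairs_maj_first])
        # Label by which class occupies the leading half of each vector.
        y_new = np.array([1] * len(pairs_min_first) + [0] * len(pairs_maj_first))
        return X_new, y_new

    # Tiny illustration: 2 minority and 3 majority points in 2-D.
    X_min = np.array([[0.0, 0.0], [0.1, 0.2]])
    X_maj = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.2]])
    X_bal, y_bal = cross_concatenate(X_min, X_maj)
    print(X_bal.shape)  # (12, 4): both classes now have 6 instances each
    ```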
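
    Virtual big data is described concretely enough to sketch directly: every combination of c distinct original instances is concatenated into one high-dimensional "virtual" instance, growing the training set from N rows to C(N, c). The helper below follows that description literally; the name `virtual_big_data` is ours.

    ```python
    import numpy as np
    from itertools import combinations
    from math import comb

    def virtual_big_data(X, c):
        """Build virtual big data by concatenating every combination of
        c distinct instances of X, as described in the abstract.

        X has shape (N, d); the result has shape (C(N, c), c * d).
        """
        return np.array([np.concatenate(group) for group in combinations(X, c)])

    # Tiny illustration: N = 5 instances in 3-D, concatenated in pairs (c = 2).
    X = np.random.default_rng(0).normal(size=(5, 3))
    V = virtual_big_data(X, c=2)
    print(V.shape, comb(5, 2))  # (10, 6) 10: C(5, 2) = 10 virtual instances
    ```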
    URI
    https://hdl.handle.net/10657/14262
    Collections
    • Published ETD Collection
