Study on Adversarial Robustness of Phishing Email Detection Models






Developing robust detection models against phishing emails has long been a main concern of the cyber defense community. Current public phishing/legitimate datasets lack adversarial email examples, which keeps detection models vulnerable. To address this problem, we developed an augmented phishing/legitimate email dataset using different adversarial text attack techniques. In this work, we detect and analyze the emails that can easily be transformed into adversarial examples, along with their unique characteristics. The models are then retrained with the adversarial dataset, and the results show that their accuracy and F1 score improve by five to forty percent under the attack methods. In another experiment, synthetic phishing emails are generated using a fine-tuned GPT-2 model. The detection model is retrained with the newly formed dataset, and we observe that its accuracy and robustness do not improve under black-box attack methods. In our last experiment, we propose a defensive technique that classifies adversarial examples to their true labels using K-Nearest Neighbor, with 94% accuracy in our predictions.
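
As a rough illustration of the K-Nearest Neighbor idea in the last experiment, the sketch below labels a perturbed email by majority vote among its most similar training emails. The abstract does not state which features or distance measure were used, so the character trigram features, the cosine similarity, the toy training emails, and the helper names (`char_ngrams`, `cosine`, `knn_predict`) are all assumptions made for illustration only.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of the lowercased text (assumed feature choice)."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=3):
    """Majority vote over the k training emails most similar to `text`.

    `train` is a list of (email_text, label) pairs.
    """
    query = char_ngrams(text)
    ranked = sorted(train, key=lambda ex: cosine(char_ngrams(ex[0]), query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set, invented for this example (not taken from the study's dataset).
train = [
    ("Verify your account now to avoid suspension", "phishing"),
    ("Your password expires today, click here to reset", "phishing"),
    ("Urgent: confirm your bank details immediately", "phishing"),
    ("Meeting moved to 3pm, see updated agenda attached", "legitimate"),
    ("Lunch on Friday? Let me know what works", "legitimate"),
    ("Here are the quarterly report slides for review", "legitimate"),
]

# A character-level perturbation of the first phishing email.
adversarial = "Ver1fy your acc0unt now to avo1d suspensoin"
print(knn_predict(train, adversarial, k=3))
```

Character n-grams are used here because small character substitutions leave most n-grams intact, so a perturbed email tends to stay close to its unperturbed neighbors; a word-level or embedding-based representation would be an equally plausible choice.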



Adversarial attack, Phishing detection