Proactive Defense through Automated Attack Generation: A Multi-pronged Study of Generated Deceptive Content

Das, Avisha

dc.contributor.advisor	Verma, Rakesh M.
dc.contributor.committeeMember	Leiss, Ernst L.
dc.contributor.committeeMember	Solorio, Thamar
dc.contributor.committeeMember	Gervás Gomez-Navarro, Pablo
dc.creator	Das, Avisha
dc.date.accessioned	2021-08-06T19:36:03Z
dc.date.created	December 2020
dc.date.issued	2020-12
dc.date.submitted	December 2020
dc.date.updated	2021-08-06T19:36:04Z
dc.description.abstract	Social engineering attacks are a security threat: phishing and email masquerading are common examples, in which a perpetrator impersonates a legitimate entity to steal an unknowing victim's digital identity. However, despite having a higher probability of success, executing such an attack can be costly in terms of time and manual labor. With advances in machine learning and natural language processing, attackers can now use more sophisticated methods to evade detection. Deep neural learners can generate natural text when trained on large amounts of written content. While these techniques have been tested on creative content generation tasks such as story generation, they have also been abused to generate fake content such as fake news. In a proactive scenario, the defender presumes that attackers will resort to sophisticated yet automated methods of attack vector generation. However, applying neural text generation methods to email generation is fairly challenging, owing to the noise and sparsity in emails and the diversity of email writing styles. Moreover, evaluating and detecting generated content is a challenging and cumbersome task, and current automated metrics do not provide the best possible alternative. We analyze automated content generation for two tasks: (a) creative content or story generation from writing prompts; and (b) generation of emails from given subject prompts for specific intents. We split the analysis for each task into three parts: (i) content (story/email) generation; (ii) fine-tuning and improving upon generated content; and (iii) content evaluation.
In addition to testing baselines such as word-based recurrent neural networks and pre-trained and fine-tuned transformer language models, we propose HiGen, a hierarchical architecture that builds on a generative language model and improves the generated content using sentence embeddings, given a prior conditioning prompt. Finally, we compare the linguistic quality of the generated text to human-authored text using a set of automated metrics. We also corroborate our findings with a user study to ascertain how well the metrics can distinguish between writing patterns. We further explore whether system performance differs with the genre of generated text (stories vs. emails). We observe an overall improvement in sentence coherence in content generated by the HiGen architecture.
dc.description.department	Computer Science, Department of
dc.format.digitalOrigin	born digital
dc.format.mimetype	application/pdf
dc.identifier.citation	Portions of this document appear in: Das, Avisha, and Rakesh M. Verma. "Can machines tell stories? A comparative study of deep neural language models and metrics." IEEE Access 8 (2020): 181258-181292. And in: Das, Avisha, and Rakesh Verma. "Automated email generation for targeted attacks using natural language." arXiv preprint arXiv:1908.06893 (2019).
dc.identifier.uri	https://hdl.handle.net/10657/8029
dc.language.iso	eng
dc.rights	The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. UH Libraries has secured permission to reproduce any and all previously published materials contained in the work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject	Natural Language Generation
dc.subject	Deep Learning
dc.subject	Transformer Architecture
dc.subject	Deep Neural Network
dc.subject	Email Generation
dc.subject	Story Generation
dc.subject	Language Modeling
dc.subject	Coherence Metrics
dc.title	Proactive Defense through Automated Attack Generation: A Multi-pronged Study of Generated Deceptive Content
dc.type.dcmi	Text
dc.type.genre	Thesis
local.embargo.lift	2022-12-01
local.embargo.terms	2022-12-01
thesis.degree.college	College of Natural Sciences and Mathematics
thesis.degree.department	Computer Science, Department of
thesis.degree.discipline	Computer Science
thesis.degree.grantor	University of Houston
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy

Files

Original bundle

Name: DAS-DISSERTATION-2020.pdf
Size: 8.4 MB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 4.43 KB
Format: Plain Text

Name: LICENSE.txt
Size: 1.81 KB
Format: Plain Text

Collections

Published ETD Collection