Proactive Defense through Automated Attack Generation: A Multi-pronged Study of Generated Deceptive Content

dc.contributor.advisor: Verma, Rakesh M.
dc.contributor.committeeMember: Leiss, Ernst L.
dc.contributor.committeeMember: Solorio, Thamar
dc.contributor.committeeMember: Gervás Gomez-Navarro, Pablo
dc.creator: Das, Avisha
dc.date.accessioned: 2021-08-06T19:36:03Z
dc.date.created: December 2020
dc.date.issued: 2020-12
dc.date.submitted: December 2020
dc.date.updated: 2021-08-06T19:36:04Z
dc.description.abstract: Social engineering attacks are a security threat: phishing, email masquerading, and similar attacks are common examples in which a perpetrator impersonates a legitimate entity to steal an unknowing victim's digital identity. However, despite having a higher probability of success, executing such an attack can be costly in terms of time and manual labor. With advancements in machine learning and natural language processing techniques, attackers can now use more sophisticated methods to evade detection. Deep neural learners are capable of natural text generation when trained on huge amounts of written textual content. While these techniques have been tested on creative content (story) generation tasks, they have also been abused to generate fake content (fake news). In a proactive scenario, the defender presumes that attackers will resort to sophisticated yet automated methods of attack vector generation. However, applying neural text generation methods to email generation is challenging owing to the presence of noise or sparsity in emails and the diversity in email writing styles. Moreover, evaluating and detecting generated content is a challenging and cumbersome task, and current automated metrics do not provide the best possible alternative. We analyze automated content generation for two tasks: (a) creative content or story generation from writing prompts; and (b) generation of emails from given subject prompts for specific intents. We split the proposed analysis for each task into three parts: (i) content (story/email) generation; (ii) fine-tuning and improving upon generated content; and (iii) content evaluation.
Apart from testing baselines such as word-based Recurrent Neural Networks and pre-trained and fine-tuned transformer language models, we propose HiGen, a hierarchical architecture that leverages a generative language model and improves upon the generated content using sentence embeddings, given a prior conditioning prompt. Finally, we compare the linguistic quality of the generated text to human-authored text using a set of automated metrics. We also corroborate our findings with a human-based user study to ascertain how well the metrics can distinguish between writing patterns. Moreover, we explore whether system performance differs with respect to the genre of text generated: story vs. emails. We observe an overall improvement in sentence coherence in content generated by the HiGen architecture.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.citation: Portions of this document appear in: Das, Avisha, and Rakesh M. Verma. "Can machines tell stories? A comparative study of deep neural language models and metrics." IEEE Access 8 (2020): 181258-181292. And in: Das, Avisha, and Rakesh Verma. "Automated email generation for targeted attacks using natural language." arXiv preprint arXiv:1908.06893 (2019).
dc.identifier.uri: https://hdl.handle.net/10657/8029
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. UH Libraries has secured permission to reproduce any and all previously published materials contained in the work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Natural Language Generation
dc.subject: Deep Learning
dc.subject: Transformer Architecture
dc.subject: Deep Neural Network
dc.subject: Email Generation
dc.subject: Story Generation
dc.subject: Language Modeling
dc.subject: Coherence Metrics
dc.title: Proactive Defense through Automated Attack Generation: A Multi-pronged Study of Generated Deceptive Content
dc.type.dcmi: Text
dc.type.genre: Thesis
local.embargo.lift: 2022-12-01
local.embargo.terms: 2022-12-01
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: DAS-DISSERTATION-2020.pdf
Size: 8.4 MB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 4.43 KB
Format: Plain Text

Name: LICENSE.txt
Size: 1.81 KB
Format: Plain Text