Automatic Feature Extraction for Phishing Email Detection

Crane, Devin R. 1988-

dc.contributor.committeeMember	Verma, Rakesh M.
dc.contributor.committeeMember	Gnawali, Omprakash
dc.contributor.committeeMember	Conklin, Wm. Arthur
dc.creator	Crane, Devin R. 1988-
dc.date.accessioned	2019-05-23T14:36:41Z
dc.date.created	August 2018
dc.date.issued	2018-08
dc.date.submitted	August 2018
dc.date.updated	2019-05-23T14:36:41Z
dc.description.abstract	Each year, billions are lost in damages from phishing emails, and human researchers spend countless hours developing new detection techniques and finding the flaws in existing ones. The number of articles publishing these findings is growing very rapidly, too rapidly for humans to assimilate and remember in a reasonable amount of time. This thesis adapts FeatureSmith, an automatic feature extraction approach built for Android malware detection, to phishing email detection, automatically extracting the features described in each scholarly article, patent, and thesis. Because accurately classifying a phishing email requires the intelligent combination of multiple features, the weighting and ranking that FeatureSmith uses to find the best features for Android malware were less effective for phishing email. As a result, the final, most helpful features must then be extracted manually from the automatically generated explanations for use in phishing email detection. This extraction sometimes involves returning to the source article, which can reveal tables or other overlooked sources of features that can also be implemented. In total, 75 final features, both binary and discrete, were manually extracted. Applying these features with machine learning, aided by intuition, to phishing email classification resulted in 94.6% detection accuracy on an unbalanced dataset with separate training and testing emails obtained from the Anti-Phishing Shared Pilot at the International Workshop on Security and Privacy Analytics. The top reported testing accuracy for this dataset is 96.8%.
dc.description.department	Computer Science, Department of
dc.format.digitalOrigin	born digital
dc.format.mimetype	application/pdf
dc.identifier.uri	https://hdl.handle.net/10657/4002
dc.language.iso	eng
dc.rights	The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject	Phishing detection
dc.subject	Phishing
dc.subject	Automatic feature extraction
dc.title	Automatic Feature Extraction for Phishing Email Detection
dc.type.dcmi	Text
dc.type.genre	Thesis
local.embargo.lift	2020-08-01
local.embargo.terms	2020-08-01
thesis.degree.college	College of Natural Sciences and Mathematics
thesis.degree.department	Computer Science, Department of
thesis.degree.discipline	Computer Science
thesis.degree.grantor	University of Houston
thesis.degree.level	Masters
thesis.degree.name	Master of Science

Files

Original bundle

Name:: CRANE-THESIS-2018.pdf
Size:: 1.46 MB
Format:: Adobe Portable Document Format

License bundle

Name:: PROQUEST_LICENSE.txt
Size:: 4.43 KB
Format:: Plain Text

Name:: LICENSE.txt
Size:: 1.81 KB
Format:: Plain Text

Collections

Published ETD Collection