Automatic Feature Extraction for Phishing Email Detection

dc.contributor.committeeMember: Verma, Rakesh M.
dc.contributor.committeeMember: Gnawali, Omprakash
dc.contributor.committeeMember: Conklin, Wm. Arthur
dc.creator: Crane, Devin R. 1988-
dc.date.accessioned: 2019-05-23T14:36:41Z
dc.date.created: August 2018
dc.date.issued: 2018-08
dc.date.submitted: August 2018
dc.date.updated: 2019-05-23T14:36:41Z
dc.description.abstract: Each year, billions are lost in damages from phishing emails, and human researchers spend countless hours developing new detection techniques and finding flaws in existing ones. The number of articles publishing these findings is growing too rapidly for humans to read and remember in a reasonable amount of time. This thesis adapts FeatureSmith, an automatic feature extraction system built for Android malware detection, to phishing email detection, automatically extracting the features described in each scholarly article, patent, and thesis. Accurately classifying a phishing email requires intelligently combining multiple features, so the weighting and ranking that FeatureSmith uses to find the best Android malware features were less effective for phishing email. As a result, the final, most helpful features had to be extracted manually from the automatically generated explanations. Sometimes this meant returning to the source article, which could reveal tables or other sources of overlooked features that could also be implemented. In total, 75 final features, both binary and discrete, were extracted manually. Implementing these features with machine learning, aided by intuition, for phishing email classification yielded 94.6% detection accuracy on an unbalanced dataset with separate training and testing emails obtained from the Anti-Phishing Shared Pilot at the International Workshop on Security and Privacy Analytics. The top reported testing accuracy on this dataset is 96.8%.
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10657/4002
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: Phishing detection
dc.subject: Phishing
dc.subject: Automatic feature extraction
dc.title: Automatic Feature Extraction for Phishing Email Detection
dc.type.dcmi: Text
dc.type.genre: Thesis
local.embargo.lift: 2020-08-01
local.embargo.terms: 2020-08-01
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Masters
thesis.degree.name: Master of Science
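The abstract describes 75 manually extracted features that are "both binary and discrete." To show what that distinction can look like in practice, here is a minimal sketch using only Python's standard `email` and `re` modules. The feature names, the URL pattern, and the urgency word list are hypothetical illustrations chosen for this example. They are not the thesis's actual 75 features, and the sketch does not reproduce FeatureSmith or the thesis's classifier.

```python
import re
from email import message_from_string
from email.message import Message

# Hypothetical illustrative features; NOT the 75 features from the thesis.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IP_URL_PATTERN = re.compile(r"https?://\d{1,3}(\.\d{1,3}){3}", re.IGNORECASE)
URGENCY_WORDS = ("urgent", "verify", "suspended", "password", "click here")


def body_text(msg: Message) -> str:
    """Concatenate the decoded text/plain and text/html parts of a message."""
    parts = []
    for part in msg.walk():
        if part.get_content_type() in ("text/plain", "text/html"):
            payload = part.get_payload(decode=True)  # bytes, or None for containers
            if payload is not None:
                charset = part.get_content_charset() or "utf-8"
                parts.append(payload.decode(charset, errors="replace"))
    return "\n".join(parts)


def extract_features(raw_email: str) -> dict[str, int]:
    """Map a raw RFC 822 email to a small vector of binary and discrete features."""
    msg = message_from_string(raw_email)
    text = body_text(msg)
    lowered = text.lower()
    urls = URL_PATTERN.findall(text)
    from_addr = msg.get("From", "")
    reply_to = msg.get("Reply-To", "")
    return {
        # Binary: 1 if the body contains an HTML form tag.
        "has_html_form": int("<form" in lowered),
        # Binary: 1 if a Reply-To header exists and differs from From.
        "reply_to_differs": int(bool(reply_to) and reply_to.strip() != from_addr.strip()),
        # Binary: 1 if any URL uses a raw IPv4 address as its host.
        "has_ip_url": int(any(IP_URL_PATTERN.match(u) for u in urls)),
        # Discrete: number of URLs in the body.
        "num_urls": len(urls),
        # Discrete: total occurrences of the urgency phrases.
        "num_urgency_words": sum(lowered.count(w) for w in URGENCY_WORDS),
    }


sample = (
    "From: support@example.com\n"
    "Reply-To: helpdesk@example.net\n"
    "Subject: Account notice\n"
    "Content-Type: text/plain\n"
    "\n"
    "URGENT: verify your password at http://192.0.2.1/login or "
    "https://example.com/help\n"
)
print(extract_features(sample))
# {'has_html_form': 0, 'reply_to_differs': 1, 'has_ip_url': 1, 'num_urls': 2, 'num_urgency_words': 3}
```

Vectors like these could then be fed to any standard classifier. The split between 0/1 flags and small integer counts mirrors the binary/discrete distinction the abstract mentions, but which concrete features matter is exactly what the thesis determines from the literature.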

Files

Original bundle

Name: CRANE-THESIS-2018.pdf
Size: 1.46 MB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 4.43 KB
Format: Plain Text
Name: LICENSE.txt
Size: 1.81 KB
Format: Plain Text