Automatic Feature Extraction for Phishing Email Detection
dc.contributor.committeeMember | Verma, Rakesh M. | |
dc.contributor.committeeMember | Gnawali, Omprakash | |
dc.contributor.committeeMember | Conklin, Wm. Arthur | |
dc.creator | Crane, Devin R. 1988- | |
dc.date.accessioned | 2019-05-23T14:36:41Z | |
dc.date.created | August 2018 | |
dc.date.issued | 2018-08 | |
dc.date.submitted | August 2018 | |
dc.date.updated | 2019-05-23T14:36:41Z | |
dc.description.abstract | Each year, billions are lost in damages from phishing emails, and human researchers spend countless hours developing new detection techniques and finding the flaws in old ones. The number of articles publishing these findings is increasing very rapidly, too rapidly for humans to assimilate and remember in a reasonable amount of time. This thesis adapts FeatureSmith's automatic feature extraction, originally developed for Android malware detection, to phishing email detection, automatically extracting all the features described in each scholarly article, patent, and thesis. Because accurately classifying a phishing email requires the intelligent combination of multiple features, the weighting and ranking that FeatureSmith uses for Android to find the best features were not as effective for phishing email. As a result, the final, most helpful features must then be manually extracted from the automatically generated explanations for use in phishing email detection. Sometimes the extraction process involves going back to the source article, which can reveal tables or other sources of overlooked features that can also be implemented. In total, 75 final features, both binary and discrete, were manually extracted. Implementing these features with machine learning for phishing email classification, aided by intuition, resulted in 94.6% detection accuracy on an unbalanced dataset with separate training and testing emails obtained from the Anti-Phishing Shared Pilot at the International Workshop on Security and Privacy Analytics. The top reported testing accuracy for this dataset is 96.8%. | |
dc.description.department | Computer Science, Department of | |
dc.format.digitalOrigin | born digital | |
dc.format.mimetype | application/pdf | |
dc.identifier.uri | https://hdl.handle.net/10657/4002 | |
dc.language.iso | eng | |
dc.rights | The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s). | |
dc.subject | Phishing detection | |
dc.subject | Phishing | |
dc.subject | Automatic feature extraction | |
dc.title | Automatic Feature Extraction for Phishing Email Detection | |
dc.type.dcmi | Text | |
dc.type.genre | Thesis | |
local.embargo.lift | 2020-08-01 | |
local.embargo.terms | 2020-08-01 | |
thesis.degree.college | College of Natural Sciences and Mathematics | |
thesis.degree.department | Computer Science, Department of | |
thesis.degree.discipline | Computer Science | |
thesis.degree.grantor | University of Houston | |
thesis.degree.level | Masters | |
thesis.degree.name | Master of Science | |
Files
Original bundle
- Name: CRANE-THESIS-2018.pdf
- Size: 1.46 MB
- Format: Adobe Portable Document Format