Phishing Site Detection from a Web Developer's Perspective
Abstract
The Internet has enabled unprecedented communication and new technologies, but it has also brought the bane of phishing and exacerbated existing vulnerabilities. In this thesis, we propose a model that detects phishing webpages from a web developer's perspective. From this standpoint, we design 120 novel features based on webpage content, along with four novel time-based features and two novel search-based features. To optimize the model, we also use 34 other content-based features and 11 heuristic features. We select Random Committee (base learner: Random Tree) for our framework because it performed best in a comparison with six other algorithms: Hellinger Distance Decision Tree, Support Vector Machine (SVM), Logistic Regression, J48, Naive Bayes, and Random Forest. In real-time experiments, the model achieved 99.4% precision and a Matthews correlation coefficient (MCC) of 98.3%, with a 0.1% false positive rate, in 5-fold cross-validation on a realistically unbalanced dataset.
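For readers less familiar with the reported metrics, the following minimal sketch shows how precision, false positive rate, and MCC are conventionally computed from confusion-matrix counts. The function name `binary_metrics` and the example counts are hypothetical illustrations, not the thesis's code or data. The counts were chosen only to resemble an unbalanced setting in which legitimate pages far outnumber phishing pages.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute precision, false positive rate, and MCC from raw counts.

    tp/fn count phishing pages that were caught/missed;
    tn/fp count legitimate pages that were passed/wrongly flagged.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # MCC uses all four cells, so it stays informative on unbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "fpr": fpr, "mcc": mcc}


# Hypothetical counts (assumption, not from the thesis):
# 1,000 phishing pages and 10,000 legitimate pages in a test fold.
metrics = binary_metrics(tp=980, fp=10, tn=9990, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC ranges from -1 to 1, so a figure such as 98.3% corresponds to a coefficient of 0.983. Because it accounts for true negatives as well as true positives, it is a common choice when the classes are unbalanced, as in the scenario the abstract describes.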