A Multi-Pronged Approach to Phishing Email Detection






Phishing emails are a nuisance and a growing threat worldwide, costing time, effort and money. In this era of online communication and electronic data exchange, every individual connected to the Internet faces the danger of phishing attacks. Typically, benign-looking emails serve as the attack vector, tricking users into revealing sensitive information such as login credentials and credit-card details. Every email carries important information in its header, and this thesis describes ways of capturing that information to classify phishing emails successfully. The header is a useful signal because the phisher has total control over the email body and subject but little control over the header once the email leaves the sender's domain. Only a sophisticated phisher who spends a lot of time crafting the attack can influence it, and that effort reduces the payoff and may even backfire or yield mixed results.

This thesis is a consolidated account of several systems designed to combat phishing emails along different dimensions. The main focus is the email header. Techniques such as n-gram analysis, machine learning and network port scanning are used to extract useful features from the emails. The thesis shows that the classes of features used in these systems are very effective in distinguishing phishing emails from legitimate ones, and it demonstrates the robustness of the methods on real datasets from varied domains. Some methods, such as the header-domain analysis, obtain detection rates as high as 99.9% with false positive rates as low as 0.1%. These approaches can be used standalone or easily combined with other existing methods.
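As a rough illustration of what header-based feature extraction with n-grams might look like, the following minimal Python sketch counts character n-grams in an email's Message-ID header and pulls out the domain part of that header. It is not the thesis's implementation: the function names, the choice of character bigrams, and the sample message are all assumptions made for illustration, and only the Python standard library is used.

```python
from collections import Counter
from email import message_from_string


def _message_id(raw_email: str) -> str:
    """Return the Message-ID value without angle brackets ('' if absent)."""
    msg = message_from_string(raw_email)
    return (msg.get("Message-ID") or "").strip().strip("<>")


def message_id_ngrams(raw_email: str, n: int = 2) -> Counter:
    """Count character n-grams in the Message-ID (hypothetical feature set)."""
    mid = _message_id(raw_email)
    return Counter(mid[i:i + n] for i in range(len(mid) - n + 1))


def message_id_domain(raw_email: str) -> str:
    """Return the part of the Message-ID after the last '@', lowercased."""
    _, sep, domain = _message_id(raw_email).rpartition("@")
    return domain.lower() if sep else ""


# Made-up sample message, for illustration only.
sample = (
    "Message-ID: <abc@Example.com>\n"
    "From: alice@example.org\n"
    "Subject: hello\n"
    "\n"
    "body text\n"
)

features = message_id_ngrams(sample, n=2)
domain = message_id_domain(sample)
```

Counts like these could be fed to any standard classifier as a feature vector, and the extracted domain could be compared against the domains found in other header fields; which features and classifiers work best is an empirical question that the thesis itself addresses.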



Phishing, Email, Message-ID, Header-Domains


Portions of this document appear in: Verma, Rakesh, and Nirmala Rai. "Phish-IDetector: Message-ID based automatic phishing detection." In 2015 12th International Joint Conference on e-Business and Telecommunications (ICETE), vol. 4, pp. 427-434. IEEE, 2015.