ENHANCING MALWARE DETECTION THROUGH BEHAVIORAL MODELLING AND FEATURE LEARNING
The growing frequency of data breaches and cyberattacks caused by malware infections in recent years underscores the importance of ongoing research in malware detection. Malicious software, or malware, often undergoes numerous mutations to evade signature-based antivirus software, and the resulting abundance of variants has made detection increasingly difficult. Mainstream cybersecurity vendors favor static analysis because it is fast and scalable: incoming files are assessed, their signatures are generated, and those signatures are cross-referenced against a database of known malicious signatures. However, this form of analysis is susceptible to obfuscation, in which attackers make superfluous modifications to malware code to produce a new signature that antivirus software does not yet recognize. This work therefore focuses on analyzing the run-time execution of programs to extract their behavior and classify them as malicious or benign. This dissertation addresses the persistent challenge posed by ever-evolving malware variants by introducing a framework that captures the run-time behavior of programs through graph modelling and deep learning methods. The proposed approach parses the log of native functions called by a program during its execution and uses it to create Behavior Call Graphs (BCGs) with a novel methodology that emphasizes the connections between these native functions. Graph structures can effectively represent intricate relationships within the data, facilitating the extraction of relevant information that might otherwise be difficult to capture. This research employs two methods to analyze these BCGs: the first extracts features defined by domain experts, while the second leverages deep learning algorithms to generate the features automatically.
However, conventional deep learning methods such as neural networks and convolutional neural networks are not designed to take graphs as input. To address this limitation, this work adopts feature learning algorithms that automatically embed graph structures into feature vectors in a multi-dimensional space. This dissertation validates the effectiveness of these approaches by analyzing BCGs generated from Windows and Android applications to identify and capture the malicious behavior of malware variants. This research can help companies and software publishers test the safety of uploaded or shared applications and prevent malware from spreading to their end users.
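As a rough illustration of the BCG idea described above, the following Python sketch builds a small weighted call graph from an ordered list of function names and derives a few hand-crafted graph features. It is not the dissertation's construction method, which the abstract does not detail. The rule that an edge links each function to the one called immediately after it, the helper names build_call_graph and expert_features, the chosen features, and the sample trace are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_call_graph(call_log):
    """Build a weighted directed graph from an ordered list of function names.

    Assumption (not from the source): an edge u -> v is recorded each time
    v is called immediately after u; the edge weight counts those transitions.
    """
    transitions = Counter(zip(call_log, call_log[1:]))
    graph = defaultdict(dict)
    for (src, dst), count in transitions.items():
        graph[src][dst] = count
    return set(call_log), dict(graph)


def expert_features(nodes, graph):
    """Compute a few simple, hand-picked graph statistics (illustrative only)."""
    out_degrees = [len(targets) for targets in graph.values()]
    return {
        "num_nodes": len(nodes),
        "num_edges": sum(out_degrees),
        "max_out_degree": max(out_degrees, default=0),
        "total_transitions": sum(
            weight for targets in graph.values() for weight in targets.values()
        ),
    }


# Hypothetical execution trace; real logs would come from run-time monitoring.
log = [
    "NtOpenFile", "NtReadFile", "NtWriteFile",
    "NtReadFile", "NtWriteFile", "NtClose",
]
nodes, graph = build_call_graph(log)
features = expert_features(nodes, graph)
print(features)
```

A fixed-length feature dictionary of this kind could be passed to a standard classifier, whereas the feature learning route mentioned above would instead learn the vector representation of each graph automatically.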