Evaluation of Features and Clustering Algorithms for Malware



Malware has undoubtedly become a major threat in modern society, and the number of samples is growing daily. The internet is increasingly used by highly skilled malware developers and has even become home to large black markets for the purchase and distribution of malware. This gives malware developers a strong incentive to reduce the chances of being detected by anti-virus programs. By using different obfuscation techniques, authors can ensure that other versions of their malware continue to function when a signature is developed for one of them. The result is many new implementations of the same type of malicious software, which can propagate out of control. Approximately 400,000 new malware samples are registered every day, which raises the problem of processing the huge amount of unstructured data obtained from malware analysis. It also makes it challenging for anti-virus vendors to detect zero-day attacks and release updates in a reasonable time frame to prevent infection and propagation. Hence, to ensure that large numbers of malware samples are analyzed and understood, one possible technique is to cluster them into groups with similar characteristics. These groupings can help in visualizing relationships between malware samples and their evolution over time, in constructing automatic signatures for entire groups of malware instead of for individual samples, and even in detecting zero-day malware. Using data extracted via dynamic analysis, we test several combinations of features, generate clusters with several clustering algorithms combined with a range of similarity measures, and analyze the results.
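
As a rough illustration of the general idea of grouping samples by a similarity measure, the sketch below clusters samples by the overlap of their observed behaviour. The abstract does not name the features, similarity measures, or algorithms that were evaluated, so everything here is an assumption made for illustration: the sample names and API-call sets are hypothetical, Jaccard similarity stands in for "a similarity measure", and threshold-based single linkage stands in for "a clustering algorithm".

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0  # convention: two empty feature sets count as identical
    return len(a & b) / len(a | b)


def cluster(samples, threshold=0.5):
    """Single-linkage grouping: link any two samples whose similarity
    is at least `threshold`, then return the connected groups."""
    parent = {name: name for name in samples}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if jaccard(samples[a], samples[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical dynamic-analysis output: the set of API calls seen per sample.
samples = {
    "sample_a": {"CreateFileW", "WriteFile", "RegSetValueExW", "connect"},
    "sample_b": {"CreateFileW", "WriteFile", "RegSetValueExW", "send"},
    "sample_c": {"VirtualAlloc", "CreateRemoteThread", "WriteProcessMemory"},
}

clusters = cluster(samples)
for group in clusters:
    print(sorted(group))
```

Here `sample_a` and `sample_b` share three of five distinct calls, so they fall into one group at the assumed threshold of 0.5, while `sample_c` stays on its own. Changing the feature set, the similarity function, or the linkage rule changes the resulting groups, which is why such combinations need to be compared.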