Attribute Selection Using Machine Learning Algorithms for Intrusion Detection






As technology progresses, more critical information is stored on computers. Digital storage is more compact and can add benefits such as searchability. At the same time, these storage systems are often connected through a network to the outside world, which increases the chance of attack because thieves can access the data remotely. Authentication methods can be used to try to block attackers, but attackers can get around them by obtaining user credentials and then stealing data under the guise of a normal user. To combat this, I examined how attacker activity differs from normal user activity. I used two categories from the Windows-Users and -Intruder simulations Logs (WUIL) dataset, one of time and the other of file pathway. From these categories, I calculated ten attributes, each with an intuitive expectation of highlighting differences between attack and normal behavior. I then analyzed the attributes with several methods. First, I looked at how a simple threshold on each attribute would perform individually. I also considered correlation between the attributes to avoid information overlap. Next, I trained a white-box machine learning method twice: once with all the attributes, and once with only the attributes that performed well with a simple threshold. Finally, I trained a black-box machine learning method three times: once with all attributes, once with the best individual attributes, and once with the attributes selected by the white-box method. I found that only a couple of the original attributes did well on their own. However, most of the attributes selected by the white-box method were poor individual performers. The black-box method did best with the attributes selected by the white-box method, and worst with the best individual performers. The white-box method strongly outperformed the black-box method on attack data, but did slightly worse at correctly identifying normal data.
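
The single-attribute threshold test and the correlation check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure used in this work: the function names (`best_threshold`, `pearson`), the toy attribute values, and the choice of balanced accuracy as the score are all hypothetical.

```python
from math import sqrt

def best_threshold(values, labels):
    """Sweep candidate thresholds on one attribute.

    labels: 1 = attack, 0 = normal (assumed encoding).
    Returns (threshold, direction, balanced_accuracy), where direction ">"
    means "flag as attack when value > threshold" and "<=" the opposite.
    """
    n_attack = sum(labels)
    n_normal = len(labels) - n_attack
    if n_attack == 0 or n_normal == 0:
        raise ValueError("both classes must be present")
    best = (None, None, -1.0)
    for t in sorted(set(values)):
        for direction in (">", "<="):
            tp = tn = 0
            for v, y in zip(values, labels):
                pred_attack = (v > t) if direction == ">" else (v <= t)
                if pred_attack and y == 1:
                    tp += 1
                elif not pred_attack and y == 0:
                    tn += 1
            # Balanced accuracy: mean of attack recall and normal recall.
            score = 0.5 * (tp / n_attack + tn / n_normal)
            if score > best[2]:
                best = (t, direction, score)
    return best

def pearson(xs, ys):
    """Pearson correlation between two attributes (non-constant inputs assumed)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy data: one attribute measured on 4 normal and 4 attack windows.
attr_a = [1, 2, 3, 4, 10, 11, 12, 13]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, direction, score = best_threshold(attr_a, labels)

# A second attribute that is a rescaled copy of the first carries no new information.
attr_b = [2 * v for v in attr_a]
overlap = pearson(attr_a, attr_b)
```

An attribute that separates the classes well on its own scores near 1.0 here, while a pair of attributes with correlation near 1.0 or -1.0 is a candidate for dropping one of the two.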



Intrusion detection, Machine learning, WUIL, Data theft, Recursive partitioning, Decision trees, Deep neural networks