Using Host Behavior and Machine Learning to Detect Worms

This article is a literature review of the 2008 paper “Detection of unknown computer worms based on behavioral classification of the host” by Robert Moskovitch et al. Its goal is to share a new method of malware detection that uses machine learning.

Antivirus detection algorithms tend not to perform well when detecting newer malware. Machine learning techniques, however, have demonstrated an ability to generalize to new data and thus detect newer malware. In this paper, the researchers explore the use of several algorithms, including neural networks, to monitor different aspects of a computer’s behavior and detect when that behavior indicates a worm infection. Because an infection changes host behavior in recognizable patterns, detecting those changes can help prevent further damage and spread.

To monitor the computer, up to 323 features were logged and observed across eight different computer environments, with and without different computer worms. These features included messages received, thread count, and connection failures. The data was used to train different machine learning algorithms, including decision trees, Bayesian networks, and neural networks. In essence, the models use the data to learn patterns in host behavior that indicate a worm infection, which allows features of computer behavior to be used to detect worms.

The researchers also reduced the feature set to the 20 most relevant features, which kept the models at a high mean accuracy of 90% for detecting worms. The models varied in accuracy, with Bayesian networks generally performing the best, but all were above 80%. An important result of the experiments is that the models showed promise in generalizing to new worms, since the test set contained only worms that were excluded from the training set. This suggests that a larger model trained on more data could perform even better and become an effective method of detecting worm infection. Another consideration is that some model types performed better on some worm types, which suggests that using multiple models together could improve detection accuracy across worm types. The researchers stated that enlarging the dataset, for example by increasing the number of worms, shows promise in improving the models, and that the limited number of worms in their data was practically the only limitation of their study.
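The evaluation idea described above (rank features, keep the most relevant ones, then test only on a worm the model never saw) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the data is synthetic, six features stand in for the 323 logged ones, a Fisher-style score stands in for the paper's feature ranking, and a nearest-centroid classifier stands in for the decision trees, Bayesian networks, and neural networks the researchers trained. The worm names and shift values are hypothetical.

```python
import random
from statistics import mean, pvariance

N_FEATURES = 6    # stand-in for the 323 logged features
SIGNAL = (0, 1)   # hypothetical: e.g. connection failures and thread count

def make_samples(rng, worm, shift, n=30):
    """Synthetic host snapshots; worm=None means a clean host."""
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
        if worm is not None:
            for j in SIGNAL:
                x[j] += shift  # infection raises the signal features
        rows.append((x, worm))
    return rows

def fisher_score(pos, neg):
    """Squared gap between class means relative to the within-class spread."""
    return (mean(pos) - mean(neg)) ** 2 / (pvariance(pos) + pvariance(neg) + 1e-9)

def top_k_features(train, k):
    """Rank features on the training data only and keep the k best."""
    scores = []
    for j in range(N_FEATURES):
        pos = [x[j] for x, worm in train if worm is not None]
        neg = [x[j] for x, worm in train if worm is None]
        scores.append((fisher_score(pos, neg), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])

def centroid(rows, cols):
    return [mean(x[j] for x in rows) for j in cols]

def predict(x, cols, clean_c, infected_c):
    """True means 'infected': the sample is closer to the infected centroid."""
    d_clean = sum((x[j] - c) ** 2 for j, c in zip(cols, clean_c))
    d_infected = sum((x[j] - c) ** 2 for j, c in zip(cols, infected_c))
    return d_infected < d_clean

def evaluate_held_out(infected, clean_train, clean_test, held_out, k=2):
    """Train without the held-out worm, then test only on it plus clean hosts."""
    train = clean_train + [s for s in infected if s[1] != held_out]
    test = clean_test + [s for s in infected if s[1] == held_out]
    cols = top_k_features(train, k)
    clean_c = centroid([x for x, w in train if w is None], cols)
    infected_c = centroid([x for x, w in train if w is not None], cols)
    hits = sum(predict(x, cols, clean_c, infected_c) == (w is not None)
               for x, w in test)
    return cols, hits / len(test)

rng = random.Random(0)
infected = []
for name, shift in [("worm_a", 6.0), ("worm_b", 7.0), ("worm_c", 5.0)]:
    infected += make_samples(rng, name, shift)
clean_train = make_samples(rng, None, 0.0, n=60)
clean_test = make_samples(rng, None, 0.0, n=30)

selected, accuracy = evaluate_held_out(infected, clean_train, clean_test, "worm_c")
print("selected features:", selected)
print("accuracy on unseen worm:", round(accuracy, 3))
```

The key design point is that `worm_c` never appears in the training data, so the accuracy reflects generalization to an unseen worm rather than memorization. Feature ranking is also done on the training split only, so the held-out worm cannot influence which features are kept.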

Overall, machine learning models show great promise in detecting infections because of their ability to generalize to unseen data. In practice, multiple models can be used to corroborate or cross-check each other to maximize accuracy. The main downside is that this approach is not preventative: it detects an infection that has already occurred, so its value lies mainly in limiting further damage to computer networks. Even so, the study is a clear indication that machine learning has great potential in malware detection.
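One simple way to have several models cross-check each other is a majority vote. The sketch below is only an illustration under stated assumptions: the three "models" are hypothetical threshold rules rather than trained classifiers, the thresholds are invented, and treating a tie as an infection is an arbitrary cautious choice. The paper itself does not prescribe this scheme.

```python
def majority_vote(votes):
    """True (infected) when at least half of the models say so."""
    return 2 * sum(votes) >= len(votes)

# Hypothetical stand-ins for trained models; each looks at one host feature.
def network_model(host):
    return host["connection_failures"] > 50

def thread_model(host):
    return host["thread_count"] > 400

def message_model(host):
    return host["messages_received"] > 1000

MODELS = [network_model, thread_model, message_model]

def detect(host):
    """Return the combined verdict along with each model's individual vote."""
    votes = [model(host) for model in MODELS]
    return majority_vote(votes), votes

suspicious = {"connection_failures": 120, "thread_count": 350,
              "messages_received": 4000}
quiet = {"connection_failures": 2, "thread_count": 90,
         "messages_received": 40}

print(detect(suspicious))
print(detect(quiet))
```

Returning the individual votes alongside the verdict makes disagreements visible, which is useful when one model type is known to be stronger on a particular kind of worm.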

Source: https://www.sciencedirect.com/science/article/pii/S0167947308000315#sec6