Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Background. Malware has been a major issue for years, and traditional signature-based scanning is outdated: most advanced malware can bypass it. Machine learning can learn patterns in malware behavior and structure, making it possible to detect the more advanced threats that are active today.
Objectives. This thesis surveys state-of-the-art machine learning methods for malware detection. A dataset collection method is identified from the literature and used in an experiment. Three selected methods are re-implemented and compared experimentally to determine which performs best. All three algorithms are trained and tested on the same dataset.
Methods. A literature review using the snowballing technique was conducted to identify the state-of-the-art detection methods. The malware was collected from the malware database VirusShare, and the total number of samples was 14924. The algorithms were re-implemented, trained, tested, and compared on accuracy, true positives, true negatives, false positives, and false negatives.
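The comparison criteria named above can be illustrated with a minimal sketch. It assumes binary labels where 1 means malware and 0 means benign; the names `ConfusionCounts` and `count_outcomes` are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ConfusionCounts:
    """Outcome counts for a binary malware/benign classifier."""
    tp: int  # malware correctly flagged as malware
    tn: int  # benign correctly passed as benign
    fp: int  # benign wrongly flagged as malware
    fn: int  # malware wrongly passed as benign

    @property
    def accuracy(self) -> float:
        total = self.tp + self.tn + self.fp + self.fn
        return (self.tp + self.tn) / total if total else 0.0


def count_outcomes(y_true: Iterable[int], y_pred: Iterable[int]) -> ConfusionCounts:
    """Tally the four outcomes; label 1 = malware, 0 = benign (assumed encoding)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return ConfusionCounts(tp, tn, fp, fn)
```

Reporting the four counts alongside accuracy matters because two detectors with equal accuracy can differ sharply in how many malicious files they miss.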
Results. The literature review showed that the best-performing published approaches are image detection, N-Gram combined with meta-data and Function Call Graphs. However, a less-studied method, Running Window Entropy, was also put forward: little research exists on it, yet it can still achieve decent accuracy. The methods selected for comparison were image detection, N-Gram, and Running Window Entropy, which achieved accuracies of 94.64%, 96.45%, and 93.71%, respectively.
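Since Running Window Entropy is the least documented of the three methods, a rough sketch of the general idea may help. It computes Shannon entropy over a sliding window of bytes; the window size, step, and function names are assumptions chosen for illustration, and the thesis's actual feature extraction may differ.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte sequence, in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    counts = Counter(chunk)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def running_window_entropy(data: bytes, window: int = 256, step: int = 128) -> List[float]:
    """Entropy of each sliding window over the file contents.

    The window and step defaults are illustrative assumptions, not values
    reported in the thesis.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(data) < window:
        # Input shorter than one window: score whatever is there.
        return [shannon_entropy(data)] if data else []
    return [
        shannon_entropy(data[i:i + window])
        for i in range(0, len(data) - window + 1, step)
    ]
```

The resulting entropy sequence could then be summarised or resampled to a fixed length and passed to a classifier. Packed or encrypted regions of a binary tend to produce high entropy values, which is one reason such a profile can be informative.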
Conclusions. On this dataset, N-Gram performed best of the three methods. The other two methods can still be applicable, depending on the use case.
2021.