Detection of Spyware by Mining Executable Files
Blekinge Institute of Technology, School of Computing.
2009 (English). Independent thesis, Advanced level (degree of Master (Two Years)), Student thesis
Abstract [en]

Malicious programs have been a serious threat to the confidentiality, integrity and availability of systems, and various studies have addressed their detection. Two approaches have been developed for this purpose: signature-based detection and heuristic-based detection. These approaches perform well against known malicious programs but cannot catch new ones, so researchers have tried to find other ways of detecting malicious programs. One of these is the application of data mining and machine learning, which has shown good results compared to other approaches. A newer category of malicious programs, called spyware, has gained momentum. Spyware is especially dangerous to the confidentiality of the user's private data, since it may collect such data and send it to a third party. Traditional techniques have not performed well in detecting spyware, so new detection methods are needed. Although data mining and machine learning have shown promising results in detecting other malicious programs, they have not yet been used for the detection of spyware. We therefore decided to employ data mining for spyware detection. We used a data set of 137 files: 119 benign files and 18 spyware files. A theoretical taxonomy of spyware was created, but only two classes, Benign and Spyware, were used in the experiment. We developed an application, Binary Feature Extractor, which extracts features, called n-grams, of different sizes on the basis of common feature-based and frequency-based approaches. The number of features was then reduced, and the remaining features were used to create an ARFF file, which served as input to WEKA for applying machine learning algorithms. The algorithms used in the experiment were J48, Random Forest, JRip, SMO and Naive Bayes. Classifier performance was evaluated with 10-fold cross-validation and the area under the ROC curve. We performed experiments with three n-gram sizes: 4, 5 and 6.
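The feature pipeline described above (n-gram extraction, feature reduction, ARFF output for WEKA) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Binary Feature Extractor itself: it assumes n-grams are overlapping byte sequences of length n, and the rules in the hypothetical helpers `common_features` (keep n-grams present in at least a given fraction of a class's files) and `frequent_features` (keep the most frequent n-grams overall) are guesses at what the common feature-based and frequency-based approaches mean. The attribute names (`ng_<hex>`), the binary presence encoding and the relation name are also assumptions.

```python
from collections import Counter
from typing import List, Set


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count overlapping n-byte sequences in a file, keyed by their hex encoding."""
    return Counter(data[i:i + n].hex() for i in range(len(data) - n + 1))


def common_features(files: List[Counter], min_doc_frac: float = 1.0) -> Set[str]:
    """Hypothetical 'common feature' rule: keep n-grams that occur in at least
    min_doc_frac of the given files (1.0 means present in every file)."""
    doc_freq: Counter = Counter()
    for counts in files:
        doc_freq.update(counts.keys())  # count each n-gram once per file
    threshold = min_doc_frac * len(files)
    return {g for g, df in doc_freq.items() if df >= threshold}


def frequent_features(files: List[Counter], top_k: int) -> Set[str]:
    """Hypothetical 'frequency' rule: keep the top_k n-grams by total count."""
    total: Counter = Counter()
    for counts in files:
        total.update(counts)  # adds the per-file occurrence counts
    return {g for g, _ in total.most_common(top_k)}


def to_arff(relation: str, features: List[str],
            rows: List[Counter], labels: List[str]) -> str:
    """Emit one nominal {0,1} presence attribute per selected n-gram plus the class."""
    lines = [f"@RELATION {relation}", ""]
    for g in features:
        lines.append(f"@ATTRIBUTE ng_{g} {{0,1}}")
    lines.append("@ATTRIBUTE class {benign,spyware}")
    lines += ["", "@DATA"]
    for counts, label in zip(rows, labels):
        values = ["1" if counts[g] > 0 else "0" for g in features]
        lines.append(",".join(values + [label]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy byte strings standing in for executable files (not real data).
    benign = [b"\x00\x01\x02\x03\x04", b"\x00\x01\x02\x03\xff"]
    spyware = [b"\xde\xad\xbe\xef\x00\x01"]
    n = 4
    benign_grams = [byte_ngrams(f, n) for f in benign]
    spyware_grams = [byte_ngrams(f, n) for f in spyware]
    # Union of the n-grams common to each class, sorted for a stable attribute order.
    selected = sorted(common_features(benign_grams) | common_features(spyware_grams))
    print(to_arff("spyware_ngrams", selected,
                  benign_grams + spyware_grams,
                  ["benign"] * len(benign) + ["spyware"] * len(spyware)))
```

Storing each n-gram as a nominal `{0,1}` presence attribute is one design choice; numeric occurrence counts would be an equally plausible encoding for the ARFF file.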
The results show that the common feature-based approach produced better results than the frequency-based approach. We achieved an overall accuracy of 90.5% with the J48 classifier and an n-gram size of 6. The maximum area under the ROC curve, 83.3%, was achieved with Random Forest.
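The reported figures come from 10-fold cross-validation, accuracy and the area under the ROC curve. WEKA computes these itself; the Python sketch below only illustrates what the measures mean and is not WEKA's implementation. The round-robin stratified fold assignment, the pairwise (Mann-Whitney) formulation of AUC, and the function names `stratified_folds`, `accuracy` and `roc_auc` are assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[str], k: int = 10, seed: int = 1) -> List[List[int]]:
    """Split example indices into k folds that roughly preserve the class ratio."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    pos = 0
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        for i in idx:  # deal this class's examples round-robin over the folds
            folds[pos % k].append(i)
            pos += 1
    return folds


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of examples whose predicted label equals the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def roc_auc(scores: Sequence[float], labels: Sequence[str], positive: str = "spyware") -> float:
    """AUC as the probability that a random positive outscores a random negative
    (ties count 1/2), i.e. the Mann-Whitney U statistic / (n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = ["benign"] * 119 + ["spyware"] * 18  # class sizes from the abstract
    folds = stratified_folds(labels)
    print([sum(labels[i] == "spyware" for i in f) for f in folds])
    # Baseline: always predicting the majority class.
    print(round(accuracy(["benign"] * len(labels), labels), 3))
```

With the class sizes given in the abstract (119 benign, 18 spyware), every fold receives one or two spyware files. A classifier that labels every file benign would already reach 119/137 ≈ 86.9% accuracy on this data set, which illustrates why a threshold-independent measure such as AUC is a useful complement to accuracy on imbalanced data.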

Place, publisher, year, edition, pages
2009, 52 p.
Keyword [en]
Spyware Detection, Data Mining, Machine Learning, Feature Extraction, WEKA, ARFF
National Category
Computer Science; Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:bth-3095
Local ID: oai:bth.se:arkivex18D18DDACF0ED590C12575D80035524A
OAI: oai:DiVA.org:bth-3095
DiVA: diva2:830393
Uppsok
Physics, Chemistry, Mathematics
Note
+46709325761, +46762782550
Available from: 2015-04-22. Created: 2009-06-17. Last updated: 2015-06-30. Bibliographically approved.

Open Access in DiVA

Full text: FULLTEXT01.pdf (387 kB, application/pdf)
Checksum (SHA-512): 6fa08199b679460c1eaae379cf9367adf9f21691eefb0cd52653fa99b2d1d5f27caf6f741ee48e7e79dceab364f22c630364c5fe03c943e28151c3feacf6031f