Android malware detection using feature fusion and artificial data
Mittuniversitetet.
2018 (English)
In: 16th IEEE International Conference on Dependable, Autonomic and Secure Computing, IEEE 16th International Conference on Pervasive Intelligence and Computing, IEEE 4th International Conference on Big Data Intelligence and Computing and IEEE 3rd Cyber Science and Technology Congress, DASC-PICom-DataCom-CyberSciTec, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 702-709
Conference paper, Published paper (Refereed)
Abstract [en]

For Android malware detection and classification, the anti-malware community has relied on traditional malware detection methods as a countermeasure. However, traditional detection methods were developed for detecting computer malware, which differs from Android malware in structure and characteristics. Thus, they may not be useful for Android malware detection. Moreover, the majority of suggested detection approaches may not generalize and are incapable of detecting zero-day malware for various reasons, such as the available data sets containing only a specific set of examples. Thus, their detection accuracy may be questionable. To address this problem, this paper presents a malware classification approach with reliable detection accuracy and evaluates the approach using artificially generated examples. The suggested approach generates the signature profiles and behavior profiles of each application in the data set, which are then used as input for the classification task. To improve the detection accuracy, the fusion of features selected by filter methods and a wrapper method, as well as algorithm fusion, is investigated. The optimal balance between real-world examples and synthetic examples that does not affect the detection accuracy is also investigated. The experimental results suggest that both AUC and F1 of up to 0.94 can be obtained for both known and unknown malware using original examples and synthetic examples.
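
As an illustration of the feature fusion idea mentioned in the abstract, the following is a minimal sketch, not the paper's implementation. It assumes that fusion means taking the union of the feature sets chosen by one filter method and one wrapper method. The scoring rule (difference of class means), the nearest-centroid classifier, the function names, and the toy data are all hypothetical choices made for this example.

```python
from statistics import mean

def filter_select(X, y, k):
    """Filter method (assumed): rank features by |mean(malware) - mean(benign)|."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(mean(pos) - mean(neg)), j))
    # Highest score first; ties are broken by the lower feature index.
    top = sorted(scores, key=lambda s: (-s[0], s[1]))[:k]
    return {j for _, j in top}

def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier on a feature subset."""
    feats = sorted(features)

    def centroid(label):
        rows = [row for row, lab in zip(X, y) if lab == label]
        return [mean([row[j] for row in rows]) for j in feats]

    c0, c1 = centroid(0), centroid(1)
    correct = 0
    for row, label in zip(X, y):
        d0 = sum((row[j] - c) ** 2 for j, c in zip(feats, c0))
        d1 = sum((row[j] - c) ** 2 for j, c in zip(feats, c1))
        pred = 1 if d1 < d0 else 0
        correct += int(pred == label)
    return correct / len(y)

def wrapper_select(X, y, k):
    """Wrapper method (assumed): greedy forward selection guided by the classifier."""
    selected, remaining = set(), set(range(len(X[0])))
    while len(selected) < k and remaining:
        best = max(sorted(remaining),
                   key=lambda j: centroid_accuracy(X, y, selected | {j}))
        selected.add(best)
        remaining.remove(best)
    return selected

def fuse(X, y, k):
    """Feature fusion (assumed): union of the filter and wrapper selections."""
    return sorted(filter_select(X, y, k) | wrapper_select(X, y, k))

# Hypothetical toy data: rows are applications, columns are binary features,
# label 1 = malware, label 0 = benign.
X = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]
y = [1, 1, 1, 0, 0, 0]

fused = fuse(X, y, 2)
print("fused feature indices:", fused)
```

The fused set is never smaller than either individual selection, so a feature that only one of the two methods favours still reaches the classifier.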

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018. p. 702-709
Keywords [en]
Android (operating system), Big data, Classification (of information), Computer crime, Feature extraction, Classification tasks, Computer malware, Detection accuracy, Detection approach, Detection methods, Malware classifications, Malware detection, Reliable detection
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-25882
DOI: 10.1109/DASC/PiCom/DataCom/CyberSciTec.2018.00123
Scopus ID: 2-s2.0-85056882366
ISBN: 9781538675182 (print)
OAI: oai:DiVA.org:bth-25882
DiVA, id: diva2:1825633
Conference
16th IEEE International Conference on Dependable, Autonomic and Secure Computing, IEEE 16th International Conference on Pervasive Intelligence and Computing, IEEE 4th International Conference on Big Data Intelligence and Computing and IEEE 3rd Cyber Science and Technology Congress, DASC-PICom-DataCom-CyberSciTec, Athens 12-15 August 2018
Available from: 2024-01-09 Created: 2024-01-09 Last updated: 2024-01-09 Bibliographically approved
In thesis
1. Automated Malware Detection and Classification Using Supervised Learning
2024 (English)
Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Malware has been one of the key concerns for Information Technology security researchers for decades. Every year, anti-malware companies release alarming statistics suggesting a continuous increase in the number and types of malware.  This is mainly due to the constant development of new and more sophisticated malicious functionalities, propagation vectors, and infection tactics for malware. To combat this ever-evolving threat, anti-malware companies analyze thousands of malicious samples on a daily basis, either manually or through semi-automated means, to identify their type (whether it's a variant or zero-day) and family. After the analysis, signature databases or rule databases of anti-malware products are updated in order to detect known malware.  However, due to the ever-growing capabilities of malware, the malware analysis process is challenging and requires significant human effort. As a result, researchers are focusing on data-driven approaches based on machine learning to develop intelligent malware detectors with high accuracy. Specifically, they are focused on extracting static features from malware in the form of n-grams for experimental purposes. However, the previous research is inconclusive in terms of optimal feature representation and detection accuracy.

The primary objective of this thesis is to present state-of-the-art automated techniques for detecting and classifying malware using supervised learning algorithms. In particular, the focus is on two critical aspects of supervised learning-based malware detection: optimal feature representation and improved detection accuracy. Malware detection can be accomplished using two methods: static analysis, which extracts patterns without executing malware, and dynamic analysis, which captures behaviors through executing malware. This thesis focuses on static analysis instead of dynamic analysis because static analysis requires fewer computing resources. An additional benefit of static analysis is that present-day malware cannot evade it. To achieve the goals of this thesis, two new feature representations for static analysis are proposed. Furthermore, three customized ensembles are introduced to enhance malware detection accuracy, and their feasibility is experimentally demonstrated.  

The experiments incorporate customized malware data sets, including Spyware, Adware, Scareware, and Android malware samples, and public malware data sets from Microsoft containing samples from nine distinct malware families. Artificially generated data sets are employed to mitigate class imbalance issues and to represent inter-family and intra-family examples. Reverse engineering is performed to transform the data sets into feature data sets using both byte code and assembly language instructions. Further, existing and new feature representations are explored along with various feature selection algorithms and feature fusion techniques. To enhance detection accuracy, different decision rules from social choice theory, such as veto and consensus, are integrated into customized ensembles. The experimental results indicate that the proposed methods are capable of detecting known and zero-day malware. The proposed ensembles are also tested on UCI public data sets, such as Forest CoverType, and the results demonstrate their effectiveness in classification. Further, these methods are designed to be portable and adaptable to different operating systems, and they can also be scaled for multi-class malware detection.
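
The veto and consensus rules named above can be pictured with a small sketch. The record does not give the exact rules used in the thesis, so the definitions below are assumptions made for illustration: a veto lets any single malware vote override the other votes, and consensus issues a label only when all base classifiers agree. The function names are hypothetical.

```python
from typing import Optional, Sequence

MALWARE, BENIGN = 1, 0

def majority_vote(votes: Sequence[int]) -> int:
    """Baseline: label as malware only if more than half of the votes say so."""
    return MALWARE if sum(votes) * 2 > len(votes) else BENIGN

def veto_vote(votes: Sequence[int]) -> int:
    """Assumed veto rule: a single malware vote overrides all benign votes."""
    return MALWARE if any(v == MALWARE for v in votes) else BENIGN

def consensus_vote(votes: Sequence[int]) -> Optional[int]:
    """Assumed consensus rule: return a label only when the vote is unanimous.

    None means the ensemble abstains and the sample needs further analysis.
    """
    return votes[0] if len(set(votes)) == 1 else None

# Hypothetical votes from three base classifiers for one sample.
votes = [BENIGN, BENIGN, MALWARE]
print(majority_vote(votes))   # 0: two of three classifiers say benign
print(veto_vote(votes))       # 1: the single malware vote wins
print(consensus_vote(votes))  # None: the classifiers disagree
```

Under these assumed definitions the veto rule favours recall on malware at the cost of more false alarms, while the consensus rule trades coverage for confidence.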

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2024
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090 ; 3
Keywords
Malware Detection, Android Malware, Machine Learning, Static Malware Analysis, Cyber Security, Ensemble learning, Supervised Learning, Feature Selection
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:bth-25793 (URN)
978-91-7295-475-5 (ISBN)
Public defence
2024-01-31, J1630, Campus Karlskrona, 13:00 (English)
Available from: 2024-01-09 Created: 2024-01-09 Last updated: 2024-01-11 Bibliographically approved

Open Access in DiVA

No full text in DiVA


Authority records

Shahzad, Raja Khurram
