Data Mining Approaches for Outlier Detection Analysis
2020 (English) Doctoral thesis, comprising papers (Other academic)
Abstract [en]
Outlier detection is studied and applied in many domains. Outliers arise for different reasons, such as fraudulent activities, structural defects, health problems, and mechanical issues. Detecting outliers is a challenging task that can reveal system faults and fraud, and can save people's lives. Outlier detection techniques are often domain-specific. The main challenge in outlier detection is modelling normal behaviour so that abnormalities can be identified. The choice of model is important: an unsuitable data model can lead to poor results. Choosing well requires a good understanding and interpretation of the data, the constraints, and the requirements of the domain problem. Outlier detection is largely an unsupervised problem because labeled data is often unavailable and expensive to obtain.
In this thesis, we study and apply a combination of machine learning and data mining techniques to build data-driven and domain-oriented outlier detection models. We focus on three real-world application domains: maritime surveillance, district heating, and online media and sequence datasets. We show the importance of data preprocessing and feature selection in building suitable methods for data modelling. We take advantage of both supervised and unsupervised techniques to create hybrid methods.
More specifically, we propose a rule-based anomaly detection system using open data for the maritime surveillance domain. We exploit sequential pattern mining for identifying contextual and collective outliers in online media data. We propose a minimum spanning tree clustering technique for detecting groups of outliers in online media and sequence data. We develop several higher-order mining approaches for identifying manual changes and deviating behaviours in heating systems at the building level. The proposed approaches are shown to be capable of explaining the underlying properties of the detected outliers. This can help domain experts narrow down the scope of analysis and understand the reasons for such anomalous behaviours. We also investigate the reproducibility of the proposed models in similar application domains.
Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2020, p. 251
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090 ; 9
Keywords [en]
outlier detection, data modelling, machine learning, clustering analysis, data stream mining
Research program
Computer Science
Identifiers
URN: urn:nbn:se:bth-20454
ISBN: 9789172954090 (print)
OAI: oai:DiVA.org:bth-20454
DiVA, id: diva2:1474986
Public defence
2020-12-01, J1630, Karlskrona, 13:00 (English)
Funder
Knowledge Foundation, 20140032
2020-10-16, 2020-10-12, 2025-09-30, bibliographically approved
List of papers