Data Modeling for Outlier Detection
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för datalogi och datorsystemteknik.
2018 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis explores data modeling for outlier detection techniques in three different application domains: maritime surveillance, district heating, and online media and sequence datasets. The proposed models are evaluated and validated under different experimental scenarios, taking into account the specific characteristics and setups of each domain.

Outlier detection has been studied and applied in many domains. Outliers arise for different reasons, such as fraudulent activities, structural defects, health problems, and mechanical issues. Detecting outliers is a challenging task that can reveal system faults and fraud, and can save lives. Outlier detection techniques are often domain-specific. The main challenge in outlier detection is modeling normal behavior so that abnormalities can be identified. The choice of model is important: an incorrect choice of data model can lead to poor results. Choosing well requires a good understanding and interpretation of the data, the constraints, and the requirements of the problem domain. Outlier detection is largely an unsupervised problem because labeled data is often unavailable and expensive to obtain.

We have studied and applied a combination of machine learning and data mining techniques to build data-driven and domain-oriented outlier detection models. We have shown the importance of data preprocessing as well as feature selection in building suitable methods for data modeling. We have taken advantage of both supervised and unsupervised techniques to create hybrid methods. For example, we have proposed a rule-based outlier detection system based on open data for the maritime surveillance domain. Furthermore, we have combined cluster analysis and regression to identify manual changes in heating systems at the building level. Sequential pattern mining has also been used to identify contextual and collective outliers in online media data. In addition, we have proposed a minimum spanning tree clustering technique for detecting groups of outliers in online media and sequence data. The proposed models have been shown to be capable of explaining the underlying properties of the detected outliers. This can help domain experts narrow down the scope of analysis and understand the reasons for such anomalous behaviors. We have also investigated the reproducibility of the proposed models in similar application domains.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2018.
Series
Blekinge Institute of Technology Licentiate Dissertation Series, ISSN 1650-2140 ; 4
Keywords [en]
data modeling, cluster analysis, stream data, outlier detection
Identifiers
URN: urn:nbn:se:bth-16580
ISBN: 978-91-7295-358-1 (print)
OAI: oai:DiVA.org:bth-16580
DiVA, id: diva2:1255525
Presentation
2018-11-09, Blekinge Tekniska Högskola, Karlskrona, 10:00 (English)
Projects
Scalable resource-efficient systems for big data analytics
Funder
Knowledge Foundation, 20140032
Available from: 2018-10-25 Created: 2018-10-12 Last updated: 2018-12-04 Bibliographically approved
List of papers
1. Open Data for Anomaly Detection in Maritime Surveillance
2013 (English) In: Expert Systems with Applications, ISSN 0957-4174, Vol. 40, no. 14, p. 5719-5729. Article in journal (Refereed) Published
Abstract [en]

Maritime surveillance has received increased attention from a civilian perspective in recent years. Anomaly detection is one of many techniques available for improving safety and security in this domain. Maritime authorities use confidential data sources for monitoring maritime activities; however, a paradigm shift on the Internet has created new open sources of data. We investigate the potential of using open data as a complementary resource for anomaly detection in maritime surveillance. We present and evaluate a decision support system based on open data and expert rules for this purpose. We conduct a case study in which experts from the Swedish coastguard carry out a real-world validation of the system. We conclude that the exploitation of open data as a complementary resource is feasible, since our results indicate improvements in the efficiency and effectiveness of the existing surveillance systems by increasing the accuracy and covering unseen aspects of maritime activities.

Place, publisher, year, edition, pages
Elsevier, 2013
Keywords
Open data, Anomaly detection, Maritime security, Maritime domain awareness
Identifiers
urn:nbn:se:bth-6807 (URN)
10.1016/j.eswa.2013.04.029 (DOI)
000321089200029 ()
oai:bth.se:forskinfoD455168E88392FDDC1257B6200290B99 (Local ID)
oai:bth.se:forskinfoD455168E88392FDDC1257B6200290B99 (Archive number)
oai:bth.se:forskinfoD455168E88392FDDC1257B6200290B99 (OAI)
Available from: 2013-12-17 Created: 2013-05-05 Last updated: 2018-10-12 Bibliographically approved
2. Trend analysis to automatically identify heat program changes
2017 (English) In: Energy Procedia, Elsevier, 2017, Vol. 116, p. 407-415. Conference paper, Published paper (Refereed)
Abstract [en]

The aim of this study is to improve the monitoring and control of heating systems located at customer buildings through the use of a decision support system. To achieve this, the proposed system applies a two-step classifier to detect manual changes of the temperature of the heating system. We use data from the Swedish company NODA, active in energy optimization and services for energy efficiency, to train and test the suggested system. The decision support system is evaluated through an experiment and the results are validated by experts at NODA. The results show that the decision support system can detect changes within three days of their occurrence using only daily average measurements.

Place, publisher, year, edition, pages
Elsevier, 2017
Series
Energy Procedia, ISSN 1876-6102 ; 116
Keywords
District heating, Trend analysis, Change detection, Smart automated system
Identifiers
urn:nbn:se:bth-12894 (URN)
10.1016/j.egypro.2017.05.088 (DOI)
000406743000039 ()
Conference
15th International Symposium on District Heating and Cooling (DHC2016), Seoul
Projects
BigData@BTH
Funder
Knowledge Foundation, 20140032
Note
Open access
Available from: 2016-09-26 Created: 2016-07-13 Last updated: 2018-10-12 Bibliographically approved
3. Outlier Detection for Video Session Data Using Sequential Pattern Mining
2018 (English) In: ACM SIGKDD Workshop On Outlier Detection De-constructed, 2018. Conference paper, Oral presentation only (Refereed)
Abstract [en]

The growth of Internet video and over-the-top transmission techniques has enabled online video service providers to deliver high-quality video content to viewers. To maintain and improve the quality of experience, video providers need to detect unexpected issues that can highly affect the viewers’ experience. This requires analyzing massive amounts of video session data in order to find unexpected sequences of events. In this paper we combine sequential pattern mining and clustering to discover such event sequences. The proposed approach applies sequential pattern mining to find frequent patterns by considering contextual and collective outliers. In order to distinguish between the normal and abnormal behavior of the system, we initially identify the most frequent patterns. Then a clustering algorithm is applied on the most frequent patterns. The generated clustering model together with Silhouette Index are used for further analysis of less frequent patterns and detection of potential outliers. Our results show that the proposed approach can detect outliers at the system level.
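
The silhouette-based step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that patterns have already been mapped to numeric feature vectors, that the frequent patterns are already clustered, and that a fixed cut-off (the hypothetical `THRESHOLD`) decides what counts as a potential outlier. None of these details are given in the record.

```python
import math

def mean_distance(point, members):
    """Average Euclidean distance from `point` to every member of one cluster."""
    return sum(math.dist(point, m) for m in members) / len(members)

def silhouette_for_candidate(point, clusters):
    """Silhouette of a new point assigned to its closest cluster.

    a = mean distance to the closest cluster,
    b = mean distance to the second-closest cluster,
    score = (b - a) / max(a, b).
    """
    if len(clusters) < 2:
        raise ValueError("at least two clusters are needed")
    dists = sorted(mean_distance(point, c) for c in clusters)
    a, b = dists[0], dists[1]
    return 0.0 if max(a, b) == 0 else (b - a) / max(a, b)

# Hypothetical clusters of frequent patterns, as 2-D feature vectors.
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
]
THRESHOLD = 0.5  # assumed cut-off, not taken from the paper

inside = silhouette_for_candidate((0.5, 0.5), clusters)   # close to the first cluster
between = silhouette_for_candidate((5.5, 5.5), clusters)  # fits neither cluster well
flags = {"inside": inside < THRESHOLD, "between": between < THRESHOLD}
```

A less frequent pattern that sits well inside one cluster gets a score near 1, while one that fits no cluster gets a score near 0 and is flagged for inspection.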

Keywords
Cluster Analysis, Data Stream Mining, Outlier Detection, Sequential Pattern Mining
Identifiers
urn:nbn:se:bth-16944 (URN)
Conference
ACM SIGKDD Workshop On Outlier Detection De-constructed, London
Funder
Knowledge Foundation, 20140032
Available from: 2018-10-01 Created: 2018-10-01 Last updated: 2018-10-12 Bibliographically approved
4. A Minimum Spanning Tree Clustering Approach for Outlier Detection in Event Sequences
2018 (English) In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) / [ed] Wani M.A., Sayed-Mouchaweh M., Lughofer E., Gama J., Kantardzic M., IEEE, 2018, p. 1123-1130, article id 8614207. Conference paper, Published paper (Refereed)
Abstract [en]

Outlier detection has been studied in many domains. Outliers arise due to different reasons such as mechanical issues, fraudulent behavior, and human error. In this paper, we propose an unsupervised approach for outlier detection in a sequence dataset. The proposed approach combines sequential pattern mining, cluster analysis, and a minimum spanning tree algorithm in order to identify clusters of outliers. Initially, the sequential pattern mining is used to extract frequent sequential patterns. Next, the extracted patterns are clustered into groups of similar patterns. Finally, the minimum spanning tree algorithm is used to find groups of outliers. The proposed approach has been evaluated on two different real datasets, i.e., smart meter data and video session data. The obtained results have shown that our approach can be applied to narrow down the space of events to a set of potential outliers and facilitate domain experts in further analysis and identification of system level issues.
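
The final step of this abstract, using a minimum spanning tree to find groups of outliers, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's algorithm: patterns are assumed to be numeric feature vectors, and the rule of cutting long edges and keeping small components (the hypothetical `cut_length` and `max_size` parameters) is a common MST clustering heuristic that the record does not confirm.

```python
import math
from collections import defaultdict

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, i, j) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def small_components(points, cut_length, max_size):
    """Drop MST edges longer than cut_length; return components with at most max_size points."""
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for weight, i, j in mst_edges(points):
        if weight <= cut_length:
            root[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(len(points)):
        groups[find(i)].append(i)
    return sorted(g for g in groups.values() if len(g) <= max_size)

# Hypothetical data: five points in a dense group and two points far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0)]
outlier_groups = small_components(points, cut_length=1.0, max_size=2)
```

Here the single long edge that joins the two distant points to the dense group is removed, so indices 5 and 6 form a small component and are returned as a candidate outlier group.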

Place, publisher, year, edition, pages
IEEE, 2018
Keywords
Clustering, Minimum spanning tree, Outlier detection, Sequential pattern mining
Identifiers
urn:nbn:se:bth-17100 (URN)
10.1109/ICMLA.2018.00182 (DOI)
000463034400174 ()
9781538668047 (ISBN)
Conference
17th IEEE International Conference on Machine Learning and Applications, ICMLA 2018; Orlando; United States; 17 December 2018 through 20 December 2018
Funder
Knowledge Foundation, 20140032
Available from: 2018-10-09 Created: 2018-10-09 Last updated: 2019-06-28 Bibliographically approved

Open Access in DiVA

fulltext (13534 kB), 234 downloads
File information
File: FULLTEXT01.pdf
File size: 13534 kB
Checksum (SHA-512): d6e62ed1c729b9d1e4f5afe89f060d12ee8b5c3bde4f1463854b83106601343155a92dfbe2583084477b9c0ca5a425528672ba491d408522c2a5779dd052d3b1
Type: fulltext
Mimetype: application/pdf
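
A downloaded copy of the full text can be checked against the SHA-512 checksum listed in the file information. The sketch below uses only Python's standard `hashlib` and `hmac` modules. The helper names and the local file path are assumptions made for illustration; the expected digest is copied from this record.

```python
import hashlib
import hmac

# Checksum as published in the file information of this record.
EXPECTED_SHA512 = (
    "d6e62ed1c729b9d1e4f5afe89f060d12ee8b5c3bde4f1463854b83106601343155a92dfbe2583084477b9c0ca5a425528672ba491d408522c2a5779dd052d3b1"
)

def sha512_of_chunks(chunks):
    """Hex SHA-512 digest of an iterable of byte chunks."""
    digest = hashlib.sha512()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()

def read_chunks(path, chunk_size=1 << 20):
    """Yield the file at `path` in pieces so large PDFs need not fit in memory."""
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            yield chunk

def matches_record(path, expected=EXPECTED_SHA512):
    """True if the file's digest equals the expected checksum."""
    return hmac.compare_digest(sha512_of_chunks(read_chunks(path)), expected.lower())

# Example call, assuming the PDF was saved locally under its listed file name:
# matches_record("FULLTEXT01.pdf")
```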

Author/editor
Abghari, Shahrooz
Total: 234 downloads
The number of downloads is the sum of all downloads of all full texts. It may include, for example, earlier versions that are no longer available.

Total: 621 hits