Correcting and complementing freeway traffic accident data using Mahalanobis distance based outlier detection
Blekinge Tekniska Högskola, Faculty of Computing, Department of Creative Technologies. ORCID iD: 0000-0001-5824-425X
Blekinge Tekniska Högskola, Faculty of Computing, Department of Creative Technologies.
Blekinge Tekniska Högskola, Faculty of Computing, Department of Creative Technologies.
Blekinge Tekniska Högskola, Faculty of Computing, Department of Creative Technologies. ORCID iD: 0000-0003-0891-2859
2017 (English). In: Technical Gazette, ISSN 1330-3651, E-ISSN 1848-6339, Vol. 24, no. 5, pp. 1597-1607. Journal article (peer-reviewed). Published
Abstract [en]

A huge amount of traffic data is archived that can be used in data mining, especially in supervised learning. However, it is not being fully used due to a lack of accurate accident information (labels). In this study, we improve a Mahalanobis distance based algorithm so that it can handle differential data to estimate flow fluctuations and detect accidents, and we use it to support correcting and complementing accident information. The outlier detection algorithm provides accurate suggestions for accident occurrence time, duration and direction. We also develop a system with an interactive user interface to realize this procedure. The study makes three contributions to data handling. First, we propose using multi-metric traffic data instead of a single metric for traffic outlier detection. Second, we present a practical method to organise traffic data and to evaluate that organisation for Mahalanobis distance. Third, we describe a general method to modify Mahalanobis distance algorithms so that they are updatable. © 2017, Strojarski Facultet. All rights reserved.
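The detection idea can be illustrated with a minimal sketch. It assumes each time step is a vector of traffic metrics (here flow and speed), works on first differences between consecutive steps, and flags a step whose Mahalanobis distance from the mean of past differences exceeds a threshold. Updatability is illustrated with a Welford-style running mean and covariance, so new samples are absorbed without recomputing from the full history. The names `UpdatableMahalanobis`, `differences` and `detect`, the two metrics, the threshold of 3.0 and the rule of not learning from flagged steps are hypothetical choices for illustration, not the paper's algorithm.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment m with the identity matrix: [m | I].
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [rv - f * cv for rv, cv in zip(a[r], a[col])]
    return [row[n:] for row in a]  # right half now holds the inverse


class UpdatableMahalanobis:
    """Hypothetical updatable model: running mean and covariance (Welford's method)."""

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [[0.0] * dim for _ in range(dim)]  # sum of deviation outer products

    def update(self, x: Sequence[float]) -> None:
        """Absorb one sample in O(dim^2) without revisiting earlier samples."""
        self.n += 1
        before = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + d / self.n for mi, d in zip(self.mean, before)]
        after = [xi - mi for xi, mi in zip(x, self.mean)]
        for i in range(len(x)):
            for j in range(len(x)):
                self.m2[i][j] += before[i] * after[j]

    def covariance(self) -> Matrix:
        """Sample covariance (n - 1 denominator); needs at least two samples."""
        return [[v / (self.n - 1) for v in row] for row in self.m2]

    def distance(self, x: Sequence[float]) -> float:
        """Mahalanobis distance of x from the running mean."""
        inv = invert(self.covariance())
        d = [xi - mi for xi, mi in zip(x, self.mean)]
        dim = len(d)
        q = sum(d[i] * inv[i][j] * d[j] for i in range(dim) for j in range(dim))
        return math.sqrt(max(q, 0.0))  # guard against tiny negative rounding error


def differences(series: Sequence[Sequence[float]]) -> List[List[float]]:
    """First differences x[t] - x[t-1] of a multi-metric series."""
    return [[b - a for a, b in zip(prev, cur)] for prev, cur in zip(series, series[1:])]


def detect(history: Sequence[Sequence[float]], new_points: Sequence[Sequence[float]],
           threshold: float = 3.0) -> List[int]:
    """Return indices in new_points whose change from the previous step looks anomalous."""
    model = UpdatableMahalanobis(len(history[0]))
    for d in differences(history):
        model.update(d)
    flagged = []
    prev = history[-1]
    for t, x in enumerate(new_points):
        d = [b - a for a, b in zip(prev, x)]
        if model.distance(d) > threshold:
            flagged.append(t)
        else:
            model.update(d)  # assumption: only normal-looking changes update the model
        prev = x
    return flagged


if __name__ == "__main__":
    # Toy (flow, speed) records, not real data: flow and speed normally move in opposite directions.
    history = [[100, 60], [104, 59], [101, 61], [106, 58], [103, 60], [107, 59], [102, 62], [105, 60]]
    print(detect(history, [[108, 58], [65, 30]]))  # expected: [1]
```

In the toy data, flow and speed usually change in opposite directions, so a simultaneous drop in both is far from the learned pattern even relative to its size; this is the kind of multi-metric signal a single-metric detector would not see.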

Place, publisher, year, edition, pages
Strojarski Facultet, 2017. Vol. 24, no. 5, pp. 1597-1607
Keywords [en]
Accident data, Data labelling, Differential distance, Mahalanobis distance, Outlier detection, Traffic data, Updatable algorithm, Accidents, Data mining, Statistics, User interfaces, Mahalanobis distances, Data handling
National subject category
Communication Systems; Computer and Information Sciences
Identifiers
URN: urn:nbn:se:bth-15472
DOI: 10.17559/TV-20150616163905
ISI: 000417100300037
Scopus ID: 2-s2.0-85032512786
OAI: oai:DiVA.org:bth-15472
DiVA, id: diva2:1156201
Note

Funded by the National Natural Science Foundation of China

Funding no. 61364019

Available from: 2017-11-10 Created: 2017-11-10 Last updated: 2018-11-01. Bibliographically approved
Part of thesis
1. Automated Traffic Time Series Prediction
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Intelligent transportation systems (ITS) are becoming more and more effective. Robust and accurate short-term traffic prediction plays a key role in modern ITS and demands continuous improvement. Thanks to better data collection and storage strategies, a huge amount of traffic data is archived, which can be used for this purpose, especially with machine learning.

In the data preprocessing stage, despite the amount of data available, missing data records and messy labels are two problems that prevent many prediction algorithms in ITS from working effectively and smoothly. In the prediction stage, although many prediction algorithms exist, higher accuracy and more automated procedures are needed.

Across both preprocessing and prediction, one widely used algorithm is k-nearest neighbours (kNN), which has shown high accuracy and efficiency. However, general kNN is designed for matrices rather than time series, so it does not exploit time series characteristics. Choosing the right parameter values for kNN is also problematic because traffic characteristics are dynamic. This thesis analyses kNN-based algorithms and improves prediction accuracy through better parameter handling that uses time series characteristics.
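The basic kNN forecasting step that such improvements start from can be sketched as below. This is a minimal illustration, assuming a univariate series, Euclidean distance between sliding windows and an unweighted mean of the values that followed the k nearest windows; `knn_forecast` is a hypothetical name, and the thesis's dynamic parameter handling is not shown. The two parameters, `window` (window length) and `k`, are the values whose choice the paragraph above describes as problematic.

```python
import math
from typing import Sequence


def knn_forecast(series: Sequence[float], window: int, k: int) -> float:
    """Predict the next value of `series` by kNN over sliding windows.

    The most recent `window` values form the query. Every earlier window that is
    followed by a known value is a candidate; the forecast is the mean of the
    successors of the k candidates closest to the query (Euclidean distance).
    """
    if window < 1 or k < 1:
        raise ValueError("window and k must be positive")
    if len(series) < window + 1:
        raise ValueError("series too short for the chosen window")
    query = list(series[-window:])
    candidates = []
    for start in range(len(series) - window):
        past = list(series[start:start + window])
        successor = series[start + window]  # value that followed this window
        candidates.append((math.dist(past, query), successor))
    candidates.sort(key=lambda c: c[0])  # closest windows first
    nearest = candidates[:k]
    return sum(v for _, v in nearest) / len(nearest)


if __name__ == "__main__":
    periodic = [10, 20, 30, 20] * 5  # toy repeating pattern, not real traffic data
    print(knn_forecast(periodic, window=3, k=2))  # expected: 10.0
```

Because the forecast is an average of observed successors, plain kNN cannot extrapolate beyond values it has already seen, and a poor choice of `window` or `k` directly changes which neighbours are averaged.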

Specifically, for the data preprocessing stage, this work introduces gap-sensitive windowed kNN (GSW-kNN) imputation. In addition, a Mahalanobis distance-based algorithm is improved to support correcting and complementing label information. Several automated and dynamic procedures are then proposed, and different strategies for making use of data and parameters are compared.
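For the imputation side, a plain windowed kNN imputer illustrates the general idea of filling a missing point from positions whose surrounding values look similar. This sketch is not GSW-kNN: the gap-sensitive handling the thesis introduces is not modelled, and the function name `windowed_knn_impute`, the symmetric context window, Euclidean distance and unweighted mean are illustrative assumptions. Missing points whose own context is incomplete are simply left as `None`.

```python
import math
from typing import List, Optional, Sequence


def windowed_knn_impute(series: Sequence[Optional[float]], window: int,
                        k: int) -> List[Optional[float]]:
    """Fill missing values (None) from the k most similar observed positions.

    A position's context is the `window` values on each side of it. Positions whose
    value and context are fully observed are candidates; a missing point is filled
    with the mean value of the k candidates whose contexts are closest (Euclidean).
    """
    n = len(series)

    def context(i: int) -> Optional[List[float]]:
        # Context is undefined near the ends or when it contains missing values.
        if i - window < 0 or i + window >= n:
            return None
        vals = list(series[i - window:i]) + list(series[i + 1:i + window + 1])
        return None if any(v is None for v in vals) else vals

    candidates = []
    for j in range(n):
        ctx = context(j)
        if series[j] is not None and ctx is not None:
            candidates.append((ctx, series[j]))

    result = list(series)
    for i in range(n):
        if series[i] is not None:
            continue
        ctx = context(i)
        if ctx is None or not candidates:
            continue  # simplified: long gaps and edge gaps stay missing
        ranked = sorted(candidates, key=lambda c: math.dist(c[0], ctx))
        nearest = ranked[:k]
        result[i] = sum(v for _, v in nearest) / len(nearest)
    return result


if __name__ == "__main__":
    s = [1.0, 2.0, 3.0, 2.0] * 5  # toy repeating pattern, not real traffic data
    s[10] = None  # simulate one missing record
    print(windowed_knn_impute(s, window=2, k=2)[10])  # expected: 3.0
```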

Two real-world datasets are used to conduct experiments in different papers. The results show that GSW-kNN imputation is on average 34% more accurate than benchmark methods and remains robust even when the missing ratio increases to 90%. The Mahalanobis distance-based models efficiently correct and complement label information, which is then used to fairly compare the performance of algorithms. The proposed dynamic procedure (DP) performs better on average in terms of accuracy than manually adjusted kNN and other benchmark methods. Better still, weighted parameter tuples (WPT), which cannot be achieved manually in practice, give more accurate results than any human-tuned parameters. The experiments indicate that the relations among parameters are compound and that the flow-aware strategy performs better than the time-aware one. It is therefore suggested to consider all parameter strategies simultaneously as ensemble strategies, especially by including the window in flow-aware strategies.

In summary, this thesis improves the accuracy and automation level of short-term traffic prediction with the proposed high-speed algorithms.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2018
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090 ; 10
Keywords
Machine Learning, Time Series, Traffic Engineering
National subject category
Computer Sciences; Transport Systems and Logistics
Identifiers
urn:nbn:se:bth-17210 (URN)
978-91-7295-360-4 (ISBN)
Public defence
2018-11-30, J1650, Valhallav. 1, Karlskrona, 13:30 (English)
Available from: 2018-11-02 Created: 2018-11-01 Last updated: 2018-12-14. Bibliographically approved

Authors: Sun, Bin; Cheng, Wei; Bai, Guohua; Goswami, Prashant
