Short-term traffic prediction on freeways has been an active research subject for several decades. Various algorithms have been proposed, covering a broad range of topics regarding performance, data requirements and efficiency. However, the implementation of machine-learning-based algorithms in traffic management centres is still limited. There are two main reasons for this: the data is often messy or missing, and parameter tuning requires experienced engineers.
The main objective of this thesis was to develop a procedure that can improve the performance and automation level of short-term traffic prediction.
Missing data is a problem that prevents many prediction algorithms in intelligent transportation systems (ITS) from working effectively. Much work has been done on imputing missing data. Among the different imputation methods, k-nearest neighbours (kNN) has shown excellent accuracy and efficiency. However, the general kNN is designed for matrices rather than time series, so it does not exploit time-series characteristics such as windows and gap-sensitive weights. We introduce gap-sensitive windowed kNN (GSW-kNN) imputation for time series. The results show that GSW-kNN is 34% more accurate than the benchmark methods and remains robust even when the missing ratio increases to 90%.
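To illustrate the general idea of windowed kNN imputation on a time series, a minimal sketch follows. It is not the GSW-kNN algorithm of this thesis: the function name `windowed_knn_impute`, the parameters `half_window`, `k` and `decay`, and the choice of weights that decay geometrically with distance from the missing point are all assumptions made for illustration only.

```python
import numpy as np


def windowed_knn_impute(series, half_window=3, k=3, decay=0.5):
    """Fill NaNs using the k most similar fully observed windows.

    Hypothetical sketch: each missing point is described by the values
    around it; neighbours are other time points whose surrounding
    windows look alike. Positions close to the missing point get a
    larger weight in the distance (an assumed weighting scheme).
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    # Offsets of the window positions relative to the centre (centre excluded).
    offsets = np.concatenate(
        [np.arange(-half_window, 0), np.arange(1, half_window + 1)]
    )
    # Weight 1 next to the centre, shrinking geometrically further away.
    weights = decay ** (np.abs(offsets) - 1)

    # Candidate neighbours: observed centres with a fully observed window.
    candidates = np.array(
        [
            c
            for c in range(half_window, n - half_window)
            if not np.isnan(x[c]) and not np.isnan(x[c + offsets]).any()
        ],
        dtype=int,
    )
    filled = x.copy()
    if candidates.size == 0:
        return filled
    cand_windows = x[candidates[:, None] + offsets[None, :]]

    for t in np.flatnonzero(np.isnan(x)):
        idx = t + offsets
        inside = (idx >= 0) & (idx < n)
        query = np.full(len(offsets), np.nan)
        query[inside] = x[idx[inside]]
        usable = ~np.isnan(query)  # compare only on observed positions
        if not usable.any():
            continue  # nothing to compare against; leave the value missing
        w = weights[usable]
        diff = cand_windows[:, usable] - query[usable]
        dist = np.sqrt((w * diff**2).sum(axis=1) / w.sum())
        nearest = np.argsort(dist)[:k]
        # Impute with the mean of the centre values of the k nearest windows.
        filled[t] = x[candidates[nearest]].mean()
    return filled
```

Restricting the distance to observed positions lets the same routine handle isolated gaps and longer runs of missing values, although points whose whole window is missing are left unfilled in this sketch.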
The lack of accurate accident information (labels) is another problem that prevents the huge amount of available traffic data from being fully used. We improve a Mahalanobis-distance-based algorithm so that it can handle differential data to estimate flow fluctuations and detect accidents, and we use it to support correcting and complementing accident information. The outlier detection algorithm provides accurate suggestions for the time of occurrence, duration and direction of accidents. We also develop a system with an interactive user interface to realise this procedure. There are three contributions for data handling. Firstly, we propose using multi-metric traffic data instead of a single metric for traffic outlier detection. Secondly, we present a practical method to organise traffic data and to evaluate the organisation for Mahalanobis distance. Thirdly, we describe a general method to modify Mahalanobis distance algorithms to be updatable.
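As a rough illustration of scoring differenced multi-metric traffic data with an updatable Mahalanobis distance, a minimal sketch follows. It is not the algorithm developed in this thesis: the class name `RunningMahalanobis`, the use of a Welford-style running mean and covariance as the update rule, the choice of flow and speed as the two metrics, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


class RunningMahalanobis:
    """Mahalanobis distance with incrementally updated statistics.

    Hypothetical sketch: mean and covariance are maintained with a
    Welford-style update, so new observations can be added one at a
    time without revisiting earlier data.
    """

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.scatter = np.zeros((dim, dim))  # running sum of deviation products

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean  # deviation from the old mean
        self.mean += delta / self.n
        self.scatter += np.outer(delta, x - self.mean)

    def distance(self, x):
        cov = self.scatter / (self.n - 1)  # sample covariance, needs n >= 2
        diff = np.asarray(x, dtype=float) - self.mean
        # Pseudo-inverse keeps the sketch usable if cov is near-singular.
        return float(np.sqrt(diff @ np.linalg.pinv(cov) @ diff))


# Synthetic two-metric series (assumed values): flow and speed with noise,
# plus an abrupt drop in both at time step 200 that mimics an incident.
rng = np.random.default_rng(0)
flow = 1000 + rng.normal(0, 20, 300)
speed = 100 + rng.normal(0, 3, 300)
flow[200:] -= 400
speed[200:] -= 40

# Work on first differences so that the score reacts to sudden changes.
diffs = np.diff(np.column_stack([flow, speed]), axis=0)

model = RunningMahalanobis(dim=2)
scores = []
for d in diffs:
    # Score each new difference against past data only, then absorb it.
    scores.append(model.distance(d) if model.n >= 30 else 0.0)
    model.update(d)
scores = np.array(scores)
```

Scoring each observation before it is added to the statistics keeps an abrupt change from masking itself; the warm-up length of 30 observations is an arbitrary choice for this sketch.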
For automatic parameter tuning, the experiments show that the flow-aware strategy performs better than the time-aware one. We therefore use all parameter strategies simultaneously as ensemble strategies, in particular by including the window in the flow-aware strategies.
Based on the above studies, we have developed online-orientated and offline-orientated algorithms for real-time traffic forecasting. The automatically tuned online version performs close to the optimal manually tuned performance. The offline version achieves a performance that cannot be reached with manual tuning. It is also 3.05% better than XGB and 11.7% better than traditional SARIMA.
2017. Vol. 12, article id 12