Publications (10 of 10)
Zhang, L., Sun, B., Geng, R., Li, S. & Shen, T. (2022). Boosting Machine Learning Algorithms with Grid-Search for Transport IoT Data Prediction. In: Liu Q., Liu X., Cheng J., Shen T., Tian Y. (Ed.), Proceedings of the 12th International Conference on Computer Engineering and Networks. Paper presented at 12th International Conference on Computer Engineering and Networks, CENet 2022, Haikou, 4 November through 7 November 2022 (pp. 903-910). Springer Science+Business Media B.V.
2022 (English). In: Proceedings of the 12th International Conference on Computer Engineering and Networks / [ed] Liu Q., Liu X., Cheng J., Shen T., Tian Y., Springer Science+Business Media B.V., 2022, p. 903-910. Conference paper, Published paper (Refereed)
Abstract [en]

IoT (Internet of Things) data has been widely discussed in recent years, and traffic data is an important part of it. Traffic flow prediction not only facilitates travel and saves time, but also provides effective technical support for highway traffic control and scheduling. To achieve accurate traffic flow prediction, this study builds a machine learning-based traffic flow prediction model. We first select the features that have a greater impact on traffic flow. On this basis, we establish a traffic flow prediction model based on CatBoost. Comparison with other machine learning models leads to two conclusions: the CatBoost model can accurately predict traffic flow, and it outperforms traditional machine learning models. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Place, publisher, year, edition, pages
Springer Science+Business Media B.V., 2022
Series
Lecture Notes in Electrical Engineering, ISSN 1876-1100, E-ISSN 1876-1119 ; 961
Keywords
CatBoost, Machine learning, Predictive models, Traffic flow, Adaptive boosting, Forecasting, Data prediction, Grid search, Machine learning algorithms, Machine learning models, Machine-learning, Prediction modelling, Traffic flow prediction, Internet of things
National Category
Transport Systems and Logistics Computer Sciences
Identifiers
urn:nbn:se:bth-24180 (URN); 10.1007/978-981-19-6901-0_93 (DOI); 2-s2.0-85144540165 (Scopus ID); 9789811969003 (ISBN)
Conference
12th International Conference on Computer Engineering and Networks, CENet 2022, Haikou, 4 November through 7 November 2022
Available from: 2023-01-10. Created: 2023-01-10. Last updated: 2023-01-10. Bibliographically approved
Li, S., Sun, B., Geng, R., Zhang, L. & Shen, T. (2022). Grid-Search Enhanced Tree-Based Machine Learning for Traffic IoT Data Anomaly Detection. In: Liu Q., Liu X., Cheng J., Shen T., Tian Y. (Ed.), Proceedings of the 12th International Conference on Computer Engineering and Networks. Paper presented at 12th International Conference on Computer Engineering and Networks, CENet 2022, Haikou, 4 November through 7 November 2022 (pp. 3-9). Springer Science+Business Media B.V.
2022 (English). In: Proceedings of the 12th International Conference on Computer Engineering and Networks / [ed] Liu Q., Liu X., Cheng J., Shen T., Tian Y., Springer Science+Business Media B.V., 2022, p. 3-9. Conference paper, Published paper (Refereed)
Abstract [en]

Anomaly detection is an important part of machine learning. Detecting outliers in transportation data can provide valid data for future traffic prediction or traffic flow analysis. This paper builds a model based on XGBoost to detect outliers in IoT data. The data is preprocessed first, followed by model building. We then use grid search to tune the parameters and apply the optimal parameters to the model. To validate the model, we compared it with two benchmark models, iForest and Random Forest. The final experimental results show that the model constructed in this paper can accurately detect outliers in traffic flow and that its accuracy is better than that of the baseline models. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
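
As a rough illustration of the grid-search step mentioned in the abstract, the following Python sketch scores every combination in a small parameter grid and keeps the best one. The parameter names (`max_depth`, `learning_rate`), the grid values, and the `toy_score` function are hypothetical stand-ins, not the paper's settings; in practice the score would come from validating an XGBoost model with those parameters.

```python
from itertools import product


def grid_search(param_grid, score):
    """Return (best_params, best_score) over an exhaustive parameter grid."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # Try every combination of the listed values, one value per parameter.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Hypothetical grid and scoring function, used only to exercise the search.
example_grid = {"max_depth": [2, 4, 6], "learning_rate": [0.05, 0.1, 0.3]}


def toy_score(params):
    # Stand-in for a validation score; it peaks at max_depth=4, learning_rate=0.1.
    return -(params["max_depth"] - 4) ** 2 - (params["learning_rate"] - 0.1) ** 2


best_params, best_score = grid_search(example_grid, toy_score)
```

The cost grows with the product of the grid sizes, so the grid is usually kept small.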

Place, publisher, year, edition, pages
Springer Science+Business Media B.V., 2022
Series
Lecture Notes in Electrical Engineering, ISSN 1876-1100, E-ISSN 1876-1119 ; 961
Keywords
Anomaly detection, Traffic flow, XGBoost, Internet of things, Machine learning, Statistics, Data anomalies, Grid search, Machine-learning, Model-based OPC, Traffic flow analysis, Traffic prediction, Tree-based
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-24181 (URN); 10.1007/978-981-19-6901-0_1 (DOI); 2-s2.0-85144536702 (Scopus ID); 9789811969003 (ISBN)
Conference
12th International Conference on Computer Engineering and Networks, CENet 2022, Haikou, 4 November through 7 November 2022
Available from: 2023-01-10. Created: 2023-01-10. Last updated: 2023-01-10. Bibliographically approved
Sun, B., Ma, L., Shen, T., Geng, R., Zhou, Y. & Tian, Y. (2021). A Robust Data-Driven Method for Multiseasonality and Heteroscedasticity in Time Series Preprocessing. Wireless Communications & Mobile Computing, 2021, Article ID 6692390.
2021 (English). In: Wireless Communications & Mobile Computing, ISSN 1530-8669, E-ISSN 1530-8677, Vol. 2021, article id 6692390. Article in journal (Refereed) Published
Abstract [en]

The Internet of Things (IoT) is emerging, and 5G enables far more data transport from mobile and wireless sources. The volume of data to be transmitted exceeds link capacity. Labelling data and transmitting only the useful part of the collected data, or its features, is a promising solution to this challenge. Abnormal data are valuable because they are needed to train models and to detect anomalies, whereas normal data are already overabundant. Labelling can be done at data sources or at edges to balance load and computing between sources, edges, and centres. However, the lack of an unsupervised labelling method still prevents these solutions from being implemented. Two main problems in unsupervised labelling are long-term dynamic multiseasonality and heteroscedasticity. This paper proposes a data-driven method to handle the modelling and heteroscedasticity problems. The method contains the following main steps. First, raw data are preprocessed and grouped. Second, main models are built for each group. Third, models are adapted back to the original measured data to get raw residuals. Fourth, raw residuals go through deheteroscedasticity and become normalized residuals. Finally, normalized residuals are used to conduct anomaly detection. The experimental results with real-world data show that our method increases the area under the receiver operating characteristic curve (AUC) by about 30%.

Place, publisher, year, edition, pages
Wiley-Hindawi, 2021
Keywords
Heterogeneity, Anomaly detection, Internet of things
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-22115 (URN); 10.1155/2021/6692390 (DOI); 000691119100005 (); 2-s2.0-85114084492 (Scopus ID)
Note

open access

Available from: 2021-09-09. Created: 2021-09-09. Last updated: 2021-09-16. Bibliographically approved
Li, B., Cheng, W., Bie, Y. & Sun, B. (2019). Capacity of Advance Right-Turn Motorized Vehicles at Signalized Intersections for Mixed Traffic Conditions. Mathematical problems in engineering (Print), Article ID 3854604.
2019 (English). In: Mathematical problems in engineering (Print), ISSN 1024-123X, E-ISSN 1563-5147, article id 3854604. Article in journal (Refereed) Published
Abstract [en]

Right-turn motorized vehicles turn right using channelized islands, which are used to improve the capacity of intersections. For ease of description, these kinds of right-turn motorized vehicles are called advance right-turn motorized vehicles (ARTMVs) in this paper. The authors analyzed four aspects of traffic conflict involving ARTMVs with other forms of traffic flow. A capacity model of ARTMVs is presented here using shockwave theory and gap acceptance theory. The proposed capacity model was validated by comparison to the results of the observations based on data collected at a single intersection with channelized islands in Kunming, the Highway Capacity Manual (HCM) model and the VISSIM simulation model. To facilitate engineering applications, the relationship describing the capacity of the ARTMVs with reference to the distance between the conflict zone and the stop line and the relationship describing the capacity of the ARTMVs with reference to the effective red time of the nonmotorized vehicles moving in the same direction were analyzed. The authors compared these results to the capacity of no advance right-turn motorized vehicles (NARTMVs). The results show that the capacity of the ARTMVs is more sensitive to changes in the arrival rate of nonmotorized vehicles when that arrival rate is between 500 and 2000 veh/h than when it is outside this range. In addition, the capacity of NARTMVs is greater than the capacity of ARTMVs when the nonmotorized vehicles have a higher arrival rate.

Place, publisher, year, edition, pages
HINDAWI LTD, 2019
Keywords
reliability
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-18597 (URN); 10.1155/2019/3854604 (DOI); 000473340300001 ()
Note

open access

Available from: 2019-09-09. Created: 2019-09-09. Last updated: 2019-09-20. Bibliographically approved
Sun, B., Cheng, W., Goswami, P. & Bai, G. (2018). Short-Term Traffic Forecasting Using Self-Adjusting k-Nearest Neighbours. IET Intelligent Transport Systems, 12(1), 41-48
2018 (English). In: IET Intelligent Transport Systems, ISSN 1751-956X, E-ISSN 1751-9578, Vol. 12, no 1, p. 41-48. Article in journal (Refereed) Published
Abstract [en]

Short-term traffic forecasting is becoming more important in intelligent transportation systems. The k-nearest neighbours (kNN) method is widely used for short-term traffic forecasting. However, self-adjustment of the kNN parameters has been a problem due to dynamic traffic characteristics. This paper proposes a fully automatic dynamic procedure kNN (DP-kNN) that makes the kNN parameters self-adjustable and robust without predefined models or training. We used real-world data with more than one year of traffic records to conduct experiments. The results show that, on average, DP-kNN is more accurate than manually adjusted kNN and other benchmarking methods. This study also discusses the difference between holiday and workday traffic prediction as well as the usage of neighbour distance measurement.
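
For readers unfamiliar with kNN forecasting, the sketch below shows the plain, fixed-parameter form that the abstract takes as its starting point: the most recent window of observations is compared with every earlier window, and the values that followed the `k` closest windows are averaged. It is not DP-kNN; the function name, the Euclidean distance, and the fixed `window` and `k` are assumptions made for illustration.

```python
from math import sqrt


def knn_forecast(series, window, k):
    """Predict the next value of `series` from the k most similar past windows."""
    if len(series) < window + 1:
        raise ValueError("series too short for the chosen window")
    query = series[-window:]
    candidates = []
    # Each candidate is a past window whose following value is already known.
    for start in range(len(series) - window):
        past = series[start:start + window]
        distance = sqrt(sum((a - b) ** 2 for a, b in zip(past, query)))
        candidates.append((distance, series[start + window]))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    # Unweighted mean of the values that followed the nearest windows.
    return sum(value for _, value in nearest) / len(nearest)


# Hypothetical repeating flow pattern, used only as a toy input.
flows = [1, 2, 3, 1, 2, 3, 1, 2]
prediction = knn_forecast(flows, window=2, k=2)
```

Here both `window` and `k` are chosen by hand, which is the manual tuning that the abstract says DP-kNN is designed to avoid.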

Place, publisher, year, edition, pages
Institution of Engineering and Technology, 2018
Keywords
intelligent transportation systems; short-term traffic forecasting; road traffic; DP-kNN; dynamic procedure kNN; self-adjusting k-nearest neighbours
National Category
Computer Sciences Transport Systems and Logistics
Identifiers
urn:nbn:se:bth-15727 (URN); 10.1049/iet-its.2016.0263 (DOI); 000426045200006 ()
Available from: 2018-01-09. Created: 2018-01-09. Last updated: 2023-12-28. Bibliographically approved
Ma, L., Sun, B. & Han, C. (2018). Training Instance Random Sampling Based Evidential Classification Forest Algorithms. In: 2018 21st International Conference on Information Fusion, FUSION 2018. Paper presented at 21st International Conference on Information Fusion, FUSION, Cambridge (pp. 883-888). Institute of Electrical and Electronics Engineers Inc.
2018 (English). In: 2018 21st International Conference on Information Fusion, FUSION 2018, Institute of Electrical and Electronics Engineers Inc., 2018, p. 883-888. Conference paper, Published paper (Refereed)
Abstract [en]

This paper models and handles epistemic uncertainty with belief function theory and explores different ways to learn classification forests from evidential training data. Multiple base classifiers are learned on uncertain training subsets generated by a training instance random sampling approach. For base classifier learning, the evidential likelihood function is used to calculate Gini impurity intervals of uncertain datasets for attribute splitting and to generate consonant mass functions of labels for leaf node prediction. The construction of a Gini impurity based belief binary classification tree is proposed and then compared with the C4.5 belief classification tree. For the base classifier combination strategy, both an evidence combination method for consonant mass function outputs and a majority voting method for precise label outputs are discussed. The performances of the proposed algorithms are compared and analysed with experiments on the UCI Balance Scale dataset. © 2018 ISIF

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2018
Keywords
Binary trees, Forestry, Functions, Information fusion, Linguistics, Uncertainty analysis, Belief function theory, Binary classification, Classification trees, Epistemic uncertainties, Evidence combination, Likelihood functions, Multiple base classifiers, Training subsets, Classification (of information)
National Category
Other Computer and Information Science
Identifiers
urn:nbn:se:bth-17105 (URN); 10.23919/ICIF.2018.8455427 (DOI); 000495071900122 (); 2-s2.0-85054089356 (Scopus ID); 9780996452762 (ISBN)
Conference
21st International Conference on Information Fusion, FUSION, Cambridge
Available from: 2018-10-11. Created: 2018-10-11. Last updated: 2019-12-13. Bibliographically approved
Ma, L., Sun, B. & Li, Z. (2017). Bagging likelihood-based belief decision trees. In: 20th International Conference on Information Fusion, Fusion 2017: Proceedings. Paper presented at 20th International Conference on Information Fusion, Fusion, Xian (pp. 321-326). Institute of Electrical and Electronics Engineers (IEEE), Article ID 8009664.
2017 (English). In: 20th International Conference on Information Fusion, Fusion 2017: Proceedings, Institute of Electrical and Electronics Engineers (IEEE), 2017, p. 321-326, article id 8009664. Conference paper, Published paper (Refereed)
Abstract [en]

To embed ensemble techniques into belief decision trees for performance improvement, the bagging algorithm is explored. Simple belief decision trees based on entropy intervals extracted from evidential likelihood are constructed as the base classifiers, and a combination of individual trees promises to lead to better classification accuracy. Requiring no extra querying cost, bagging belief decision trees can obtain good classification performance by simple belief tree combination, making it an alternative to a single belief tree with querying. Experiments on UCI datasets verify the effectiveness of the bagging approach. In various uncertain cases, the bagging method outperforms a single belief tree without querying and is comparable in accuracy to a single tree with querying. © 2017 International Society of Information Fusion (ISIF).

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2017
Keywords
bagging, belief function theory, decision trees, evidential likelihood, Decision theory, Forestry, Information fusion, Uncertainty analysis, Bagging algorithms, Bagging approach, Classification accuracy, Classification performance, Ensemble techniques, Trees (mathematics)
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:bth-15216 (URN); 10.23919/ICIF.2017.8009664 (DOI); 000410938300047 (); 2-s2.0-85029439190 (Scopus ID); 9780996452700 (ISBN)
Conference
20th International Conference on Information Fusion, Fusion, Xian
Available from: 2017-09-29. Created: 2017-09-29. Last updated: 2018-01-13. Bibliographically approved
Sun, B., Cheng, W., Bai, G. & Goswami, P. (2017). Correcting and complementing freeway traffic accident data using mahalanobis distance based outlier detection. Technical Gazette, 24(5), 1597-1607
2017 (English). In: Technical Gazette, ISSN 1330-3651, E-ISSN 1848-6339, Vol. 24, no 5, p. 1597-1607. Article in journal (Refereed) Published
Abstract [en]

A huge amount of traffic data is archived and can be used in data mining, especially supervised learning. However, it is not fully used because accurate accident information (labels) is lacking. In this study, we improve a Mahalanobis distance based algorithm so that it can handle differential data to estimate flow fluctuations and detect accidents, and we use it to support correcting and complementing accident information. The outlier detection algorithm provides accurate suggestions for accident occurrence time, duration and direction. We also develop a system with an interactive user interface to realize this procedure. There are three contributions for data handling. Firstly, we propose to use multi-metric traffic data instead of a single metric for traffic outlier detection. Secondly, we present a practical method to organise traffic data and to evaluate the organisation for Mahalanobis distance. Thirdly, we describe a general method to modify Mahalanobis distance algorithms to be updatable. © 2017, Strojarski Facultet. All rights reserved.
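
To make the idea of multi-metric outlier detection concrete, the sketch below computes the ordinary Mahalanobis distance of two-metric readings from their sample mean and picks the reading with the largest distance. It is the textbook distance only, not the improved, differential or updatable algorithm of the paper; the two metrics (flow and speed) and the sample readings are hypothetical.

```python
from math import sqrt


def mahalanobis_2d(points):
    """Return the Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, written out for a 2x2 matrix.
        squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(sqrt(max(squared, 0.0)))
    return distances


# Hypothetical (flow, speed) readings; the last one has an unusually low speed.
readings = [(100, 60), (110, 62), (105, 61), (95, 59),
            (102, 60), (108, 63), (98, 58), (104, 20)]
distances = mahalanobis_2d(readings)
most_unusual = distances.index(max(distances))
```

Because the distance accounts for the covariance between metrics, a reading can be flagged even when each metric alone looks only mildly unusual, which is one motivation for using several metrics together.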

Place, publisher, year, edition, pages
Strojarski Facultet, 2017
Keywords
Accident data, Data labelling, Differential distance, Mahalanobis distance, Outlier detection, Traffic data, Updatable algorithm, Accidents, Data mining, Statistics, User interfaces, Mahalanobis distances, Data handling
National Category
Communication Systems Computer and Information Sciences
Identifiers
urn:nbn:se:bth-15472 (URN); 10.17559/TV-20150616163905 (DOI); 000417100300037 (); 2-s2.0-85032512786 (Scopus ID)
Note

Funded by National Natural Science Foundation of China

Funding nr. 61364019

Available from: 2017-11-10. Created: 2017-11-10. Last updated: 2023-12-28. Bibliographically approved
Ma, L., Sun, B. & Han, C. (2017). Learning Decision Forest from Evidential Data: the Random Training Set Sampling Approach. In: International Conference on Systems and Informatics. Paper presented at 4th International Conference on Systems and Informatics (ICSAI), Hangzhou (pp. 1423-1428). IEEE
2017 (English). In: International Conference on Systems and Informatics, IEEE, 2017, p. 1423-1428. Conference paper, Published paper (Refereed)
Abstract [en]

To learn decision trees from uncertain data modelled by mass functions, the random training set sampling approach for learning belief decision forests is proposed. Given an uncertain training set, a collection of simple belief decision trees is trained, each on a new set drawn by random sampling from the original one. Then the final prediction is made by majority voting. After discussing the selection of parameters for belief decision forests, experiments on Balance scale data are carried out for performance validation. Results show that, under different kinds of uncertainty, the proposed method achieves a clear improvement in classification accuracy.
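
The sample-then-vote structure described in the abstract can be sketched as follows. The base learner here is a deliberately simple nearest-centroid rule on one numeric feature with precise labels, used as a stand-in; it is not a belief decision tree and does not handle mass functions. Sampling with replacement, the function names, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def train_centroids(samples):
    """Stand-in base learner: per-class mean of a single numeric feature."""
    sums, counts = {}, Counter()
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_centroids(model, value):
    """Predict the label whose centroid is closest to `value`."""
    return min(model, key=lambda label: abs(model[label] - value))


def train_forest(samples, n_models, sample_size, seed=0):
    """Train each base model on its own random sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [
        train_centroids([rng.choice(samples) for _ in range(sample_size)])
        for _ in range(n_models)
    ]


def predict_forest(forest, value):
    """Combine the base models' predictions by majority voting."""
    votes = Counter(predict_centroids(model, value) for model in forest)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature training set with two classes.
training = [(1.0, "low"), (1.2, "low"), (0.8, "low"),
            (5.0, "high"), (5.3, "high"), (4.7, "high")]
forest = train_forest(training, n_models=15, sample_size=6, seed=1)
```

Calling `predict_forest(forest, 0.9)` then returns the label chosen by most of the 15 base models.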

Place, publisher, year, edition, pages
IEEE, 2017
Series
International Conference on Systems and Informatics, ISSN 2474-0217
Keywords
Decision trees, Decision Forest, Learning algorithms
National Category
Information Systems
Identifiers
urn:nbn:se:bth-16125 (URN); 000427752100257 (); 978-1-5386-1107-4 (ISBN)
Conference
4th International Conference on Systems and Informatics (ICSAI), Hangzhou
Available from: 2018-04-26. Created: 2018-04-26. Last updated: 2018-05-24. Bibliographically approved
Sun, B. (2017). Toward Automatic Data-Driven Traffic Time Series Prediction. In: 5th Swedish Workshop on Data Science. Paper presented at 5th Swedish Workshop on Data Science, Gothenburg. Vol. 12, Article ID 12.
2017 (English). In: 5th Swedish Workshop on Data Science, 2017, Vol. 12, article id 12. Conference paper, Poster (with or without abstract) (Other academic)
Abstract [en]

Short-term traffic prediction on freeways has been an active research subject in the past several decades. Various algorithms covering a broad range of topics regarding performance, data requirements and efficiency have been proposed. However, the implementation of machine learning based algorithms in traffic management centres is still limited. The two main reasons are that the data is messy or missing and that parameter tuning requires experienced engineers.

The main objective of this thesis was to develop a procedure that can improve the performance and automation level of short-term traffic prediction.

Missing data is a problem that prevents many prediction algorithms in ITS from working effectively. Much work has been done to impute missing data. Among different imputation methods, k-nearest neighbours (kNN) has shown excellent accuracy and efficiency. However, the general kNN is designed for matrices rather than time series, so it does not use time series characteristics such as windows and gap-sensitive weights. We introduce gap-sensitive windowed kNN (GSW-kNN) imputation for time series. The results show that GSW-kNN is 34% more accurate than benchmarking methods, and it remains robust even when the missing ratio increases to 90%.

The lack of accurate accident information (labels) is another problem that prevents the huge amount of traffic data from being fully used. We improve a Mahalanobis distance based algorithm so that it can handle differential data to estimate flow fluctuations and detect accidents, and we use it to support correcting and complementing accident information. The outlier detection algorithm provides accurate suggestions for accident occurrence time, duration and direction. We also develop a system with an interactive user interface to realize this procedure. There are three contributions for data handling. Firstly, we propose to use multi-metric traffic data instead of a single metric for traffic outlier detection. Secondly, we present a practical method to organise traffic data and to evaluate the organisation for Mahalanobis distance. Thirdly, we describe a general method to modify Mahalanobis distance algorithms to be updatable.

For automatic parameter tuning, the experiments show that the flow-aware strategy performs better than the time-aware one. Thus, we use all parameter strategies simultaneously as ensemble strategies, in particular by including the window in flow-aware strategies.

Based on the above studies, we have developed online-orientated and offline-orientated algorithms for real-time traffic forecasting. The automatically tuned online version performs close to the optimal manually tuned performance. The offline version achieves performance that cannot be reached with manual tuning. It is also 3.05% better than XGB and 11.7% better than traditional SARIMA.

National Category
Computer Sciences Transport Systems and Logistics
Identifiers
urn:nbn:se:bth-15725 (URN)
Conference
5th Swedish Workshop on Data Science, Gothenburg
Available from: 2018-01-09. Created: 2018-01-09. Last updated: 2024-03-28. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-5824-425X
