Unsupervised Time Series Anomaly Detection Using NLP for 5G Base-Station TDD Scheduler logs
2024 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Background: Ericsson’s domain experts continuously implement updates to Radio Base Stations (RBS) to enhance their operation, generating large volumes of high-dimensional logs for each test run. Due to the complexity and scale of the data, manually inspecting these logs for anomalies is both time-consuming and labor-intensive. This research explores the application of Natural Language Processing (NLP)-based methods for unsupervised anomaly detection in multivariate time series, specifically targeting mixed-type datasets. These methods aim to effectively identify anomalous User Equipment (UE) by using 5G RBS Time Division Duplexing (TDD) scheduler logs.
Objectives: This thesis aims to develop and implement NLP techniques for unsupervised anomaly detection in mixed-type multivariate time series data, specifically 5G RBS TDD scheduler logs, in order to identify abnormal UEs.
Methods: We developed a pipeline for NLP-based multivariate time series anomaly detection in which the time series data is transformed into words and sentences. Language models are trained on a next-word prediction task and then used to score each sentence by its perplexity. Anomalies are detected by applying the Peak Over Threshold (POT) method to the resulting perplexity scores. Using Isolation Forest (IsoForest) as the baseline, we compare several NLP models (Skip-gram Negative Sampling, BiLSTM with Attention, and Decoder-Only Transformer) on performance metrics and execution time. Additionally, we evaluate the efficacy of the NLP-based approach for detecting abnormal UEs in 5G RBS TDD logs.
Results: The proposed NLP-based approach detects anomalous UEs in Ericsson 5G RBS TDD logs with an F1 score of 0.779.
Conclusions: The proposed NLP-based approach is effective for unsupervised multivariate time series anomaly detection. It also identifies abnormal UEs in Ericsson 5G RBS TDD logs, helping developers find anomalies in the log files and resolve the underlying issues.
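The pipeline described under Methods (records turned into sentences, a next-word model scoring each sentence by perplexity, POT thresholding of the scores) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and not taken from the thesis: the functions `to_sentence` and `pot_threshold`, the class `BigramModel`, the feature names `mcs` and `dir`, and the bin edges are hypothetical. A smoothed bigram model stands in for the neural language models the thesis evaluates, and the Generalized Pareto tail is fitted with a simple method-of-moments estimate, which may differ from the fitting procedure actually used.

```python
import math
from collections import Counter


def to_sentence(row, bin_edges):
    """Turn one mixed-type record into a list of tokens (hypothetical scheme)."""
    tokens = []
    for name, value in row.items():
        if isinstance(value, (int, float)):
            # Numeric feature: the token encodes which bin the value falls in.
            bin_id = sum(value >= edge for edge in bin_edges[name])
            tokens.append(f"{name}_b{bin_id}")
        else:
            # Categorical feature: the token encodes the category itself.
            tokens.append(f"{name}_{value}")
    return tokens


class BigramModel:
    """Add-one smoothed bigram model, a stand-in for a neural next-word model."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        vocab = set()
        for sentence in sentences:
            padded = ["<s>"] + sentence
            vocab.update(padded)
            for prev, nxt in zip(padded, padded[1:]):
                self.unigrams[prev] += 1
                self.bigrams[(prev, nxt)] += 1
        self.vocab_size = len(vocab) + 1  # one extra slot for unseen tokens

    def perplexity(self, sentence):
        """exp of the mean negative log-probability per token (non-empty input)."""
        padded = ["<s>"] + sentence
        log_prob = 0.0
        for prev, nxt in zip(padded, padded[1:]):
            count = self.bigrams[(prev, nxt)] + 1
            total = self.unigrams[prev] + self.vocab_size
            log_prob += math.log(count / total)
        return math.exp(-log_prob / len(sentence))


def pot_threshold(scores, init_quantile=0.9, risk=1e-2):
    """Peak Over Threshold: extrapolate an anomaly threshold from the score tail."""
    ordered = sorted(scores)
    t = ordered[int(init_quantile * len(ordered))]  # initial (empirical) threshold
    excesses = [s - t for s in scores if s > t]
    if len(excesses) < 2:
        return t  # not enough tail data to fit anything
    mean = sum(excesses) / len(excesses)
    var = sum((e - mean) ** 2 for e in excesses) / (len(excesses) - 1)
    if var == 0:
        return t + mean
    # Method-of-moments estimates of the Generalized Pareto shape and scale.
    shape = 0.5 * (1 - mean ** 2 / var)
    scale = 0.5 * mean * (mean ** 2 / var + 1)
    ratio = risk * len(scores) / len(excesses)
    if abs(shape) < 1e-9:
        return t - scale * math.log(ratio)  # exponential-tail limit
    return t + (scale / shape) * (ratio ** (-shape) - 1)
```

In use, each record would be passed through `to_sentence`, the model would be trained on the resulting sentences, and any sentence whose perplexity exceeds `pot_threshold(all_perplexities)` would be flagged as anomalous. The `risk` parameter controls how far into the tail the final threshold sits; smaller values flag fewer sentences.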
Place, publisher, year, edition, pages
2024. p. 60
Keywords [en]
Natural Language Processing, Unsupervised Multivariate Time Series Anomaly Detection, Radio Base Station, Time Division Duplexing, Unsupervised Learning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-27154
OAI: oai:DiVA.org:bth-27154
DiVA, id: diva2:1915757
External cooperation
Ericsson
Subject / course
DV2572 Master's Thesis in Computer Science
Educational program
DVADA Master Qualification Plan in Computer Science
Presentation
2024-09-23, J3506 Platon, Blekinge Institute of Technology, Valhallavägen 10, Karlskrona, 13:00 (English)
Supervisors
Examiners
2024-12-03, 2024-11-25, 2025-09-30. Bibliographically approved