ON EVALUATING MACHINE LEARNING APPROACHES FOR EFFICIENT CLASSIFICATION OF TRAFFIC PATTERNS
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
2017 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Context. With the increased use of mobile devices and the internet, cellular network traffic has grown tremendously. This growth has led to more frequent communication failures among network nodes. Each communication failure between nodes is defined as a bad event, and a single bad event can trigger several consecutive bad events. Taken together, these bad events may eventually lead to node failures, in which a node is unable to respond to any data requests. Implementing workarounds for node failures costs telecom companies considerable human effort and money, so there is a need to prevent node failures from happening. This can be done by classifying the traffic patterns between nodes in the network, identifying bad events in them, and delivering a verdict immediately after detection.

Objectives. Through this study, we aim to find the most suitable machine learning algorithm for efficiently classifying the traffic patterns of SGSN-MME, where SGSN stands for Serving GPRS (General Packet Radio Service) Support Node and MME for Mobility Management Entity. SGSN-MME is a network management tool designed to support the functionalities of two nodes, namely SGSN and MME. We do this by evaluating the classification performance of four machine learning classification algorithms, namely Support vector machines (SVMs), Naïve Bayes, Decision trees and Random forests, on the traffic patterns of SGSN and MME. The selected classification algorithm will be developed so that, whenever it detects a bad event, it notifies the user with the message “Something bad is happening”.
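
The notification behaviour described above can be sketched as a thin wrapper around any trained classifier. This is only an illustration: the function names, the "good"/"bad" label strings and the toy threshold classifier are assumptions made for the example and are not taken from the thesis; only the message text is quoted from the abstract.

```python
from typing import Callable, List, Sequence

ALERT_MESSAGE = "Something bad is happening"  # message text quoted in the abstract


def monitor(
    patterns: Sequence[Sequence[float]],
    classify: Callable[[Sequence[float]], str],
    notify: Callable[[str], None] = print,
) -> List[int]:
    """Classify each traffic pattern in order and notify on every bad event.

    Returns the indices of the patterns that were flagged as bad.
    """
    flagged = []
    for index, pattern in enumerate(patterns):
        if classify(pattern) == "bad":
            notify(ALERT_MESSAGE)  # deliver the verdict right after detection
            flagged.append(index)
    return flagged


def toy_classifier(pattern: Sequence[float]) -> str:
    """Hypothetical stand-in for a trained model: flags large attribute sums."""
    return "bad" if sum(pattern) > 10 else "good"
```

Calling `monitor(patterns, toy_classifier)` prints the message once per flagged pattern; passing a different `notify` callable (for example a list's `append`) redirects the alerts without changing the classification logic.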

Methods. We conducted an experiment to evaluate the classification performance of the four chosen classification algorithms on a dataset provided by Ericsson AB, Gothenburg. The experimental dataset combines three logs: one represents traffic patterns in a real network, and the other two contain synthetic traffic patterns that were generated manually. The dataset is unlabeled, with 720 data instances and 4019 attributes. K-means clustering is performed to divide the data instances into groups, which are then labeled as good and bad events. Also, since the number of attributes in the experimental dataset exceeds the number of instances, feature selection is performed to select the subset of relevant attributes that best represents the whole data. All the chosen classification algorithms are trained and tested with ten-fold cross-validation using the selected subset of attributes, and the resulting performance measures (classification accuracy, F1 score and training time) are analyzed and compared to select the most suitable algorithm. Finally, the chosen algorithm is tested on unlabeled real data and the performance measures are analyzed to check whether it is able to detect the bad events correctly.
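
Two of the steps above, labeling by K-means and splitting into ten folds, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the thesis implementation: the farthest-first initialisation, the `bad_cluster` argument (which cluster counts as "bad" would have to be decided by inspecting the clusters) and all function names are choices made for this example. Feature selection is left out because the abstract does not name the method used.

```python
import math
import random


def farthest_first(points, k):
    """Deterministic initial centroids: the first point, then the point farthest from those chosen."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k=2, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centroids = farthest_first(points, k)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def label_events(assignment, bad_cluster):
    """Turn cluster indices into labels; `bad_cluster` is an assumed, manually chosen index."""
    return ["bad" if a == bad_cluster else "good" for a in assignment]


def k_fold_indices(n, folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs for shuffled k-fold cross-validation."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    for f in range(folds):
        test = order[f::folds]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        yield train, test
```

With 720 instances, `k_fold_indices(720)` yields ten splits of 648 training and 72 test indices each, and every instance is held out exactly once. Each classifier would be trained and timed on the training indices and scored on the held-out ones.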

Results. In terms of classification accuracy and F1 score, Random forests outperformed Support vector machines, Naïve Bayes and Decision trees, with an average classification accuracy of 99.72% and an average F1 score of 99.6. In terms of training time, Naïve Bayes outperformed Support vector machines, Decision trees and Random forests, with an average training time of 0.010 seconds. The classification accuracy and F1 score of Random forests on the unlabeled data were found to be 100% and 100, respectively.
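
For reference, the two quality measures reported above can be computed as follows. The sketch assumes that "bad" is treated as the positive class and that both measures are expressed on a 0 to 100 scale, which matches how the figures are quoted; the abstract does not state either choice, and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Percentage of instances whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def f1_score(y_true, y_pred, positive="bad"):
    """F1 (harmonic mean of precision and recall) for the positive class, scaled to 0-100."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)
```

In a ten-fold evaluation, both functions would be applied to each held-out fold and the ten values averaged to obtain figures comparable to those above.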

Conclusions. Since our study focuses on classifying the traffic patterns of SGSN-MME accurately, classification accuracy and F1 score are more important than the training time of the algorithm. Therefore, based on the experimental results, we conclude that Random forests is the most suitable machine learning algorithm for classifying the traffic patterns of SGSN-MME. However, Naïve Bayes can also be used if classification has to be performed in the least time possible and moderate accuracy (around 70%) is acceptable.

Place, publisher, year, edition, pages
2017.
Keyword [en]
machine learning, classification, traffic patterns, cellular mobile networks.
National Category
Computer Science
Identifiers
URN: urn:nbn:se:bth-14985; OAI: oai:DiVA.org:bth-14985; DiVA: diva2:1131144
External cooperation
Ericsson AB, Göteborg
Subject / course
DV2572 Master's Thesis in Computer Science
Educational program
DVAXA Master of Science Programme in Computer Science
Examiners
Available from: 2017-08-14. Created: 2017-08-12. Last updated: 2017-08-14. Bibliographically approved.

Open Access in DiVA

fulltext (493 kB), 12 downloads
File information
File name: FULLTEXT01.pdf. File size: 493 kB. Checksum (SHA-512):
4fd9db78758acf1532ce5ce7d27449de2465a3c0aa6fb6bc606e60ef9c204c176564d210c27367e5c03c0dd82710f68d70bbfe55076756ce7403d49b178de426
Type: fulltext. Mimetype: application/pdf
