Comparison of Machine Learning Algorithms for Anomaly Detection in Train’s Real-Time Ethernet using an Intrusion Detection System
2022 (English)Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]
Background: The train communication network is vulnerable to intrusion assaultsbecause of the openness of the ethernet communication protocol. Therefore, an intru-sion detection system must be incorporated into the train communication network.There are many algorithms available in Machine Learning(ML) to develop the Intru-sion Detection System(IDS). Majorly, depending on the accuracy and execution timeof the algorithm, it is decided as the best. Performance metrics like F1 score, preci-sion, recall, and support are compared to see how well the algorithm fits the modelwhile training. The following thesis will detect the anomalies in the Train ControlManagement System(TCMS) and then the comparison of various algorithms will beheld in order to declare the accurate algorithm.
Objectives: In this thesis work, we aim to research anomaly detection in a train’sreal-time ethernet using an IDS. The main objectives of this thesis include per-forming Principal Component Analysis(PCA) and feature selection using RandomForest(RF) for simplifying the complexity of the dataset by reducing dimensionalityand extracting significant features. Followed by, choosing the most consistent algo-rithm for anomaly detection from the selected algorithms by evaluating performanceparameters, especially accuracy and execution time after training the models usingML algorithms.
Method: This thesis necessitates one research methodology which is experimen-tation, to answer our research questions. For RQ1, experimentation will help usgain better insights into the dataset to extract valuable and essential features as apart of feature selection using RF and dimensionality reduction using PCA. RQ2also uses experimentation because it provides better accuracy and reliability. Afterpre-processing, the data will be used to train the algorithms and will be evaluatedusing various methods.
Results: In this study, we have analysed data using EDA, reduced dimensionalityand feature selection using PCA and RF algorithm respectively. We used five su-pervised machine learning methods namely, Support Vector Machine(SVM), NaiveBayes, Decision Tree, K-nearest Neighbor(KNN), and Random Forest(RF). Aftertesting and utilizing the "KDDCup 1999" pre-processed dataset from the Universityof California Irvine(UCI) ML repository, Decision Tree model has been concludedas the best-performing algorithm with an accuracy of 98.89% in 0.098 seconds, incomparison to other models.
Conclusions: Five models have been trained using the five ML techniques foranomaly detection using an IDS. We concluded that the decision tree trained modelhas optimal performance with an accuracy of 98.89% and time of 0.098 seconds
Place, publisher, year, edition, pages
2022. , p. 41
Keywords [en]
Anomaly detection, Computing methodologies, Machine learning, Real-time Ethernet, Supervised learning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-23802OAI: oai:DiVA.org:bth-23802DiVA, id: diva2:1707593
Subject / course
DV1478 Bachelor Thesis in Computer Science
Educational program
DVGDT Bachelor Qualification Plan in Computer Science 60.0 hp
Presentation
2022-09-23, Blekinge Institute of Technology, Karlskrona, 10:35 (English)
Supervisors
Examiners
2022-11-012022-11-012022-11-01Bibliographically approved