Evaluating machine learning strategies for classification of large-scale Kubernetes cluster logs
2022 (English) Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Kubernetes is a free, open-source container orchestration system for deploying and managing Docker containers that host microservices. Its cluster logs are extremely helpful in determining the root cause of a failure. However, as systems become more complex, locating failures becomes more difficult and time-consuming. This study aims to identify classification algorithms that classify the given log data accurately while requiring fewer computational resources. Because the data set is large, we begin with expert-based feature selection to reduce its size. We then perform TF-IDF feature extraction and, finally, compare five classification algorithms (SVM, KNN, random forest, gradient boosting, and MLP) using several metrics. The results show that random forest achieves good accuracy while requiring fewer computational resources than the other algorithms.
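The pipeline outlined in the abstract (TF-IDF feature extraction, then classification scored on both accuracy and cost) can be illustrated with a minimal, standard-library-only sketch. It is not the thesis implementation. The log lines, the labels, the helper names, and the smoothed IDF formula are all assumptions made for illustration, and only a KNN-style classifier is shown rather than all five algorithms.

```python
import math
import time
from collections import Counter


def tokenize(line):
    """Lowercase whitespace tokenizer (an assumption; real logs need more care)."""
    return line.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every token seen in training."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def tfidf_vector(doc, idf):
    """L2-normalised sparse TF-IDF vector; tokens unseen in training are dropped."""
    tf = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {tok: w / norm for tok, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query, train_vecs[i]),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical log lines and class labels, invented for this sketch.
train_logs = [
    "pod crashloopbackoff container restart failed",
    "container restart failed back-off pod",
    "node notready kubelet stopped posting status",
    "kubelet node notready network unavailable",
    "image pull failed errimagepull registry timeout",
    "errimagepull image registry unauthorized",
]
train_labels = ["crash", "crash", "node", "node", "image", "image"]
test_logs = [
    "pod container restart failed again",
    "kubelet reports node notready",
    "registry image pull timeout",
]
test_labels = ["crash", "node", "image"]

start = time.perf_counter()
idf = fit_idf(train_logs)
train_vecs = [tfidf_vector(doc, idf) for doc in train_logs]
predictions = [knn_predict(tfidf_vector(doc, idf), train_vecs, train_labels)
               for doc in test_logs]
elapsed = time.perf_counter() - start  # crude stand-in for computational cost

accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(f"accuracy={accuracy:.2f} elapsed={elapsed:.6f}s")
```

Wall-clock time is only a rough proxy for computational cost. A fuller comparison would also track memory use and repeat the measurement for each algorithm on the same features.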
Place, publisher, year, edition, pages
2022. p. 70
Keywords [en]
Kubernetes logs, feature selection, feature extraction, multi-class classification, computational cost
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-23934
OAI: oai:DiVA.org:bth-23934
DiVA, id: diva2:1711100
External cooperation
Ericsson
Subject / course
DV2572 Master's Thesis in Computer Science
Educational program
DVACO Master's program in computer science, 120.0 HE credits
Presentation
2022-09-27, Zoom, Karlskrona, 10:00 (English)
2022-11-16, 2022-11-15, 2025-09-30. Bibliographically approved