Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Automating Log Analysis
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
2021 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

Background: With the advent of the information age, there are many large numbers of services rising which run on several clusters of computers.  Maintaining such large complex systems is a very difficult task. Developers use one tool which is common for almost all software systems, they are the console logs. To troubleshoot problems, developers refer to these logs to solve the issue. Identifying anomalies in the logs would lead us to the cause of the problem, thereby automating the analysis of logs. This study focuses on anomaly detection in logs.

Objectives: The main goal of the thesis is to identify different algorithms for anomaly detection in logs, implement the algorithms and compare them by doing an experiment.

Methods: A literature review had been conducted for identifying the most suitable algorithms for anomaly detection in logs. An experiment was conducted to compare the algorithms identified in the literature review. The experiment was performed on a dataset of logs generated by Hadoop Data File System (HDFS) servers which consisted of more than 11 million lines of logs. The algorithms that have been compared are K-means, DBSCAN, Isolation Forest, and Local Outlier Factor algorithms which are all unsupervised learning algorithms.

Results: The performance of all these algorithms has been compared using metrics precision, recall, accuracy, F1 score, and run time. Though DBSCAN was the fastest, it resulted in poor recall, similarly Isolation Forest also resulted in poor recall. Local Outlier Factor was the fastest to predict. K-means had the highest precision and Local Outlier Factor had the highest recall, accuracy, and F1 score.

Conclusion: After comparing the metrics of different algorithms, we conclude that Local Outlier Factor performed better than the other algorithms with respect to most of the metrics measured.

Place, publisher, year, edition, pages
2021. , p. 70
Keywords [en]
Anomaly detection, Log analysis, Unsupervised learning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-21175OAI: oai:DiVA.org:bth-21175DiVA, id: diva2:1534165
External cooperation
Volvo Group IT
Subject / course
DV2572 Master´s Thesis in Computer Science
Educational program
DVADA Master Qualification Plan in Computer Science
Presentation
2021-01-26, Zoom meeting, Blekinge Tekniska Högskola, 371 79 Karlskrona, Karlskrona, Sweden, 09:00 (English)
Supervisors
Examiners
Available from: 2021-03-09 Created: 2021-03-04 Last updated: 2021-03-09Bibliographically approved

Open Access in DiVA

Automating Log Analysis(1408 kB)788 downloads
File information
File name FULLTEXT02.pdfFile size 1408 kBChecksum SHA-512
26191acfef8173cded5ff9c2665ee6403f1c302eed03de7758855af2fc108c00339cee171130cef0ce85ac10323ebca50be32c646d5e2d0c4d64aa8793d6138d
Type fulltextMimetype application/pdf

By organisation
Department of Computer Science
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 788 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 746 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf