Anomaly Detection in Log Files Using Machine Learning Techniques
Blekinge Institute of Technology, Faculty of Computing.
2021 (English). Independent thesis, Advanced level (degree of Master (One Year)), 80 credits / 120 HE credits. Student thesis
Abstract [en]

Context: Most larger computer systems today produce log files, which contain highly valuable information about system behavior and are therefore consulted often when analyzing that behavior. Because some systems produce a very high number of log entries, however, it is extremely difficult to find relevant information in these files. Computer-based log analysis techniques are therefore indispensable for finding relevant data in log files.

Objectives: The major problem is to find important events in log files. Events in the test suite such as connection errors or disruptions are not considered abnormal events; rather, the events that cause system interruption must be considered abnormal. The goal is to use machine learning techniques to "learn" what the "expected" behavior of a particular test suite is. This means that the system must learn to distinguish between a log file that contains an anomaly and one that does not, based on previous sequences.

Methods: Various algorithms are implemented and compared with other existing algorithms based on their performance. The algorithms are run in an experiment on a parsed set of labeled log files and are evaluated on the anomalous events they identify in those files. The algorithms used were Local Outlier Factor, Random Forest, and Term Frequency Inverse Document Frequency. We then apply clustering with KMeans and PCA to gain insights from the data, observing groups of data points to find the anomalous events.
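
As an illustration of how a Term Frequency Inverse Document Frequency weighting can surface unusual log events, the following is a minimal sketch, not the thesis implementation. It assumes each log line is treated as one document, tokens are whitespace-separated words, and a line's score is the TF-weighted average of a smoothed IDF. The function names, the scoring rule, and the sample log lines are all hypothetical.

```python
import math
from collections import Counter


def tokenize(line):
    """Split a log line into lowercase whitespace-separated tokens."""
    return line.lower().split()


def tfidf_scores(lines):
    """Return one score per line: the sum of TF * IDF over its tokens.

    Assumption: each line is a document; IDF is smoothed as
    ln((1 + N) / (1 + df)) + 1, where N is the number of lines and
    df is the number of lines containing the token.
    """
    docs = [tokenize(line) for line in lines]
    n_docs = len(docs)
    # Document frequency: how many lines contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)
            continue
        tf = Counter(doc)
        score = sum(
            (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[tok])) + 1.0)
            for tok, count in tf.items()
        )
        scores.append(score)
    return scores


def most_anomalous(lines, k=1):
    """Return the indices of the k highest-scoring lines, highest first."""
    scores = tfidf_scores(lines)
    ranked = sorted(range(len(lines)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical sample data, invented for this sketch.
LOG_LINES = [
    "connection established to node a",
    "connection established to node b",
    "connection established to node c",
    "heartbeat ok from node a",
    "heartbeat ok from node b",
    "kernel panic watchdog timeout",
]

if __name__ == "__main__":
    for idx in most_anomalous(LOG_LINES, k=2):
        print(idx, LOG_LINES[idx])
```

Lines made up of tokens that rarely occur elsewhere in the log receive the highest scores, which is the intuition behind using this weighting to flag events that differ from the rest. A real pipeline would first parse raw log entries into event templates and would need a threshold or labeled data to decide which scores count as anomalous.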

Results: The experiment, which is discussed in detail, shows that the Term Frequency Inverse Document Frequency method finds the anomalous events in the data better than the other two approaches.

Conclusions: The results will help developers find anomalous events without manually inspecting the log file row by row. The model reports the events that behave differently from the rest of the events in the log and that cause the system to be interrupted.

Place, publisher, year, edition, pages
2021. p. 67
Keywords [en]
Anomaly Detection, Log Files, Machine Learning, Clustering, Outlier Detection
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-21179
OAI: oai:DiVA.org:bth-21179
DiVA, id: diva2:1534187
External cooperation
Ericsson AB
Subject / course
DV2572 Master's Thesis in Computer Science
Educational program
DVAXA Master of Science Programme in Computer Science
Presentation
2020-09-22, 22:56 (English)
Supervisors
Examiners
Available from: 2021-03-09 Created: 2021-03-04 Last updated: 2021-03-09. Bibliographically approved

Open Access in DiVA

Anomaly Detection in Log Files Using Machine Learning Techniques (2687 kB), 11809 downloads
File information
File name: FULLTEXT02.pdf
File size: 2687 kB
Checksum (SHA-512): 65a306e42a71d79de5bd67e58cf7c8ebe17217b27d4ace63b8eac12ef6d93faf324e096fc7d6109b5042ae391eb807efb6bfb8015132ecd229c2e7dd6bf37155
Type: fulltext
Mimetype: application/pdf

By organisation
Faculty of Computing
Computer Sciences

Total: 11819 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
Total: 2433 hits