Automated Duplicate Bug Reports Detection - An Experiment at Axis Communication AB
2017 (English). Independent thesis, Advanced level (degree of Master (One Year)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Context. Bug tracking systems play an important role in software maintenance by allowing users to submit bug reports. However, submitted reports are often duplicates: when several users report the same problem, the reports are called duplicate issue reports, and they accumulate in considerable numbers in a bug tracking system. Automating duplicate bug report detection can increase the productivity of software maintenance activities, because each incoming bug report is compared directly with the existing reports to identify similar ones, so that no human needs to spend time reading, understanding, and searching. Although there has recently been considerable research on such solutions, there is still much room for improvement in accuracy and recall rate during the duplicate detection process. Moreover, very few tools have been evaluated in an industrial setting.

Objectives. In this study, we first aim to characterize automated duplicate bug report detection methods by exploring the categories of those methods, identifying the evaluation methods proposed for them, and specifying the performance differences between the categories. We then propose a method that leverages recent advances in semantic models, specifically Doc2vec, and present its overall framework: preprocessing, training a semantic model, calculating and ranking similarity, and retrieving duplicate bug reports. Finally, we conduct an experiment to evaluate the performance of the proposed method and compare it with the selected best methods for the task of duplicate bug report detection.
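The four framework steps named above (preprocessing, fitting a model, calculating and ranking similarity, retrieving candidates) can be illustrated with a minimal sketch. It is not the thesis implementation: the semantic-model step is replaced here by plain TF-IDF weights so that the example runs with the standard library only, whereas the proposed method trains a Doc2vec model. All function names, the stopword list, and the sample bug reports are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "is", "on", "in", "to", "when", "after", "and", "of"}

def preprocess(text):
    """Step 1: lowercase, tokenize, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def fit_idf(token_lists):
    """Step 2 (stand-in for training a semantic model): fit IDF weights."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}

def vectorize(tokens, idf):
    """Build a sparse TF-IDF vector; terms unseen during fitting are ignored."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(u, v):
    """Step 3: cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_n(query, reports, n=3):
    """Step 4: return the n existing reports most similar to the query."""
    ids = list(reports)
    token_lists = [preprocess(reports[i]) for i in ids]
    idf = fit_idf(token_lists)
    vectors = {i: vectorize(t, idf) for i, t in zip(ids, token_lists)}
    q = vectorize(preprocess(query), idf)
    scored = [(i, cosine(q, vectors[i])) for i in ids]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Hypothetical sample data, not taken from the thesis.
reports = {
    "BUG-1": "Camera reboots when the network cable is unplugged",
    "BUG-2": "Login page shows wrong language after firmware upgrade",
    "BUG-3": "Video stream freezes in the web interface",
}
results = top_n("Device reboots after unplugging network cable", reports, n=2)
for report_id, score in results:
    print(report_id, round(score, 3))
```

In this toy data only BUG-1 shares terms with the query, so it is ranked first. A purely lexical model cannot match "unplugging" with "unplugged", which is the kind of gap a semantic model such as Doc2vec is intended to narrow.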

Methods. To classify and analyze all existing research on automated duplicate bug report detection, we conducted a systematic mapping study. To evaluate our proposed method, we conducted an experiment with an identified number of bug reports on the internal bug report database of Axis Communication AB.

Results. We classified automated duplicate bug report detection techniques into three categories: the Top-N recommendation and ranking approach, the binary classification approach, and the decision-making approach. We found that recall-rate@k is the most common evaluation metric and that the Top-N recommendation and ranking approach performs best among the identified approaches. The experimental results showed that the recall rate of our proposed approach is significantly higher than that of the combination of TF-IDF with Word2vec and the combination of TF-IDF with LSI. Our approach, which combines Doc2vec and TF-IDF, achieves a recall-rate@1-10 of 18.66%-42.88% on the TROUBLE data, an improvement of 1.63%-9.42% over the state of the art.
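As an illustration of the recall-rate@k metric, the sketch below assumes its common definition: the fraction of duplicate (query) reports for which at least one true duplicate appears among the top k ranked candidates. The exact definition used in the thesis may differ in detail, and the function name and sample rankings are hypothetical.

```python
def recall_rate_at_k(ranked_lists, true_duplicates, k):
    """Fraction of queries with at least one true duplicate in the top k.

    ranked_lists: dict mapping query id -> candidate ids, best first.
    true_duplicates: dict mapping query id -> set of known duplicate ids.
    """
    hits = 0
    for query_id, ranking in ranked_lists.items():
        if set(ranking[:k]) & true_duplicates[query_id]:
            hits += 1
    return hits / len(ranked_lists)

# Hypothetical rankings for four incoming reports, not data from the thesis.
ranked = {
    "Q1": ["B7", "B2", "B9"],
    "Q2": ["B4", "B1", "B3"],
    "Q3": ["B5", "B6", "B8"],
    "Q4": ["B2", "B9", "B1"],
}
truth = {"Q1": {"B2"}, "Q2": {"B3"}, "Q3": {"B5"}, "Q4": {"B7"}}

for k in (1, 2, 3):
    print(k, recall_rate_at_k(ranked, truth, k))
```

Because a larger k can only add hits, recall-rate@k never decreases as k grows, which is consistent with reporting the metric as a range over k = 1 to 10.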

Conclusions. In this thesis, we identified and classified 44 research papers on automated duplicate bug report detection by conducting a systematic mapping study. We provide an overview of the state of the art, identifying evaluation metrics, investigating the scientific evidence in the reported results, and identifying needs for future research. We implemented a bug tracking system with a duplicate bug report detection module that produces a list of Top-N related bug reports, each with a numerical similarity score. After conducting the experiment, we found that our proposed approach, the combination of Doc2vec and TF-IDF, produces the best recall rate.

Place, publisher, year, edition, pages
2017.
Keyword [en]
Similar Bugs, Paragraph Vector, Information Retrieval, Recommendation Systems
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-15399
OAI: oai:DiVA.org:bth-15399
DiVA: diva2:1153748
Subject / course
PA2534 Master's Thesis (120 credits) in Software Engineering
Educational program
PAAXA Master of Science Programme in Software Engineering
Available from: 2017-11-03. Created: 2017-10-31. Last updated: 2017-11-03. Bibliographically approved.

Open Access in DiVA

fulltext (1371 kB), 66 downloads
File information
File name: FULLTEXT02.pdf
File size: 1371 kB
Checksum (SHA-512): e923d020cc1006f0dffd986e91258ee240d78b9e1e75d6bb8fd37ca2d84e01743ad8184b37c6aaa48ae2285c56a8b5bd8d74dceaab5653fe68a9580dcf3e16f9
Type: fulltext
Mimetype: application/pdf
