Detecting code smells using industry-relevant data
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0003-3907-3357
Wroclaw University of Science and Technology, POL.
2023 (English). In: Information and Software Technology, ISSN 0950-5849, E-ISSN 1873-6025, Vol. 155, article id 107112. Article in journal (Refereed). Published.
Abstract [en]

Context: Code smells are patterns in source code associated with an increased defect rate and a higher maintenance effort than usual, but without a clear definition. Code smells are often detected using rules hard-coded in detection tools. Such rules are often set arbitrarily or derived from data sets tagged by reviewers without the necessary industrial know-how. Conclusions from studying such data sets may be unreliable or even harmful, since algorithms may achieve higher values of performance metrics on them than on data sets tagged by experts, despite not being industrially useful.

Objective: Our goal is to investigate the performance of various machine learning algorithms for automated code smell detection, trained on a code smell data set (MLCQ) derived from actively developed, industry-relevant projects, with reviews performed by experienced software developers.

Method: We assign the severity of the smell to each code sample according to a consensus between the severities assigned by the reviewers, use the Matthews Correlation Coefficient (MCC) as our main performance metric to account for the entire confusion matrix, and compare median values to account for non-normal distributions of performance. We compare 6720 models built using eight machine learning techniques. The entire process is automated and reproducible.

Results: The performance of the compared techniques depends heavily on the analyzed smell. The median value of our performance metric for the best algorithm was 0.81 for Long Method, 0.31 for Feature Envy, 0.51 for Blob, and 0.57 for Data Class.

Conclusions: Random Forest and Flexible Discriminant Analysis performed the best overall, but in most cases the performance difference between them and the median algorithm was no more than 10% of the latter. The performance results were stable over multiple iterations. Although the F-score omits one quadrant of the confusion matrix (and thus may differ from MCC), in code smell detection the actual differences are minimal. © 2022 Elsevier B.V.
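The abstract's point about MCC versus the F-score can be illustrated with a short sketch (not the paper's code; the confusion-matrix counts below are hypothetical): the F-score is computed from true positives, false positives, and false negatives only, while MCC also incorporates true negatives, which matters for imbalanced data such as code samples that are mostly smell-free.

```python
# Sketch: MCC vs F1-score from a 2x2 confusion matrix.
# The F-score ignores true negatives (one quadrant); MCC uses all four cells.
import math

def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient; 0.0 when any marginal sum is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def f1(tp, fp, fn):
    """F1-score: harmonic mean of precision and recall; true negatives unused."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical detection result on an imbalanced sample: few smelly methods,
# many clean ones (large tn), as is typical for code smell data sets.
tp, fp, fn, tn = 30, 10, 20, 940
print(round(mcc(tp, fp, fn, tn), 2))  # → 0.66
print(round(f1(tp, fp, fn), 2))       # → 0.67
```

With a large true-negative count the two metrics happen to be close here, matching the paper's observation that in practice the differences are minimal; shrinking `tn` changes MCC but leaves F1 untouched.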

Place, publisher, year, edition, pages
Elsevier, 2023. Vol. 155, article id 107112
Keywords [en]
Codes (symbols), Forestry, Image resolution, Learning algorithms, Learning systems, Matrix algebra, Normal distribution, Software engineering, Supervised learning, Technology transfer, Code smell, Confusion matrix, Correlation coefficient, Data set, Machine-learning, Median value, Performance, Performance metrices, Reproducible research, Source codes, Discriminant analysis, Code smells, Machine learning
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-24096
DOI: 10.1016/j.infsof.2022.107112
ISI: 000901826200018
Scopus ID: 2-s2.0-85143501730
OAI: oai:DiVA.org:bth-24096
DiVA, id: diva2:1719758
Available from: 2022-12-16. Created: 2022-12-16. Last updated: 2023-01-26. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Madeyski, Lech

