Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
MLpylint: Automating the Identification of Machine Learning-Specific Code Smells
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering.
2023 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

Background. Machine learning (ML) has rapidly grown in popularity, becoming a vital part of many industries. This swift expansion has brought about new challenges to technical debt, maintainability and the general software quality of ML systems. With ML applications becoming more prevalent, there is an emerging need for extensive research to keep up with the pace of developments. Currently, the research on code smells in ML applications is limited and there is a lack of tools and studies that address these issues in-depth. This gap in the research highlights the necessity for a focused investigation into the validity of ML-specific code smells in ML applications, setting the stage for this research study.

Objectives. Addressing the limited research on ML-specific code smells within Python-based ML applications. To achieve this, the study begins with the identification of these ML-specific code smells. Once recognized, the next objective is to choose suitable methods and tools to design and develop a static code analysis tool based on code smell criteria. After development, an empirical evaluation will assess both the tool’s efficacy and performance. Additionally, feedback from industry professionals will be sought to measure the tool’s feasibility and usefulness.

Methods. This research employed Design Science Methodology. In the problem identification phase, a literature review was conducted to identify ML-specific code smells. In solution design, a secondary literature review and consultations with experts were performed to select methods and tools for implementing the tool. Additionally, 160 open-source ML applications were sourced from GitHub. The tool was empirically tested against these applications, with a focus on assessing its performance and efficacy. Furthermore, using the static validation method, feedback on the tool’s usefulness was gathered through an expert survey, which involved 15 ML professionals from Ericsson.

Results. The study introduced MLpylint, a tool designed to identify 20 ML-specific code smells in Python-based ML applications. MLpylint effectively analyzed 160ML applications within 36 minutes, identifying in total 5380 code smells, although, highlighting the need for further refinements to each code smell checker to accurately identify specific patterns. In the expert survey, 15 ML professionals from Ericsson acknowledged the tool’s usefulness, user-friendliness and efficiency. However, they also indicated room for improvement in fine-tuning the tool to avoid ambiguous smells.

Conclusions. Current studies on ML-specific code smells are limited, with few tools addressing them. The development and evaluation of MLpylint is a significant advancement in the ML software quality domain, enhancing reliability and reducing associated technical debt in ML applications. As the industry integrates such tools, it’s vital they evolve to detect code smells from new ML libraries. Aiding developers in upholding superior software quality but also promoting further research in the ML software quality domain.

Place, publisher, year, edition, pages
2023. , p. 63
Keywords [en]
Code Smell, Machine Learning, Static Code Analysis, Software Quality, Technical Debt
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:bth-25490OAI: oai:DiVA.org:bth-25490DiVA, id: diva2:1807458
Subject / course
PA2534 Master's Thesis (120 credits) in Software Engineering
Educational program
PAASI Master of Science Programme in Software Engineering
Supervisors
Examiners
Available from: 2023-10-26 Created: 2023-10-26 Last updated: 2023-10-26Bibliographically approved

Open Access in DiVA

MLpylint: Automating the Identification of Machine Learning-Specific Code Smells(828 kB)28 downloads
File information
File name FULLTEXT02.pdfFile size 828 kBChecksum SHA-512
f0519bca2335d4811d5d0443d8eec1e2dbc48df96075db1b6dd0d5a8d63a70950c5117ec65d1446002a96b88aa5d7758d7c4070369182d8463d8e9bccd069ca4
Type fulltextMimetype application/pdf

By organisation
Department of Software Engineering
Computer Systems

Search outside of DiVA

GoogleGoogle Scholar
Total: 28 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 101 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf