Data Smells: Categories, Causes and Consequences, and Detection of Suspicious Data in AI-based Systems
University of Innsbruck, AUT.
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0003-3818-4442
Software Competence Center Hagenberg GmbH, AUT.
2022 (English)
In: Proceedings - 1st International Conference on AI Engineering - Software Engineering for AI, CAIN 2022, Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 229-239
Conference paper, Published paper (Refereed)
Abstract [en]

High data quality is fundamental for today's AI-based systems. However, although data quality has been an object of research for decades, there is a clear lack of research on potential data quality issues (e.g., ambiguous, extraneous values). These kinds of issues are latent in nature and thus often not obvious. Nevertheless, they can be associated with an increased risk of future problems in AI-based systems (e.g., technical debt, data-induced faults). As a counterpart to code smells in software engineering, we refer to such issues as Data Smells. This article conceptualizes data smells and elaborates on their causes, consequences, detection, and use in the context of AI-based systems. In addition, a catalogue of 36 data smells divided into three categories (i.e., Believability Smells, Understandability Smells, Consistency Smells) is presented. Moreover, the article outlines tool support for detecting data smells and presents the result of an initial smell detection on more than 240 real-world datasets. 

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022. p. 229-239
Keywords [en]
Data reduction, Odors, Code smell, Data engineering, Data quality, Data smell, On potentials, Quality issues, Technical debts, Three categories, Tool support, Understandability, Software engineering, artificial intelligence, data smells
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-23541
DOI: 10.1145/3522664.3528590
Scopus ID: 2-s2.0-85133411277
ISBN: 9781450392754 (print)
OAI: oai:DiVA.org:bth-23541
DiVA, id: diva2:1687071
Conference
1st International Conference on AI Engineering - Software Engineering for AI, CAIN 2022, Pittsburgh, 16 May 2022 through 17 May 2022
Note

open access

Available from: 2022-08-12. Created: 2022-08-12. Last updated: 2022-12-13. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text, Scopus, arXiv.org

Authority records

Felderer, Michael
