Comparative analysis of text mining and clustering techniques for assessing functional dependency between manual test cases
2025 (English)
In: Software quality journal, ISSN 0963-9314, E-ISSN 1573-1367, Vol. 33, no 2, article id 24
Article in journal (Refereed) Published
Abstract [en]
Text mining techniques, particularly those leveraging machine learning for natural language processing, have gained significant attention for qualitative data analysis in software testing. However, their complexity and lack of transparency can pose challenges, especially in safety-critical domains where simpler, interpretable solutions are often preferred unless accuracy is heavily compromised. This study investigates the trade-offs between complexity, effort, accuracy, and utility in text mining and clustering techniques, focusing on their application for detecting functional dependencies among manual integration test cases in safety-critical systems. Using empirical data from an industrial testing project at ALSTOM Sweden, we evaluate various string distance methods, NCD compressors, and machine learning approaches. The results highlight the impact of preprocessing techniques, such as tokenization, and intrinsic factors, such as text length, on algorithm performance. Findings demonstrate how text mining and clustering can be optimized for safety-critical contexts, offering actionable insights for researchers and practitioners aiming to balance simplicity and effectiveness in their testing workflows.
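For readers unfamiliar with the normalized compression distance (NCD) mentioned in the abstract, the following is a minimal sketch of the standard NCD formula, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The choice of `zlib` as the compressor, the function names, and the three test-case descriptions are assumptions made purely for illustration; the abstract does not state which compressors or test cases the study used.

```python
import zlib


def compressed_size(text: str) -> int:
    """Return the size in bytes of the zlib-compressed UTF-8 text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings.

    Values near 0 suggest the texts share a lot of content;
    values near 1 suggest they share very little.
    """
    size_a = compressed_size(a)
    size_b = compressed_size(b)
    size_ab = compressed_size(a + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


# Hypothetical manual test-case descriptions (not taken from the study).
tc1 = "Verify that the brake system activates when the emergency signal is received."
tc2 = "Verify that the brake system releases when the emergency signal is cleared."
tc3 = "Check that the passenger display shows the next station name."

if __name__ == "__main__":
    print("tc1 vs tc2:", round(ncd(tc1, tc2), 3))
    print("tc1 vs tc3:", round(ncd(tc1, tc3), 3))
```

In this sketch, the two brake-related descriptions should yield a smaller distance than the unrelated pair, which is the kind of signal a clustering step could use to group potentially dependent test cases. Short texts compress poorly, so absolute values are noisy, in line with the abstract's remark that text length affects algorithm performance.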
Place, publisher, year, edition, pages
Springer, 2025. Vol. 33, no 2, article id 24
Keywords [en]
Artificial intelligence, Clustering, Natural language processing, Software testing, Text mining, Cluster analysis, Natural language processing systems, Verification, Clustering techniques, Clusterings, Functional dependency, Language processing, Natural languages, Software testings, Text Clustering, Text mining techniques, Text-mining, Integration testing
National Category
Computer Sciences; Artificial Intelligence
Identifiers
URN: urn:nbn:se:bth-27917
DOI: 10.1007/s11219-025-09722-7
ISI: 001489598700001
Scopus ID: 2-s2.0-105005412458
OAI: oai:DiVA.org:bth-27917
DiVA, id: diva2:1961992
2025-05-28, 2025-05-28, 2025-06-02. Bibliographically approved