Publications (10 of 50)
Yu, L., Alégroth, E., Chatzipetrou, P. & Gorschek, T. (2026). Evaluating the Quality of GenAI Applications in Software Engineering: A Multi-case Study. Empirical Software Engineering, 31(2), Article ID 29.
Evaluating the Quality of GenAI Applications in Software Engineering: A Multi-case Study
2026 (English) In: Empirical Software Engineering, ISSN 1382-3256, E-ISSN 1573-7616, Vol. 31, no 2, article id 29. Article in journal (Refereed). Published.
Abstract [en]

Context: Generative AI (GenAI) is increasingly adopted in software development for tasks such as document generation, data analysis, and code generation. However, evaluating the quality of GenAI applications is challenging, as traditional quality measurements may not be fully applicable.

Objective: In this study, we explore how practitioners evaluate the quality of GenAI applications and investigate quality evaluation techniques.

Method: We conducted a multi-case study in three industrial projects from software development companies. We examined four GenAI application domains: document generation, data analysis and insight generation, customer service, and code generation. Data were collected through three workshops and 23 semi-structured interviews with industrial practitioners.

Results: We identified 14 GenAI use cases and 28 metrics currently used to evaluate the quality of GenAI applications' outputs. We synthesized the identified metrics' usage patterns and challenges based on the collected data.

Conclusions: This study presents practical insights into using metrics to measure GenAI-based system qualities in real industrial settings. Our findings indicate that practitioners use custom-built and context-specific metrics; combining these with academic metrics can strengthen GenAI system quality evaluation.
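
The conclusion that custom, context-specific metrics and academic metrics complement each other can be made concrete with a small sketch. Neither metric below is taken from the paper: the field check stands in for a custom document-generation metric and the token-level F1 for an academic one, and all names and data are hypothetical.

```python
# Illustrative only: one custom metric plus one academic-style metric
# for a generated document. Names and data are hypothetical.
from collections import Counter

def required_fields_score(output: str, required: list[str]) -> float:
    """Custom, context-specific metric: fraction of required fields mentioned."""
    return sum(field.lower() in output.lower() for field in required) / len(required)

def token_f1(output: str, reference: str) -> float:
    """Academic-style metric: token-level F1 against a reference text."""
    out, ref = output.lower().split(), reference.lower().split()
    common = sum((Counter(out) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(out), common / len(ref)
    return 2 * precision * recall / (precision + recall)

generated = "Invoice 42: total 100 EUR, due 2026-01-31, customer ACME"
reference = "Invoice 42 for customer ACME, total 100 EUR, due 2026-01-31"
print(required_fields_score(generated, ["invoice", "total", "due", "customer"]))
print(round(token_f1(generated, reference), 2))
```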

Place, publisher, year, edition, pages
Springer, 2026
Keywords
GenAI, Generative artificial intelligence, Large language model, LLM, Metric, Quality evaluation
National Category
Software Engineering Artificial Intelligence
Identifiers
urn:nbn:se:bth-28954 (URN), 10.1007/s10664-025-10759-2 (DOI), 001632325800004, 2-s2.0-105024070431 (Scopus ID)
Funder
Knowledge Foundation, 20180010
Available from: 2025-12-03. Created: 2025-12-03. Last updated: 2026-01-05. Bibliographically approved.
Tomic, S., Alégroth, E. & Isaac, M. (2025). Evaluation of the Choice of LLM in a Multi-Agent Solution for GUI-Test Generation. In: Fasolino A.R., Panichella S., Aleti A., Mesbah A. (Ed.), 2025 IEEE Conference on Software Testing, Verification and Validation, ICST 2025: . Paper presented at 18th IEEE Conference on Software Testing, Verification and Validation, ICST 2025, Naples, March 31 - April 4, 2025 (pp. 487-497). Institute of Electrical and Electronics Engineers (IEEE)
Evaluation of the Choice of LLM in a Multi-Agent Solution for GUI-Test Generation
2025 (English) In: 2025 IEEE Conference on Software Testing, Verification and Validation, ICST 2025 / [ed] Fasolino A.R., Panichella S., Aleti A., Mesbah A., Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 487-497. Conference paper, Published paper (Refereed).
Abstract [en]

Automated testing, particularly for GUI-based systems, remains a costly, labor-intensive, and error-prone process. Despite advancements in automation, manual testing still dominates in industrial practice, resulting in delays, higher costs, and increased error rates. Large Language Models (LLMs) have shown great potential to automate tasks traditionally requiring human intervention, leveraging their cognitive-like abilities for test generation and evaluation. In this study, we present PathFinder, a Multi-Agent LLM (MALLM) framework that incorporates four agents responsible for (a) perception and summarization, (b) decision-making, (c) input handling and extraction, and (d) validation, which work collaboratively to automate exploratory web-based GUI testing. The goal of this study is to assess how different LLMs, applied to different agents, affect the efficacy of automated exploratory GUI testing. We evaluate PathFinder with three models, Mistral-Nemo, Gemma2, and Llama3.1, on four e-commerce websites. We thus evaluate 27 permutations of the LLMs across three agents (excluding the validation agent) to test the hypothesis that a solution in which each agent uses a different LLM is more efficacious (efficient and effective) than a multi-agent solution where all agents use the same LLM. The results indicate that the choice of LLM constellation (combination of LLMs) significantly impacts efficacy, and suggest that a single LLM across agents may yield the best balance of efficacy (measured by F1-score). Hypotheses to explain this result include, but are not limited to, improved decision-making consistency and reduced task coordination discrepancies. The contributions of this study are an architecture for MALLM-based GUI testing, empirical results on its performance, and novel insights into how LLM selection impacts the efficacy of automated testing.
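
For intuition, the experimental grid is easy to reconstruct: assigning one of three models independently to each of the three studied agents yields 3**3 = 27 constellations. This minimal sketch enumerates the grid only; agent implementations and F1 measurement are out of scope, and all identifiers are illustrative.

```python
# Sketch of the constellation grid, not PathFinder's code: 3 models over
# 3 agent slots (validation agent excluded) gives 27 combinations.
from itertools import product

MODELS = ["mistral-nemo", "gemma2", "llama3.1"]
AGENTS = ["perception", "decision-making", "input-handling"]

constellations = list(product(MODELS, repeat=len(AGENTS)))
assert len(constellations) == 27

for constellation in constellations[:3]:  # each would be run and scored by F1
    print(dict(zip(AGENTS, constellation)))
```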

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Series
IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW, ISSN 2159-4848
Keywords
AI-Assisted Software Testing, Automated Testing, Large Language Models (LLMs), MALLM, Multi-Agent Systems, Ability testing, Autonomous agents, C (programming language), Intelligent agents, Model checking, Software testing, GUI testing, Language model, Large language model, Multi agent, Multi-agent LLM, Multiagent systems (MASs), Software testings, Test generations, Automatic test pattern generation
National Category
Software Engineering
Identifiers
urn:nbn:se:bth-28172 (URN), 10.1109/ICST62969.2025.10989038 (DOI), 001506893900043, 2-s2.0-105007519090 (Scopus ID), 9798331508142 (ISBN)
Conference
18th IEEE Conference on Software Testing, Verification and Validation, ICST 2025, Naples, March 31 - April 4, 2025
Funder
Vinnova, 2024-00242; Knowledge Foundation, 20180010
Available from: 2025-06-23. Created: 2025-06-23. Last updated: 2025-09-30. Bibliographically approved.
Yu, L., Alégroth, E., Chatzipetrou, P. & Gorschek, T. (2025). Experience with Large Language Model Applications for Information Retrieval from Enterprise Proprietary Data. In: Dietmar Pfahl, Javier Gonzalez Huerta, Jil Klünder, Hina Anwar (Ed.), Product-Focused Software Process Improvement: . Paper presented at 25th International Conference on Product-Focused Software Process Improvement, PROFES 2024, Tartu, Dec 2-4, 2024 (pp. 92-107). Springer, 15452
Experience with Large Language Model Applications for Information Retrieval from Enterprise Proprietary Data
2025 (English) In: Product-Focused Software Process Improvement / [ed] Dietmar Pfahl, Javier Gonzalez Huerta, Jil Klünder, Hina Anwar, Springer, 2025, Vol. 15452, p. 92-107. Conference paper, Published paper (Refereed).
Abstract [en]

Large Language Models (LLMs) offer promising capabilities for information retrieval and processing. However, deploying LLMs to query proprietary enterprise data poses unique challenges, particularly for companies with strict data security policies. This study shares our experience in setting up a secure LLM environment within a FinTech company and using it for enterprise information retrieval while adhering to data privacy protocols.

We conducted three workshops and 30 interviews with industrial engineers to gather data and requirements. The interviews further enriched the insights collected from the workshops. We report the steps to deploy an LLM solution in an industrial sandboxed environment and lessons learned from the experience. These lessons cover LLM configuration (e.g., chunk_size and top_k settings), local document ingestion, and evaluating LLM outputs.

Our lessons learned serve as a practical guide for practitioners seeking to use private data with LLMs to achieve better usability, improve user experiences, or explore new business opportunities.
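
The configuration lessons named above, chunk_size and top_k, govern the retrieval step of such a deployment. The sketch below is a minimal, dependency-free illustration of what the two knobs control, with naive word-overlap retrieval standing in for the embedding index a real installation would use; all names and data are invented.

```python
# Toy retrieval pipeline showing the roles of chunk_size and top_k.
# A real deployment would embed chunks and search a vector index.

def chunk(text: str, chunk_size: int) -> list[str]:
    """Split a document into consecutive chunks of chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the query; keep the top_k."""
    q = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return ranked[:top_k]

document = (
    "Customer data is retained for 24 months after contract termination. "
    "Access to production data requires a signed agreement. "
    "Backups are encrypted and stored within the EU."
)
context = retrieve("How long is customer data retained", chunk(document, chunk_size=10))
print(context)  # this context would be inserted into the LLM prompt inside the sandbox
```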

Place, publisher, year, edition, pages
Springer, 2025
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349; 15452
Keywords
AI, Artificial intelligence, Data security, Information retrieval, Large Language Model, LLM, Sandbox environment, Data privacy, Fintech, Enterprise data, Language model, Model application, Modeling environments, Privacy protocols, Security policy, Structured Query Language
National Category
Software Engineering
Identifiers
urn:nbn:se:bth-27326 (URN), 10.1007/978-3-031-78386-9_7 (DOI), 001423664600007, 2-s2.0-85211960724 (Scopus ID), 9783031783852 (ISBN)
Conference
25th International Conference on Product-Focused Software Process Improvement, PROFES 2024, Tartu, Dec 2-4, 2024
Funder
Knowledge Foundation, 20180010
Available from: 2024-12-28. Created: 2024-12-28. Last updated: 2025-12-03. Bibliographically approved.
Buarque Franzosi, D., Alégroth, E. & Isaac, M. (2025). LLM-Based Labelling of Recorded Automated GUI-Based Test Cases. In: Fasolino A.R., Panichella S., Aleti A., Mesbah A. (Ed.), 2025 IEEE Conference on Software Testing, Verification and Validation, ICST 2025: . Paper presented at 18th IEEE Conference on Software Testing, Verification and Validation, ICST 2025, Naples, March 31 - April 4, 2025 (pp. 453-463). Institute of Electrical and Electronics Engineers (IEEE)
LLM-Based Labelling of Recorded Automated GUI-Based Test Cases
2025 (English) In: 2025 IEEE Conference on Software Testing, Verification and Validation, ICST 2025 / [ed] Fasolino A.R., Panichella S., Aleti A., Mesbah A., Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 453-463. Conference paper, Published paper (Refereed).
Abstract [en]

Graphical User Interface (GUI) based testing is a commonly used practice in industry. Although valuable and, in many cases, necessary, it is associated with challenges such as high cost and requirements on both technical and domain expertise. Augmented testing, a novel approach to GUI test automation, aims to mitigate these challenges by allowing users to record and render test cases and test data directly on the GUI of the system under test (SUT). In this context, Scout is an augmented testing tool that captures system states and transitions during manual interaction with the SUT, storing them in a test model that is visually represented in the form of state trees and reports. While this representation provides a basic overview of a test suite, e.g., its size and number of scenarios, it is limited in terms of analysis depth, interpretability, and reproducibility. In particular, without human state labeling, it is challenging to produce meaningful and easily understandable test reports. To address this limitation, we present a novel solution and a demonstrator, integrated into Scout, which leverages large language models (LLMs) to enrich the model-based test case representation by automatically labeling and describing states and transitions. We conducted two experiments to evaluate the impact of the solution. First, we compared LLM-enhanced reports with expert-generated reports using embedding-distance evaluation metrics. Second, we assessed the usability and perceived value of the enhanced reports through an industrial survey. The results of the study indicate that the plugin can improve the readability, actionability, and interpretability of test reports. This work contributes to the automation of GUI testing by reducing the need for manual intervention (e.g., labeling) and technical expertise (e.g., to understand test case models). Although the solution is studied in the context of augmented testing, we argue for its generalizability to related test automation techniques. In addition, we argue that this approach enables actionable insights and lays the groundwork for further research into autonomous testing based on Generative AI.
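
The first experiment's comparison of LLM-enhanced and expert-written labels can be illustrated with a small sketch. To keep it runnable without an embedding model, a bag-of-words cosine similarity stands in for the embedding-distance metrics the paper uses; the labels are invented.

```python
# Stand-in for the embedding-distance evaluation: cosine similarity over
# token counts instead of learned embeddings. Labels are invented.
from collections import Counter
import math

def cosine(a: str, b: str) -> float:
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

llm_label = "Login page with username and password fields"
expert_label = "Login page showing username and password form"
print(round(cosine(llm_label, expert_label), 2))  # higher = closer to the expert
```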

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
High costs, Interpretability, Labelings, Language model, Model-based OPC, Systems under tests, Technical expertise, Test Automation, Test case, Test reports, Graphical user interfaces
National Category
Software Engineering
Identifiers
urn:nbn:se:bth-28173 (URN), 10.1109/ICST62969.2025.10988984 (DOI), 001506893900040, 2-s2.0-105007522870 (Scopus ID), 9798331508142 (ISBN)
Conference
18th IEEE Conference on Software Testing, Verification and Validation, ICST 2025, Naples, March 31 - April 4, 2025
Funder
Knowledge Foundation, 20180010; Vinnova, 2024-00242
Available from: 2025-06-23. Created: 2025-06-23. Last updated: 2025-09-30. Bibliographically approved.
Yu, L., Alégroth, E., Chatzipetrou, P. & Gorschek, T. (2025). Measuring the quality of generative AI systems: Mapping metrics to quality characteristics — Snowballing literature review. Information and Software Technology, 186, Article ID 107802.
Measuring the quality of generative AI systems: Mapping metrics to quality characteristics — Snowballing literature review
2025 (English) In: Information and Software Technology, ISSN 0950-5849, E-ISSN 1873-6025, Vol. 186, article id 107802. Article, review/survey (Refereed). Published.
Abstract [en]

Context: Generative Artificial Intelligence (GenAI) and the use of Large Language Models (LLMs) have revolutionized tasks that previously required significant human effort, which has attracted considerable interest from industry stakeholders. This growing interest has accelerated the integration of AI models into various industrial applications. However, this integration introduces challenges to product quality, as conventional quality measurement methods may fail to assess GenAI systems. Consequently, evaluation techniques for GenAI systems need to be adapted and refined, and examining the current state and applicability of evaluation techniques for GenAI system outputs is essential.

Objective: This study aims to explore the current metrics, methods, and processes for assessing the outputs of GenAI systems and the potential for risky outputs.

Method: We performed a snowballing literature review to identify metrics, evaluation methods, and evaluation processes from 43 selected papers.

Results: We identified 28 metrics and mapped these metrics to four quality characteristics defined by the ISO/IEC 25023 standard for software systems. Additionally, we discovered three types of evaluation methods to measure the quality of system outputs and a three-step process to assess faulty system outputs. Based on these insights, we suggested a five-step framework for measuring system quality while utilizing GenAI models.

Conclusion: Our findings present a mapping that visualizes candidate metrics to be selected for measuring quality characteristics of GenAI systems, accompanied by step-by-step processes to assist practitioners in conducting quality assessments. 
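
As a data structure, such a mapping is simply a lookup from metric to quality characteristic. The sketch below shows the shape only; the characteristic names follow ISO/IEC 25023 style, but these pairings are invented for illustration, and the paper provides the actual 28-metric mapping.

```python
# Illustrative shape of a metric-to-characteristic mapping; the concrete
# pairings here are invented, not the paper's.
METRIC_TO_CHARACTERISTIC = {
    "answer accuracy": "functional correctness",
    "hallucination rate": "functional correctness",
    "response latency": "time behaviour",
    "output readability": "usability",
}

def metrics_for(characteristic: str) -> list[str]:
    """Candidate metrics for measuring one quality characteristic."""
    return [m for m, c in METRIC_TO_CHARACTERISTIC.items() if c == characteristic]

print(metrics_for("functional correctness"))
```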

Place, publisher, year, edition, pages
Elsevier, 2025
Keywords
Evaluation, GenAI, Generative AI, Large language model, LLM, Metric, Quality characteristics, Artificial intelligence, Computer software, ISO Standards, Mapping, Quality control, Artificial intelligence systems, Generative artificial intelligence, Language model, Quality characteristic, Reviews
National Category
Artificial Intelligence
Identifiers
urn:nbn:se:bth-28306 (URN), 10.1016/j.infsof.2025.107802 (DOI), 001519902000001, 2-s2.0-105008505516 (Scopus ID)
Funder
Knowledge Foundation, 20180010
Available from: 2025-07-04. Created: 2025-07-04. Last updated: 2025-12-03. Bibliographically approved.
Coppola, R., Feldt, R., Nass, M. & Alégroth, E. (2025). Ranking approaches for similarity-based web element location. Journal of Systems and Software, 222, Article ID 112286.
Ranking approaches for similarity-based web element location
2025 (English) In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 222, article id 112286. Article in journal (Refereed). Published.
Abstract [en]

Context: GUI-based tests for web applications are frequently broken by fragility, i.e., regression tests fail due to changing properties of the web elements. The most influential factor for fragility is the locators used in the scripts, i.e., the means of identifying the elements of the GUI.

Objective: We extend a state-of-the-art Multi-Locator solution that considers 14 locators from the DOM model of a web application, and identifies overlapping nodes in the DOM tree (VON-Similo). We augment the approach with standard Machine Learning and Learning to Rank (LTR) approaches to aid the location of web elements.

Method: We document an experiment with a ground truth of 1163 web element pairs, taken from different releases of 40 web applications, to compare the robustness of the algorithms to locator weight change, and the performance of LTR approaches in terms of MeanRank and PctAtN.

Results: Using LTR algorithms, we obtain a maximum probability of finding the correct target at the first position of 88.4% (lowest 82.57%), and among the first three positions of 94.79% (lowest 91.86%). The best mean rank of the correct candidate is 1.57.

Conclusion: The similarity-based approach proved to be highly dependable in the context of web application testing, where a low percentage of matching errors can still be accepted.
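
Both reported measures follow directly from the per-query rank of the correct candidate (1 = top). A minimal sketch with hypothetical ranks follows; the paper's real figures come from the 1163-pair ground truth.

```python
# MeanRank and Pct@N from a list of ranks of the correct candidate.
# The ranks below are hypothetical, not the experiment's data.

def mean_rank(ranks: list[int]) -> float:
    return sum(ranks) / len(ranks)

def pct_at_n(ranks: list[int], n: int) -> float:
    return 100.0 * sum(r <= n for r in ranks) / len(ranks)

ranks = [1, 1, 2, 1, 3, 1, 5, 1, 1, 2]
print(mean_rank(ranks))                        # cf. the paper's best mean rank of 1.57
print(pct_at_n(ranks, 1), pct_at_n(ranks, 3))  # cf. 88.4% and 94.79%
```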

Place, publisher, year, edition, pages
Elsevier, 2025
Keywords
GUI testing, Test automation, Test case robustness, Web element locators, XPath locators, Learning to rank, Mean-ranks, Ranking approach, Test case, WEB application, Web applications, Web element locator, Xpath locator, Contrastive Learning
National Category
Software Engineering
Identifiers
urn:nbn:se:bth-27257 (URN), 10.1016/j.jss.2024.112286 (DOI), 001375573600001, 2-s2.0-85211062465 (Scopus ID)
Funder
Knowledge Foundation, 20180010
Available from: 2024-12-17. Created: 2024-12-17. Last updated: 2025-09-30. Bibliographically approved.
Bauer, A., Angermeir, F., Alégroth, E. & Anglert, S. (2025). The Prevalence of Code Review Guidelines for GUI-Based Testing in Open-Source. Information and Software Technology
The Prevalence of Code Review Guidelines for GUI-Based Testing in Open-Source
2025 (English) In: Information and Software Technology, ISSN 0950-5849, E-ISSN 1873-6025. Article in journal (Other academic). Submitted.
Abstract [en]

Context: Code review has become a core practice in collaborative software engineering, helping ensure code quality, detecting potential bugs, and supporting communication among developers. Prior research has shown that code review practices differ between production and test code, suggesting that established code review guidelines may fall short in the context of test and GUI-based test code. Particularly, GUI-based testing lacks adequate support during the code review process. To address this, we proposed a set of code review guidelines specifically designed for reviewing GUI-based test files, which, however, have not yet been empirically evaluated, limiting their practical relevance. 

Objective: This study empirically assesses the extent to which code review comments on GUI-based tests align (explicitly or implicitly) with the concerns captured by the proposed guidelines, and uses the findings to refine the guideline set.

Method: To achieve this, we sampled code review comments discussing GUI-based test files across 100 open-source projects and manually analyzed 1000 pull requests to determine to what extent the reviewers' comments align with the proposed guidelines.

Results: Review comments aligned with the proposed guidelines in 808 of 1000 pull requests. We found empirical evidence for 25 of the 33 guidelines. The most frequently observed guideline concerns the correct use of testing techniques and exception handling, particularly regarding locators, explicit waits, and timeout behavior.

Conclusion: The observed alignment suggests that the proposed guidelines capture concerns articulated in practice, indicating practical relevance for GUI-based test reviews. This represents an initial step towards providing empirical validation of the proposed guidelines, highlighting their potential value in enhancing the quality of GUI-based test reviews.
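
The tallying behind figures such as 808 of 1000 pull requests and 25 of 33 guidelines can be sketched as a simple aggregation over manually labeled comments. The data below is a toy sample, not the study's dataset.

```python
# Toy aggregation: each labeled comment is (pull request id, matched
# guideline id or None). Counts mirror the paper's reporting style.
from collections import Counter

labels = [(1, "G7"), (1, "G12"), (2, None), (3, "G7"), (4, "G3")]

aligned_prs = {pr for pr, guideline in labels if guideline is not None}
evidence = Counter(g for _, g in labels if g is not None)

print(f"PRs with aligned comments: {len(aligned_prs)}")  # cf. 808 of 1000
print(f"guidelines with evidence: {len(evidence)}")      # cf. 25 of 33
```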

National Category
Software Engineering
Research subject
Software Engineering
Identifiers
urn:nbn:se:bth-28725 (URN), 10.2139/ssrn.5547512 (DOI)
Available from: 2025-10-06. Created: 2025-10-06. Last updated: 2025-10-16. Bibliographically approved.
Bauer, A., Frattini, J. & Alégroth, E. (2024). Augmented Testing to support Manual GUI-based Regression Testing: An Empirical Study. Empirical Software Engineering, 29(6), Article ID 140.
Augmented Testing to support Manual GUI-based Regression Testing: An Empirical Study
2024 (English) In: Empirical Software Engineering, ISSN 1382-3256, E-ISSN 1573-7616, Vol. 29, no 6, article id 140. Article in journal (Refereed). Published.
Abstract [en]

Context: Manual graphical user interface (GUI) software testing represents a substantial part of the overall testing effort in practice, despite various research efforts to further increase test automation. Augmented Testing (AT), a novel approach for GUI testing, aims to aid manual GUI-based testing through a tool-supported approach where an intermediary visual layer is rendered between the system under test (SUT) and the tester, superimposing relevant test information.

Objective: The primary objective of this study is to gather empirical evidence regarding AT's efficiency compared to manual GUI-based regression testing. Existing studies involving testing approaches under the AT definition primarily focus on exploratory GUI testing, leaving a gap in the context of regression testing. As a secondary objective, we investigate AT's benefits, drawbacks, and usability issues when deployed with the demonstrator tool, Scout.

Method: We conducted an experiment involving 13 industry professionals, from six companies, comparing AT to manual GUI-based regression testing. These results were complemented by interviews and Bayesian data analysis (BDA) of the study's quantitative results.

Results: The Bayesian data analysis revealed that AT shortens test durations in 70% of cases on average, indicating that AT is more efficient. When comparing the means of the total duration to perform all tests, AT reduced the test duration by 36% in total. Participant interviews highlighted nine benefits and eleven drawbacks of AT, while observations revealed four usability issues.

Conclusion: This study makes an empirical contribution to understanding Augmented Testing, a promising approach to improve the efficiency of GUI-based regression testing in practice. Furthermore, it underscores the importance of continual refinements of AT.
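
The study's efficiency claims rest on Bayesian data analysis; as a standard-library stand-in, the bootstrap below estimates one related quantity, the probability that the mean AT duration is lower than the mean manual duration, from invented duration data. It is not the authors' BDA model.

```python
# Bootstrap comparison of invented test durations (seconds); a stand-in
# for the paper's Bayesian data analysis, not a reproduction of it.
import random

manual = [610, 540, 700, 655, 580, 720, 640]
augmented = [380, 400, 390, 470, 410, 425, 370]

def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)

random.seed(0)
trials = 10_000
wins = sum(
    mean(random.choices(augmented, k=len(augmented)))
    < mean(random.choices(manual, k=len(manual)))
    for _ in range(trials)
)
print(f"P(mean AT duration lower): {wins / trials:.2f}")
print(f"mean reduction: {1 - mean(augmented) / mean(manual):.0%}")  # cf. the paper's 36%
```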

Place, publisher, year, edition, pages
Springer, 2024
Keywords
GUI-based testing, GUI testing, Augmented Testing, manual testing, Bayesian data analysis
National Category
Software Engineering
Research subject
Systems Engineering
Identifiers
urn:nbn:se:bth-25391 (URN), 10.1007/s10664-024-10522-z (DOI), 001292331700002, 2-s2.0-85201391671 (Scopus ID)
Funder
Knowledge Foundation, 20180010
Available from: 2023-09-18. Created: 2023-09-18. Last updated: 2025-09-30. Bibliographically approved.
Fucci, D., Alégroth, E., Felderer, M. & Johannesson, C. (2024). Evaluating software security maturity using OWASP SAMM: Different approaches and stakeholders perceptions. Journal of Systems and Software, 214, Article ID 112062.
Evaluating software security maturity using OWASP SAMM: Different approaches and stakeholders perceptions
2024 (English) In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 214, article id 112062. Article in journal (Refereed). Published.
Abstract [en]

Background: Recent years have seen a surge in cyber-attacks, which can be prevented or mitigated using software security activities. OWASP SAMM is a maturity model providing a versatile way for companies to assess their security posture and plan for improvements.

Objective: We perform an initial SAMM assessment in collaboration with a company in the financial domain. Our objective is to assess a holistic inventory of the company's security-related activities, focusing on how different roles perform the assessment and how they perceive the instrument used in the process.

Methodology: We perform a case study to collect data using SAMM in a lightweight and novel manner, through an assessment using an online survey with 17 participants and a focus group with seven participants.

Results: We show that different roles perceive maturity differently and that the two assessments deviate only for specific practices, making the lightweight approach a viable and efficient solution in industrial practice. Our results indicate that the questions included in the SAMM assessment tool are answered easily and confidently across most roles.

Discussion: Our results suggest that companies can productively use a lightweight SAMM assessment. We provide nine lessons learned for guiding industrial practitioners in the evaluation of their current security posture, as well as for academics wanting to utilize SAMM as a research tool in industrial settings. Editor's note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
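
Operationally, the lightweight assessment aggregates per-practice survey scores into SAMM maturity levels (0 to 3). A minimal sketch with toy data follows; the practice names are real OWASP SAMM practices, but the median aggregation rule is this example's choice, not necessarily the study's.

```python
# Toy aggregation of survey answers into per-practice maturity scores.
# The median rule is an assumption made for this sketch.
from statistics import median

responses = {  # practice -> scores from individual respondents (toy subset)
    "Threat Assessment": [1, 2, 1, 1, 2],
    "Security Testing": [2, 2, 3, 2, 2],
    "Incident Management": [1, 1, 0, 1, 2],
}

for practice, scores in responses.items():
    print(f"{practice}: maturity {median(scores)}")
```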

Place, publisher, year, edition, pages
Elsevier, 2024
Keywords
Industry-academia collaboration, OWASP SAMM, Software security, Cybersecurity, Industrial research, Petroleum reservoir evaluation, Cyber-attacks, Evaluating software, Financial domains, Maturity model, Open science, Security activities, Stakeholder perception, Network security
National Category
Software Engineering
Identifiers
urn:nbn:se:bth-26188 (URN), 10.1016/j.jss.2024.112062 (DOI), 001237888500001, 2-s2.0-85192019707 (Scopus ID)
Funder
Knowledge Foundation, 20180010
Available from: 2024-05-13. Created: 2024-05-13. Last updated: 2025-09-30. Bibliographically approved.
Nass, M., Alégroth, E. & Feldt, R. (2024). Improving Web Element Localization by Using a Large Language Model. Software testing, verification & reliability, 34(7)
Improving Web Element Localization by Using a Large Language Model
2024 (English) In: Software testing, verification & reliability, ISSN 0960-0833, E-ISSN 1099-1689, Vol. 34, no 7. Article in journal (Refereed). Published.
Abstract [en]

Web-based test automation heavily relies on accurately finding web elements. Traditional methods compare attributes but do not grasp the context and meaning of elements and words. The emergence of Large Language Models (LLMs) like GPT-4, which can show human-like reasoning abilities on some tasks, offers new opportunities for software engineering and web element localization. This paper introduces and evaluates VON Similo LLM, an enhanced web element localization approach. Using an LLM, it selects the most likely web element from the top-ranked candidates identified by the existing VON Similo method, aiming to get closer to human-like selection accuracy. An experimental study was conducted using 804 web element pairs from 48 real-world web applications. We measured the number of correctly identified elements as well as the execution times, comparing the effectiveness and efficiency of VON Similo LLM against the baseline algorithm. In addition, motivations from the LLM were recorded and analyzed for all instances where the original approach failed to find the right web element. VON Similo LLM demonstrated improved performance, reducing failed localizations from 70 to 39 (out of 804), a 44 percent reduction. Despite its slower execution time and the additional costs of using the GPT-4 model, the LLM's human-like reasoning showed promise in enhancing web element localization. LLM technology can enhance web element identification in GUI test automation, reducing false positives and potentially lowering maintenance costs. However, further research is necessary to fully understand LLMs' capabilities, limitations, and practical use in GUI testing.
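
The core step, asking an LLM to choose among the top-ranked candidates from VON Similo, can be sketched as prompt construction plus a model call. `call_llm` below is a placeholder rather than a real API, and the element data is invented.

```python
# Sketch of the candidate-selection step; not the paper's implementation.

def build_prompt(target: dict, candidates: list[dict]) -> str:
    """Ask the model which candidate matches the target element."""
    lines = [
        "A web page changed between releases.",
        f"Which candidate is the same element as this target? Target: {target}",
    ]
    lines += [f"{i}: {c}" for i, c in enumerate(candidates)]
    lines.append("Answer with the candidate number and a one-line motivation.")
    return "\n".join(lines)

target = {"tag": "button", "text": "Add to cart", "id": "add-btn"}
candidates = [
    {"tag": "button", "text": "Add to basket", "id": "add-btn-v2"},
    {"tag": "a", "text": "Checkout", "id": "checkout"},
]
prompt = build_prompt(target, candidates)
# answer = call_llm(prompt)  # e.g., GPT-4, as in the paper; placeholder call
print(prompt)
```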

Place, publisher, year, edition, pages
John Wiley & Sons, 2024
Keywords
GUI Testing, Test Automation, Test Case Robustness, Web Element Locators, Large Language Models
National Category
Computer Systems
Research subject
Software Engineering
Identifiers
urn:nbn:se:bth-25637 (URN), 10.1002/stvr.1893 (DOI), 001290853000001, 2-s2.0-85201296537 (Scopus ID)
Funder
Knowledge Foundation, 20180010
Available from: 2023-11-22. Created: 2023-11-22. Last updated: 2025-09-30. Bibliographically approved.
Projects
M.E.T.A. – Modelling Efficient Test Architectures [20180102]; Blekinge Institute of Technology; Publications
Alégroth, E., Petersén, E. & Tinnerholm, J. (2021). A Failed attempt at creating Guidelines for Visual GUI Testing: An industrial case study. In: Proceedings - 2021 IEEE 14th International Conference on Software Testing, Verification and Validation, ICST 2021: . Paper presented at 14th IEEE International Conference on Software Testing, Verification and Validation, ICST 2021, 12 April 2021 through 16 April 2021 (pp. 340-350). Institute of Electrical and Electronics Engineers Inc., Article ID 9438551.
T.A.R.G.E.T. – Testing with AI Reinforced GUI Embedded Technology [2024-00242_Vinnova]; Blekinge Institute of Technology; Publications
Tomic, S., Alégroth, E. & Isaac, M. (2025). Evaluation of the Choice of LLM in a Multi-Agent Solution for GUI-Test Generation. In: Fasolino A.R., Panichella S., Aleti A., Mesbah A. (Ed.), 2025 IEEE Conference on Software Testing, Verification and Validation, ICST 2025: . Paper presented at 18th IEEE Conference on Software Testing, Verification and Validation, ICST 2025, Naples, March 31 - April 4, 2025 (pp. 487-497). Institute of Electrical and Electronics Engineers (IEEE)
Buarque Franzosi, D., Alégroth, E. & Isaac, M. (2025). LLM-Based Labelling of Recorded Automated GUI-Based Test Cases. In: Fasolino A.R., Panichella S., Aleti A., Mesbah A. (Ed.), 2025 IEEE Conference on Software Testing, Verification and Validation, ICST 2025: . Paper presented at 18th IEEE Conference on Software Testing, Verification and Validation, ICST 2025, Naples, March 31 - April 4, 2025 (pp. 453-463). Institute of Electrical and Electronics Engineers (IEEE)
Buarque Franzosi, D., Capovski, K., Isaac, M. & Byttner, S. (2025). Recommendation System of Client-Requested Projects in the Swedish Consultancy Market with LLMs. In: Nowaczyk S., Vettoruzzo A. (Ed.), CEUR Workshop Proceedings: . Paper presented at 2025 Swedish AI Society Workshop, SAIS 2025, Halmstad, June 16-17, 2025 (pp. 79-92). Technical University of Aachen, 4037
Identifiers
ORCID iD: orcid.org/0000-0001-7526-3727
