A Framework for Evaluating GenAI Adoption and Use in Software Engineering
2026 (English). In: IEEE Transactions on Software Engineering, ISSN 0098-5589, E-ISSN 1939-3520. Article in journal (Refereed), Epub ahead of print
Abstract [en]
Generative Artificial Intelligence (GenAI) is increasingly integrated into software products to enable new features and user capabilities, from early exploration to operational deployment. Adopting GenAI as a component within a software system introduces quality risks because GenAI outputs are probabilistic and prompt-sensitive, and may drift after release. Organizations therefore need to decide what to evaluate, when to evaluate, and who owns quality evaluation activities across software design, development, and operations. The ISO/IEC 25059 standard distinguishes between software product quality (e.g., usability) and quality-in-use (e.g., satisfaction) for AI-enabled software, yet it provides limited operational guidance for these evaluation activities. We therefore investigate how industrial software teams adopt and use GenAI models in the software systems they build and operate, and how they evaluate system qualities when deciding to adopt GenAI during development and after deployment. We do not benchmark the underlying GenAI model itself. In this study, we conducted 19 semi-structured interviews at two software development companies. We triangulated the interviews with archival data (15 internal documents and 184 internal wiki/web pages) to capture GenAI adoption steps, quality concerns, evaluation practices, and role responsibilities. Our findings describe a three-phase adoption process – Ideation, Development, and Operation – highlighting where quality evaluations occur, which criteria are used, and how evaluation responsibilities are distributed. Based on observed practices and using ISO/IEC 25059 as an organizing lens, we synthesize a process-oriented quality evaluation framework. This framework maps metrics to explicit gatekeeping, validation, and monitoring checkpoints, bridging abstract ISO quality characteristics with engineering workflows.
We applied the framework in a GenAI-enabled software product (SE4AI) use case and reported how it supported structured evaluation activities. We also observed that quality evaluations span legal, security, development, QA, and operations, but ownership is fragmented across phases. We therefore propose a GenAI Quality Lead responsibility (often assignable to an existing senior role) to coordinate criteria, evidence, and traceability across quality evaluation activities. The results contribute to Software Engineering for AI (SE4AI) by clarifying how teams can measure qualities when building software that adopts and uses GenAI.
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2026.
Keywords [en]
AI4SE, Empirical Study, GenAI, Generative Artificial Intelligence, SE4AI, Software Engineering, Artificial intelligence, Computer software selection and evaluation, ISO Standards, Software design, Software quality, Development and operations, Empirical studies, Evaluation activity, Quality evaluation, Software products, Software-systems, Quality control
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-29499
DOI: 10.1109/TSE.2026.3688745
Scopus ID: 2-s2.0-105037791380
OAI: oai:DiVA.org:bth-29499
DiVA, id: diva2:2060534
Part of project
SERT - Software Engineering ReThought, Knowledge Foundation
Funder
Knowledge Foundation, 20180010
2026-05-18, 2026-05-18, 2026-05-18
Bibliographically approved