Publications (8 of 8)
Thode, L., Iftikhar, U. & Mendez, D. (2025). Exploring the use of LLMs for the selection phase in systematic literature studies. Information and Software Technology, 184, Article ID 107757.
2025 (English). In: Information and Software Technology, ISSN 0950-5849, E-ISSN 1873-6025, Vol. 184, article id 107757. Article in journal (Refereed). Published
Abstract [en]

Context: Systematic literature studies, such as secondary studies, are crucial to aggregate evidence. An essential part of these studies is the selection phase of relevant studies. This, however, is time-consuming, resource-intensive, and error-prone as it highly depends on manual labor and domain expertise. The increasing popularity of Large Language Models (LLMs) raises the question to what extent these manual study selection tasks could be supported in an automated manner.

Objectives: In this manuscript, we report on our effort to explore and evaluate the use of state-of-the-art LLMs to automate the selection phase in systematic literature studies.

Method: We evaluated LLMs for the selection phase using two published systematic literature studies in software engineering as ground truth. Three prompts were designed and applied across five LLMs to the studies’ titles and abstracts based on their inclusion and exclusion criteria. Additionally, we analyzed combining two LLMs to replicate a practical selection phase. We analyzed recall and precision and reflected upon the accuracy of the LLMs, and whether the ground truth studies were conducted by early career scholars or by more advanced ones.

Results: Our results show a high average recall of up to 98% combined with a precision of 27% in a single-LLM approach, and an average recall of 99% with a precision of 27% in a two-model approach replicating a two-reviewer procedure. Further, the Llama 2 models showed the highest average recall (98%) across all prompt templates and datasets, while GPT4-turbo had the highest average precision (72%).

Conclusions: Our results demonstrate how LLMs could support a selection phase in the future. We recommend a two-LLM approach to achieve a higher recall. However, we also critically reflect upon how further studies are required using other models and prompts on more datasets to strengthen the confidence in our presented approach. © 2025 The Authors

Place, publisher, year, edition, pages
Elsevier, 2025
Keywords
Automation, Large language models, Systematic literature studies
National Category
Software Engineering
Identifiers
urn:nbn:se:bth-27884 (URN); 10.1016/j.infsof.2025.107757 (DOI); 001491965200001 (); 2-s2.0-105004904751 (Scopus ID)
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications; Knowledge Foundation, 20180010; Knowledge Foundation, 20220235
Available from: 2025-05-23. Created: 2025-05-23. Last updated: 2025-06-02. Bibliographically approved
Iftikhar, U. (2025). Practical Considerations and Solutions in NLP-Based Analysis of Code Review Comments - An Experience Report. In: Dietmar Pfahl, Javier Gonzalez Huerta, Jil Klünder, Hina Anwar (Ed.), Product-Focused Software Process Improvement. Paper presented at 25th International Conference on Product-Focused Software Process Improvement, PROFES 2024, Tartu, Dec 2-4, 2024 (pp. 342-351). Springer Science+Business Media B.V., 15452
2025 (English). In: Product-Focused Software Process Improvement / [ed] Dietmar Pfahl, Javier Gonzalez Huerta, Jil Klünder, Hina Anwar, Springer Science+Business Media B.V., 2025, Vol. 15452, p. 342-351. Conference paper, Published paper (Refereed)
Abstract [en]

Context: Automated analysis of code review comments (CRCs) can aid in highlighting frequently discussed issues by reviewers from large repositories. Topic modeling is a promising approach to analyzing large natural language repositories. However, CRCs contain natural language text and code references; thus, data pre-processing and topic modeling approaches must be carefully selected.

Objective: This work aims to discuss the various decisions taken and considerations involved in the analysis of CRCs.

Method: We utilized 5,560 CRCs from an open-source system to study the decisions and considerations faced during the analysis of CRCs using topic modeling, followed by an evaluation of the interpretability of identified themes by a domain expert.

Results: We report several observations and challenges in improving the quality of the identified themes, including choices regarding the pre-processing, topic modeling parameters, embedding model, and objective measures of coherence used, which impact the subjective interpretability of the identified themes.

Conclusions: This work offers unique considerations, and the impact of these decisions can facilitate future studies in conducting topic modeling-based analyses of CRCs. Future studies can utilize the technical demonstrator to explore the interpretability of the topics generated from CRCs. 

Place, publisher, year, edition, pages
Springer Science+Business Media B.V., 2025
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 15452
Keywords
Natural language processing systems, Open source software, Open systems, Automated analysis, Code review, Data preprocessing, Experience report, Interpretability, Modeling approach, Natural languages, Natural languages texts, Processing model, Topic Modeling, Modeling languages
National Category
Software Engineering
Identifiers
urn:nbn:se:bth-27330 (URN); 10.1007/978-3-031-78386-9_24 (DOI); 001423664600024 (); 2-s2.0-85211908780 (Scopus ID); 9783031783852 (ISBN)
Conference
25th International Conference on Product-Focused Software Process Improvement, PROFES 2024, Tartu, Dec 2-4, 2024
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2024-12-30. Created: 2024-12-30. Last updated: 2025-03-14. Bibliographically approved
Iftikhar, U., Börstler, J., Ali, N. b. & Kopp, O. (2025). Supporting the identification of prevalent quality issues in code changes by analyzing reviewers’ feedback. Software quality journal, 33(2), Article ID 22.
2025 (English). In: Software quality journal, ISSN 0963-9314, E-ISSN 1573-1367, Vol. 33, no 2, article id 22. Article in journal (Refereed). Published
Abstract [en]

Context: Code reviewers provide valuable feedback during the code review. Identifying common issues described in the reviewers’ feedback can provide input for devising context-specific software development improvements. However, the use of reviewer feedback for this purpose is currently less explored.

Objective: In this study, we assess how automation can derive more interpretable and informative themes in reviewers’ feedback and whether these themes help to identify recurring quality-related issues in code changes.

Method: We conducted a participatory case study using the JabRef system to analyze reviewers’ feedback on merged and abandoned code changes. We used two promising topic modeling methods (GSDMM and BERTopic) to identify themes in 5,560 code review comments. The resulting themes were analyzed and named by a domain expert from JabRef.

Results: The domain expert considered the identified themes from the two topic models to represent quality-related issues. Different quality issues are pointed out in code reviews for merged and abandoned code changes. While BERTopic provides higher objective coherence, the domain expert considered themes from short-text topic modeling more informative and easier to interpret than those from BERTopic-based topic modeling.

Conclusions: The identified prevalent code quality issues aim to address the maintainability-focused issues. The analysis of code review comments can enhance the current practices for JabRef by improving the guidelines for new developers and focusing discussions in the developer forums. The topic model choice impacts the interpretability of the generated themes, and a higher coherence (based on objective measures) of generated topics did not lead to improved interpretability by a domain expert. 

Place, publisher, year, edition, pages
Springer, 2025
Keywords
Modern code review, Natural language processing, Open-source systems, Software quality improvement, Computer software selection and evaluation, Open source software, Software design, Code changes, Code review, Domain experts, Language processing, Natural languages, Open source system, Software quality improvements, Topic Modeling, Software quality
National Category
Software Engineering
Identifiers
urn:nbn:se:bth-27789 (URN); 10.1007/s11219-025-09720-9 (DOI); 001473057800001 (); 2-s2.0-105003288015 (Scopus ID)
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications; Knowledge Foundation, 20220235
Available from: 2025-05-02. Created: 2025-05-02. Last updated: 2025-05-02. Bibliographically approved
Iftikhar, U., Ali, N. b., Börstler, J. & Usman, M. (2024). A tertiary study on links between source code metrics and external quality attributes. Information and Software Technology, 165, Article ID 107348.
2024 (English). In: Information and Software Technology, ISSN 0950-5849, E-ISSN 1873-6025, Vol. 165, article id 107348. Article, review/survey (Refereed). Published
Abstract [en]

Context: Several secondary studies have investigated the relationship between internal quality attributes, source code metrics and external quality attributes. Sometimes they have contradictory results.

Objective: We synthesize evidence of the link between internal quality attributes, source code metrics and external quality attributes along with the efficacy of the prediction models used.

Method: We conducted a tertiary review to identify, evaluate and synthesize secondary studies. We used several characteristics of secondary studies as indicators for the strength of evidence and considered them when synthesizing the results.

Results: From 711 secondary studies, we identified 15 secondary studies that have investigated the link between source code and external quality. Our results show: (1) primarily, the focus has been on object-oriented systems, (2) maintainability and reliability are most often linked to internal quality attributes and source code metrics, with only one secondary study reporting evidence for security, (3) only a small set of complexity, coupling, and size-related source code metrics report a consistent positive link with maintainability and reliability, and (4) group method of data handling (GMDH) based prediction models have performed better than other prediction models for maintainability prediction.

Conclusions: Based on our results, lines of code, coupling, complexity and the cohesion metrics from Chidamber & Kemerer (CK) metrics are good indicators of maintainability with consistent evidence from high and moderate-quality secondary studies. Similarly, four CK metrics related to coupling, complexity and cohesion are good indicators of reliability, while inheritance and certain cohesion metrics show no consistent evidence of links to maintainability and reliability. Further empirical studies are needed to explore the link between internal quality attributes, source code metrics and other external quality attributes, including functionality, portability, and usability. The results will help researchers and practitioners understand the body of knowledge on the subject and identify future research directions. © 2023 The Author(s)

Place, publisher, year, edition, pages
Elsevier, 2024
Keywords
Code quality, Evidence, Product quality, Quality models, Tertiary review, Tertiary study, Codes (symbols), Computer programming languages, Data handling, Forecasting, Object oriented programming, Reliability, External quality, Internal quality, Products quality, Quality attributes, Quality modeling, Source code metrics, Maintainability
National Category
Software Engineering
Identifiers
urn:nbn:se:bth-25555 (URN); 10.1016/j.infsof.2023.107348 (DOI); 001102357100001 (); 2-s2.0-85174715019 (Scopus ID)
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications; Knowledge Foundation, 20190081
Available from: 2023-11-06. Created: 2023-11-06. Last updated: 2024-03-13. Bibliographically approved
Iftikhar, U., Börstler, J., Ali, N. b. & Kopp, O. (2024). Identifying prevalent quality issues in code changes by analyzing reviewers' feedback.
2024 (English). Manuscript (preprint) (Other academic)
Abstract [en]

Context: Code reviewers provide valuable feedback during the code review. Identifying common issues described in the reviewers' feedback can provide input for context-specific software improvement opportunities. However, the use of reviewer feedback for this purpose is currently less explored.

Objective: Assessing if and how automation can derive themes in reviewers' feedback and whether these themes help to identify recurring quality-related issues in code changes.

Method: We conducted a case study using the JabRef system to distinguish reviewers' feedback on merged and abandoned code changes for the analysis. We used topic modeling to identify themes in 5,560 code review comments. The resulting themes were analyzed and named by a domain expert from JabRef.

Results: The domain expert considered the identified themes from the proposed automation approach to represent quality-related issues. We found that different quality issues are pointed out in code reviews for merged and abandoned code changes. 

Conclusions: The results indicate the usefulness of our proposed automation approach in utilizing code review comments for understanding the prevalent code quality issues that can help derive targeted and context-bound improvement actions.

National Category
Computer Systems
Research subject
Software Engineering
Identifiers
urn:nbn:se:bth-25611 (URN)
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2024-01-23. Created: 2024-01-23. Last updated: 2024-03-13. Bibliographically approved
Iftikhar, U. (2024). Towards Measuring & Improving Source Code Quality. (Licentiate dissertation). Karlskrona: Blekinge Tekniska Högskola
2024 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Context: Software quality has a multi-faceted description encompassing several quality attributes. Central to our efforts to enhance software quality is to improve the quality of the source code. Poor source code quality impacts the quality of the delivered product. Empirical studies have investigated how to improve source code quality and how to quantify the source code improvement. However, the reported evidence linking internal code structure information and quality attributes observed by users is varied and, at times, conflicting. Furthermore, there is a further need for research to improve source code quality by understanding trends in feedback from code review comments.

Objective: This thesis contributes towards improving source code quality and synthesizes metrics to measure improvement in source code quality. Hence, our objectives are 1) To synthesize evidence of links between source code metrics and external quality attributes, & identify source code metrics, and 2) To identify areas to improve source code quality by identifying recurring code quality issues using the analysis of code review comments.

Method: We conducted a tertiary study to achieve the first objective, an archival analysis and a case study to investigate the latter two objectives.

Results: To quantify source code quality improvement, we reported a comprehensive catalog of source code metrics and a small set of source code metrics consistently linked with maintainability, reliability, and security. To improve source code quality using analysis of code review comments, our explored methodology improves the state-of-the-art with interesting results.

Conclusions: The thesis provides a promising way to analyze themes in code review comments. Researchers can use the source code metrics provided to estimate these quality attributes reliably. In future work, we aim to derive a software improvement checklist based on the analysis of trends in code review comments.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2024. p. 169
Series
Blekinge Institute of Technology Licentiate Dissertation Series, ISSN 1650-2140 ; 2024:02
Keywords
Source code quality, Code review analysis, Software quality improvement
National Category
Computer Systems
Research subject
Software Engineering
Identifiers
urn:nbn:se:bth-25608 (URN); 978-91-7295-474-8 (ISBN)
Presentation
2024-04-12, J1630, Karlskrona, 10:15 (English)
Opponent
Supervisors
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2024-03-13. Created: 2024-03-13. Last updated: 2024-03-21. Bibliographically approved
Iftikhar, U., Ali, N. b., Börstler, J. & Usman, M. (2023). A catalog of source code metrics – a tertiary study. In: Daniel Mendez, Dietmar Winkler, Johannes Kross, Stefan Biffl, Johannes Bergsmann (Ed.), Software Quality: Higher Software Quality through Zero Waste Development. Paper presented at 15th International Conference on Software Quality, SWQD 2023, Munich, Germany, May 23-25, 2023 (pp. 87-106). Springer, 472
2023 (English). In: Software Quality: Higher Software Quality through Zero Waste Development / [ed] Daniel Mendez, Dietmar Winkler, Johannes Kross, Stefan Biffl, Johannes Bergsmann, Springer, 2023, Vol. 472, p. 87-106. Conference paper, Published paper (Refereed)
Abstract [en]

Context: A large number of source code metrics are reported in the literature. It is necessary to systematically collect, describe and classify source code metrics to support research and practice.

Objective: We aim to utilize existing secondary studies to develop a catalog of source code metrics together with their descriptions. The catalog will also provide information about which units of code (e.g., operators, operands, lines of code, variables, parameters, code blocks, or functions) are used to measure the internal quality attributes and the scope on which they are collected.

Method: We conducted a tertiary study to identify secondary studies reporting source code metrics. We have classified the source code metrics according to the measured internal quality attributes, the units of code used in the measures, and the scope at which the source code metrics are collected.

Results: From 711 secondary studies, we identified 52 relevant secondary studies. We reported 423 source code metrics together with their descriptions and the internal quality attributes they measure. Source code metrics predominantly incorporate function as a unit of code to measure internal quality attributes. In contrast, several source code metrics use more than one unit of code when measuring internal quality attributes. Nearly 51% of the source code metrics are collected at the class scope, while almost 12% and 15% of source code metrics are collected at module and application levels, respectively.

Conclusions: Researchers and practitioners can use the extensive catalog to assess which source code metrics meet their individual needs based on the description and classification scheme presented. 

Place, publisher, year, edition, pages
Springer, 2023
Series
Lecture Notes in Business Information Processing, ISSN 1865-1348, E-ISSN 1865-1356 ; 472
Keywords
Internal quality attributes, Code measurement, Code quality, Tertiary study, Source code metrics
National Category
Software Engineering
Research subject
Software Engineering
Identifiers
urn:nbn:se:bth-24650 (URN); 10.1007/978-3-031-31488-9_5 (DOI); 001269092500005 (); 2-s2.0-85161231906 (Scopus ID); 978-3-031-31487-2 (ISBN); 978-3-031-31488-9 (ISBN)
Conference
15th International Conference on Software Quality, SWQD 2023, Munich, Germany, May 23-25, 2023
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications, B07; Knowledge Foundation, 20190081
Available from: 2023-05-30. Created: 2023-05-30. Last updated: 2024-09-11. Bibliographically approved
Iftikhar, U., Börstler, J. & Ali, N. b. (2023). On potential improvements in the analysis of the evolution of themes in code review comments. In: Proceedings - 2023 49th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2023. Paper presented at 49th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2023, Durres, Sept. 6th – Sept. 8th, 2023 (pp. 340-347). Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). In: Proceedings - 2023 49th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 340-347. Conference paper, Published paper (Refereed)
Abstract [en]

Context: The modern code review process is considered an essential quality assurance step in software development. The code review comments generated can provide insights regarding source code quality and development practices. However, the large number of code review comments makes it challenging to identify interesting patterns manually. In a recent study, Wen et al. used traditional topic modeling to analyze the evolution of code review comments. Their approach could identify interesting patterns that may lead to improved development practices.

Objective: In this study, we investigate potential improvements to Wen et al.'s state-of-the-art approach to analyze the evolution of code review comments.

Method: We used 209,166 code review comments from three open-source systems to explore and empirically analyze alternative design and implementation choices and demonstrate their impact.

Results: We identified the following potential improvements to the current state-of-the-art as described by Wen et al.: 1) utilize a topic modeling method that is optimized for short texts, 2) a refined approach for identifying a suitable number of topics, and 3) a more elaborate approach for analyzing topic evolution. Our results indicate that the proposed changes have quantitatively different results than the current approach. The qualitative interpretation of the topics generated with our changes indicates their usefulness.

Conclusions: Our results indicate the potential usefulness of changes to state-of-the-art approaches to analyzing the evolution of code review comments, with practical implications for researchers and practitioners. However, further research is required to compare the effectiveness of both approaches. © 2023 IEEE.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Modern Code Reviews, Source code quality, NLP
National Category
Computer Systems, Software Engineering
Research subject
Software Engineering
Identifiers
urn:nbn:se:bth-25598 (URN); 10.1109/SEAA60479.2023.00059 (DOI); 2-s2.0-85183313412 (Scopus ID); 9798350342352 (ISBN)
Conference
49th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2023, Durres, Sept. 6th – Sept. 8th, 2023
Projects
ELLIIT, the Strategic Research Area within IT and Mobile Communications
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2023-11-12. Created: 2023-11-12. Last updated: 2024-12-30. Bibliographically approved
Organisations
Identifiers
ORCID iD: orcid.org/0000-0003-3177-6138
