Practical Considerations and Solutions in NLP-Based Analysis of Code Review Comments - An Experience Report
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering. ORCID iD: 0000-0003-3177-6138
2025 (English). In: Product-Focused Software Process Improvement / [ed] Dietmar Pfahl, Javier Gonzalez Huerta, Jil Klünder, Hina Anwar, Springer Science+Business Media B.V., 2025, Vol. 15452, p. 342-351
Conference paper, Published paper (Refereed)
Abstract [en]

Context: Automated analysis of code review comments (CRCs) can help highlight the issues that reviewers discuss most frequently in large repositories. Topic modeling is a promising approach for analyzing large natural language repositories. However, CRCs mix natural language text with code references, so the data pre-processing and topic modeling approaches must be selected carefully.

Objective: This work discusses the decisions and considerations involved in analyzing CRCs.

Method: We used 5,560 CRCs from an open-source system to study the decisions and considerations that arise when analyzing CRCs with topic modeling. A domain expert then evaluated the interpretability of the identified themes.

Results: We report several observations and challenges in improving the quality of the identified themes. These include choices of pre-processing, topic modeling parameters, embedding model, and objective coherence measures, all of which affect the subjective interpretability of the identified themes.

Conclusions: This work presents considerations specific to analyzing CRCs and describes the impact of the associated decisions, which can help future studies conduct topic modeling-based analyses of CRCs. Future studies can use the technical demonstrator to explore the interpretability of the topics generated from CRCs.
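The abstract notes that CRCs mix natural language with code references, which complicates pre-processing before topic modeling. As an illustration only, the following is a minimal sketch of one possible way to handle this, assuming that inline code appears in backticks and that identifiers can be recognized by simple heuristics (calls, dotted names, snake_case, camelCase). This is not the pre-processing pipeline used in the paper, which this record does not describe; the placeholder token `<code>`, the stopword list, and the `looks_like_code` and `preprocess` helpers are all hypothetical.

```python
import re

# Hypothetical placeholder that replaces every detected code reference.
CODE_PLACEHOLDER = "<code>"

# Hypothetical, deliberately small stopword list (includes "please",
# which is common in review comments but carries little topical meaning).
STOPWORDS = {"a", "an", "the", "is", "this", "to", "of", "in", "and",
             "or", "it", "be", "should", "here", "we", "i", "please"}

BACKTICK_SPAN = re.compile(r"`[^`]*`")  # assumed Markdown-style inline code
WORD = re.compile(r"[A-Za-z]+")          # alphabetic runs only


def looks_like_code(token: str) -> bool:
    """Heuristically flag calls, dotted names, snake_case and camelCase tokens."""
    if "(" in token or ")" in token:
        return True
    if re.search(r"\w\.\w", token):                   # e.g. user_service.py
        return True
    if re.search(r"[A-Za-z0-9]_[A-Za-z0-9]", token):  # e.g. foo_bar
        return True
    if re.search(r"[a-z][A-Z]", token):               # e.g. fetchData
        return True
    return False


def preprocess(comment: str) -> list[str]:
    """Turn one review comment into lowercase tokens, masking code references."""
    # Replace backtick-quoted code first so it is never split into words.
    text = BACKTICK_SPAN.sub(f" {CODE_PLACEHOLDER} ", comment)
    tokens: list[str] = []
    for raw in text.split():
        if raw == CODE_PLACEHOLDER or looks_like_code(raw):
            tokens.append(CODE_PLACEHOLDER)
            continue
        # Keep alphabetic words, lowercased, without stopwords.
        for word in WORD.findall(raw.lower()):
            if word not in STOPWORDS:
                tokens.append(word)
    return tokens


if __name__ == "__main__":
    print(preprocess("Please rename `getData()` to fetchData "
                     "and add a null check in user_service.py"))
```

For the sample comment, the sketch yields `['rename', '<code>', '<code>', 'add', 'null', 'check', '<code>']`. Collapsing all code references into one token keeps identifiers from fragmenting the topic vocabulary, at the cost of losing which identifiers were discussed; splitting identifiers into sub-words (for example `fetchData` into `fetch` and `data`) is an alternative trade-off.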

Place, publisher, year, edition, pages
Springer Science+Business Media B.V., 2025. Vol. 15452, p. 342-351
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 15452
Keywords [en]
Natural language processing systems, Open source software, Open systems, Automated analysis, Code review, Data preprocessing, Experience report, Interpretability, Modeling approach, Natural languages, Natural languages texts, Processing model, Topic Modeling, Modeling languages
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-27330
DOI: 10.1007/978-3-031-78386-9_24
ISI: 001423664600024
Scopus ID: 2-s2.0-85211908780
ISBN: 9783031783852 (print)
OAI: oai:DiVA.org:bth-27330
DiVA, id: diva2:1923722
Conference
25th International Conference on Product-Focused Software Process Improvement, PROFES 2024, Tartu, Dec 2-4, 2024
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2024-12-30 Created: 2024-12-30 Last updated: 2025-09-30 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Iftikhar, Umar
