Source code expert identification: Models and application
Federal University of Piauí, Brazil.
Federal University of Minas Gerais, Brazil.
2024 (English). In: Information and Software Technology, ISSN 0950-5849, E-ISSN 1873-6025, Vol. 170, article id 107445. Article, review/survey (Refereed). Published.
Abstract [en]

Context: Identifying source code expertise is useful in several situations. Activities like bug fixing and helping newcomers are best performed by knowledgeable developers. Some studies have proposed repository-mining techniques to identify source code experts. However, there is a gap in understanding which variables are most related to code knowledge and how they can be used to identify expertise. Objective: This study explores models of expertise identification and how these models can be used to improve a Truck Factor algorithm. Methods: First, we built an oracle with the knowledge of developers from software projects. Then, we used this oracle to analyze the correlation between measures from the development history and source code knowledge. We investigated the use of linear and machine-learning models to identify file experts. Finally, we used the proposed models to improve a Truck Factor algorithm and analyzed their performance using data from public and private repositories. Results: First Authorship and Recency of Modification have the highest positive and negative correlations with source code knowledge, respectively. Machine learning classifiers outperformed the linear techniques (F-Score = 71% to 73%) in the largest analyzed dataset, but this advantage is unclear in the smallest one. The Truck Factor algorithm using the proposed models could handle developers missed by the previous expertise model, achieving the best average F-Score of 74%. It was perceived as more accurate in computing the Truck Factor of an industrial project. Conclusion: In terms of F-Score, the studied models perform similarly. However, machine learning classifiers achieve higher Precision, while linear models achieve higher Recall. Therefore, choosing the best technique depends on the user's tolerance for false positives and false negatives.
Additionally, the proposed models significantly improved the accuracy of a Truck Factor algorithm, affirming their effectiveness in precisely identifying the key developers within software projects. © 2024 Elsevier B.V.
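The abstract does not spell out the Truck Factor algorithm or the expertise models it feeds. As a rough illustration only, the sketch below follows a common greedy formulation of the Truck Factor: repeatedly remove the developer who is an expert on the most files until more than half of the files are left with no expert; the number of removals is the Truck Factor. The `file_experts` map and the 0.5 orphan threshold are illustrative assumptions, not the paper's exact model.

```python
# Hedged sketch of a greedy Truck Factor computation. The expertise map
# (file -> set of expert developers) would, in the paper's setting, come
# from a linear or machine-learning expertise model; here it is given
# directly for illustration.
from collections import Counter


def truck_factor(file_experts, orphan_threshold=0.5):
    """file_experts: dict mapping file path -> set of expert developer names."""
    experts = {f: set(devs) for f, devs in file_experts.items()}
    total = len(experts)
    removed = 0
    while True:
        # A file is "orphaned" once it has no remaining expert.
        orphaned = sum(1 for devs in experts.values() if not devs)
        if orphaned > orphan_threshold * total:
            return removed
        # Greedily remove the developer covering the most still-covered files.
        counts = Counter(d for devs in experts.values() for d in devs)
        if not counts:
            return removed
        top_dev, _ = counts.most_common(1)[0]
        for devs in experts.values():
            devs.discard(top_dev)
        removed += 1


# Tiny illustrative repository: alice is the expert on most files.
repo = {
    "a.py": {"alice"},
    "b.py": {"alice", "bob"},
    "c.py": {"bob"},
    "d.py": {"alice"},
}
print(truck_factor(repo))  # -> 2: losing alice and bob orphans all files
```

The quality of the result hinges entirely on the expertise map: as the abstract notes, an expertise model with higher Recall (linear) will credit more developers per file and tend to raise the Truck Factor, while a higher-Precision model (machine learning) does the opposite.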

Place, publisher, year, edition, pages
Elsevier, 2024. Vol. 170, article id 107445
Keywords [en]
Machine learning, Source code expertise, Truck Factor, Classification (of information), Codes (symbols), Computer programming languages, Trucks, Expert identifications, F-score, IMPROVE-A, Learning classifiers, Machine-learning, Performance, Software project, Source codes
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-26073
DOI: 10.1016/j.infsof.2024.107445
Scopus ID: 2-s2.0-85188638209
OAI: oai:DiVA.org:bth-26073
DiVA, id: diva2:1849080
Available from: 2024-04-05. Created: 2024-04-05. Last updated: 2024-04-05. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Britto, Ricardo
