An empirical study on the effectiveness of data resampling approaches for cross‐project software defect prediction
Wageningen Univ & Res, NLD. ORCID iD: 0000-0001-9140-9271
Massey University, NZL. ORCID iD: 0000-0001-9454-1366
University of Otago, NZL.
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för programvaruteknik. Blekinge Institute of Technology, Karlskrona, Sweden. ORCID iD: 0000-0003-0639-4234
2022 (English). In: IET Software, ISSN 1751-8806, E-ISSN 1751-8814, Vol. 16, no. 2, pp. 185-199. Journal article (Refereed). Published
Abstract [en]

Cross‐project defect prediction (CPDP), where data from different software projects are used to predict defects, has been proposed as a way to provide data for software projects that lack historical data. Evaluations of CPDP models using the Nearest Neighbour (NN) Filter approach have shown promising results in recent studies. A key challenge with defect‐prediction datasets is class imbalance, that is, highly skewed datasets where nonbuggy modules dominate the buggy modules. In the past, data resampling approaches have been applied to within‐project defect prediction models to help alleviate the negative effects of class imbalance in the datasets. To address the class imbalance issue in CPDP, the authors assess the impact of data resampling approaches on CPDP models after the NN Filter is applied. The impact on prediction performance of five oversampling approaches (MAHAKIL, SMOTE, Borderline‐SMOTE, Random Oversampling and ADASYN) and three undersampling approaches (Random Undersampling, Tomek Links and One‐sided selection) is investigated, and results are compared to approaches without data resampling. The authors examined six defect prediction models on 34 datasets extracted from the PROMISE repository. The authors' results show that there is a significant positive effect of data resampling on CPDP performance, suggesting that software quality teams and researchers should consider applying data resampling approaches for improved recall (pd) and g‐measure prediction performance. However, if the goal is to improve precision and reduce false alarms (pf), then data resampling approaches should be avoided.
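
As a rough illustration of the terms used in the abstract, the following Python sketch shows random oversampling of a binary defect dataset and the computation of recall (pd), false alarm rate (pf) and g‐measure. It is not the authors' implementation. The function names `random_oversample` and `pd_pf_g` are hypothetical, and the g‐measure formula (the harmonic mean of pd and 1 - pf) is an assumption based on common usage in defect prediction work, since the abstract does not define it.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes
    have the same size. Assumes binary labels (e.g. 1 = buggy, 0 = nonbuggy)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    # Indices of the minority-class rows we may duplicate.
    pool = [i for i, y in enumerate(labels) if y == minority]
    extra = [rng.choice(pool) for _ in range(deficit)]
    new_rows = list(rows) + [rows[i] for i in extra]
    new_labels = list(labels) + [labels[i] for i in extra]
    return new_rows, new_labels


def pd_pf_g(y_true, y_pred):
    """Return (pd, pf, g-measure) for binary labels where 1 = buggy.

    pd = TP / (TP + FN)              recall on buggy modules
    pf = FP / (FP + TN)              false alarm rate
    g  = 2 * pd * (1 - pf) / (pd + (1 - pf))   (assumed definition)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    pd_rate = tp / (tp + fn) if (tp + fn) else 0.0
    pf_rate = fp / (fp + tn) if (fp + tn) else 0.0
    denom = pd_rate + (1.0 - pf_rate)
    g = 2.0 * pd_rate * (1.0 - pf_rate) / denom if denom else 0.0
    return pd_rate, pf_rate, g
```

In a cross‐project setting, resampling of this kind would be applied to the training data only (here, after the NN Filter has selected it), never to the target project's test data. The trade‐off reported in the abstract corresponds to pd and g‐measure rising while pf also rises and precision falls.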

Place, publisher, year, edition, pages
John Wiley & Sons, 2022. Vol. 16, no. 2, pp. 185-199
Keywords [en]
Defect prediction, software metrics, software quality
National subject category
Software Engineering
Research subject
Software Engineering; Computer Science
Identifiers
URN: urn:nbn:se:bth-22433
DOI: 10.1049/sfw2.12052
ISI: 000723085500001
Scopus ID: 2-s2.0-85126754816
OAI: oai:DiVA.org:bth-22433
DiVA, id: diva2:1617148
Funder
ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Note

open access

Available from: 2021-12-06. Created: 2021-12-06. Last updated: 2022-04-08. Bibliographically approved.

Open Access in DiVA

fulltext (1315 kB), 244 downloads
File information
File name: FULLTEXT01.pdf. File size: 1315 kB. Checksum: SHA-512
03d9c661de942578f511fee1360a4bbab663601dfd5e70cc43525fe88a88ed7fe71078032eef960b787ba5c675d4ed129abe04fe63b630ec3bb7908f4c6a4498
Type: fulltext. MIME type: application/pdf

Search further in DiVA

By author/editor
Bennin, Kwabena Ebo; Tahir, Amjed; Börstler, Jürgen