Ändra sökning
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Resampling Methods in Software Quality Classification
Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation.
Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation.
Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation.
2012 (Engelska)Ingår i: International Journal of Software Engineering and Knowledge Engineering, ISSN 0218-1940, Vol. 22, nr 2, s. 203-223Artikel i tidskrift (Refereegranskat) Published
Abstract [en]

In the presence of a number of algorithms for classification and prediction in software engineering, there is a need to have a systematic way of assessing their performances. The performance assessment is typically done by some form of partitioning or resampling of the original data to alleviate biased estimation. For predictive and classification studies in software engineering, there is a lack of a definitive advice on the most appropriate resampling method to use. This is seen as one of the contributing factors for not being able to draw general conclusions on what modeling technique or set of predictor variables are the most appropriate. Furthermore, the use of a variety of resampling methods make it impossible to perform any formal meta-analysis of the primary study results. Therefore, it is desirable to examine the influence of various resampling methods and to quantify possible differences. Objective and method: This study empirically compares five common resampling methods (hold-out validation, repeated random sub-sampling, 10-fold cross-validation, leave-one-out cross-validation and non-parametric bootstrapping) using 8 publicly available data sets with genetic programming (GP) and multiple linear regression (MLR) as software quality classification approaches. Location of (PF, PD) pairs in the ROC (receiver operating characteristics) space and area under an ROC curve (AUC) are used as accuracy indicators. Results: The results show that in terms of the location of (PF, PD) pairs in the ROC space, bootstrapping results are in the preferred region for 3 of the 8 data sets for GP and for 4 of the 8 data sets for MLR. Based on the AUC measure, there are no significant differences between the different resampling methods using GP and MLR. Conclusion: There can be certain data set properties responsible for insignificant differences between the resampling methods based on AUC. These include imbalanced data sets, insignificant predictor variables and high-dimensional data sets. With the current selection of data sets and classification techniques, bootstrapping is a preferred method based on the location of (PF, PD) pair data in the ROC space. Hold-out validation is not a good choice for comparatively smaller data sets, where leave-one-out cross-validation (LOOCV) performs better. For comparatively larger data sets, 10-fold cross-validation performs better than LOOCV.

Ort, förlag, år, upplaga, sidor
World Scientific , 2012. Vol. 22, nr 2, s. 203-223
Nyckelord [en]
Resampling methods, genetic programming, multiple regression, prediction, classification
Nationell ämneskategori
Programvaruteknik
Identifikatorer
URN: urn:nbn:se:bth-7147DOI: 10.1142/S0218194012400037ISI: 000304829200004Lokalt ID: oai:bth.se:forskinfoA51C5C2E9C250832C1257AC500372020OAI: oai:DiVA.org:bth-7147DiVA, id: diva2:834729
Tillgänglig från: 2012-11-29 Skapad: 2012-11-29 Senast uppdaterad: 2018-01-11Bibliografiskt granskad

Open Access i DiVA

Fulltext saknas i DiVA

Övriga länkar

Förlagets fulltext

Personposter BETA

Afzal, WasifTorkar, RichardFeldt, Robert

Sök vidare i DiVA

Av författaren/redaktören
Afzal, WasifTorkar, RichardFeldt, Robert
Av organisationen
Sektionen för datavetenskap och kommunikation
Programvaruteknik

Sök vidare utanför DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetricpoäng

doi
urn-nbn
Totalt: 898 träffar
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf