Ändra sökning
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Towards benchmarking feature subset selection methods for software fault prediction
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för programvaruteknik.
2016 (Engelska)Ingår i: Studies in Computational Intelligence, Springer, 2016, 617, Vol. 617, s. 33-58Kapitel i bok, del av antologi (Refereegranskat)
Resurstyp
Text
Abstract [en]

Despite the general acceptance that software engineering datasets often contain noisy, irrelevant or redundant variables, very few benchmark studies of feature subset selection (FSS) methods on real-life data from software projects have been conducted. This paper provides an empirical comparison of state-of-the-art FSS methods: information gain attribute ranking (IG); Relief (RLF); principal component analysis (PCA); correlation-based feature selection (CFS); consistencybased subset evaluation (CNS); wrapper subset evaluation (WRP); and an evolutionary computation method, genetic programming (GP), on five fault prediction datasets from the PROMISE data repository. For all the datasets, the area under the receiver operating characteristic curve—the AUC value averaged over 10-fold cross-validation runs—was calculated for each FSS method-dataset combination before and after FSS. Two diverse learning algorithms, C4.5 and naïve Bayes (NB) are used to test the attribute sets given by each FSS method. The results show that although there are no statistically significant differences between the AUC values for the different FSS methods for both C4.5 and NB, a smaller set of FSS methods (IG, RLF, GP) consistently select fewer attributes without degrading classification accuracy. We conclude that in general, FSS is beneficial as it helps improve classification accuracy of NB and C4.5. There is no single best FSS method for all datasets but IG, RLF and GP consistently select fewer attributes without degrading classification accuracy within statistically significant boundaries. © Springer International Publishing Switzerland 2016.

Ort, förlag, år, upplaga, sidor
Springer, 2016, 617. Vol. 617, s. 33-58
Serie
Studies in Computational Intelligence, ISSN 1860-949X ; 617
Nyckelord [en]
Empirical; Fault prediction; Feature subset selection
Nationell ämneskategori
Programvaruteknik
Identifikatorer
URN: urn:nbn:se:bth-11573DOI: 10.1007/978-3-319-25964-2_3Scopus ID: 2-s2.0-84955278082OAI: oai:DiVA.org:bth-11573DiVA, id: diva2:900079
Tillgänglig från: 2016-02-03 Skapad: 2016-02-03 Senast uppdaterad: 2018-01-10Bibliografiskt granskad

Open Access i DiVA

Fulltext saknas i DiVA

Övriga länkar

Förlagets fulltextScopus

Personposter BETA

Torkar, Richard

Sök vidare i DiVA

Av författaren/redaktören
Torkar, Richard
Av organisationen
Institutionen för programvaruteknik
Programvaruteknik

Sök vidare utanför DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetricpoäng

doi
urn-nbn
Totalt: 972 träffar
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf