Cluster Validation Measures for Label Noise Filtering
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för datalogi och datorsystemteknik. ORCID iD: 0000-0003-3128-191X
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för datalogi och datorsystemteknik.
TU of Sofia, BUL.
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för datalogi och datorsystemteknik.
2018 (English). In: 9th International Conference on Intelligent Systems 2018: Theory, Research and Innovation in Applications, IS 2018 - Proceedings / [ed] JardimGoncalves, R; Mendonca, JP; Jotsov, V; Marques, M; Martins, J; Bierwolf, R, Institute of Electrical and Electronics Engineers Inc., 2018, p. 109-116
Conference paper, Published paper (Refereed)
Abstract [en]

Cluster validation measures are designed to find the partitioning that best fits the underlying data. In this paper, we show that these well-known and scientifically proven validation measures can also be used in a different context, i.e., for filtering mislabeled instances or class outliers prior to training in supervised learning problems. A technique, entitled CVI-based Outlier Filtering, is proposed in which mislabeled instances are identified and eliminated from the training set, and a classification hypothesis is then built from the set of remaining instances. The proposed approach assigns each instance several cluster validation scores representing its potential of being an outlier with respect to the clustering properties that the validation measures assess. We examine CVI-based Outlier Filtering and compare it against the LOF detection method on ten data sets from the UCI data repository using five well-known learning algorithms and three different cluster validation indices. In addition, we study two approaches for filtering mislabeled instances: local and global. Our results show that for most learning algorithms and data sets, the proposed CVI-based outlier filtering algorithm outperforms the baseline method (LOF). The greatest increase in classification accuracy has been achieved by combining at least two of the cluster validation indices with global filtering of mislabeled instances. © 2018 IEEE.
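
The abstract does not name the three cluster validation indices or the filtering rule, so the following is only a minimal sketch of the general idea under stated assumptions: the silhouette index stands in for a cluster validation measure, the class labels are treated as cluster assignments, and an instance is dropped when its score falls below a hypothetical threshold of 0. The function names and the toy data are illustrative and are not taken from the paper.

```python
import math


def silhouette_per_instance(X, y):
    """Silhouette value of each instance, treating class labels as cluster ids.

    Assumption: silhouette is used here only as one example of a cluster
    validation index; the paper's actual indices are not given in the abstract.
    """
    labels = sorted(set(y))
    scores = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Mean distance from instance i to the other members of each class.
        mean_dist = {}
        for c in labels:
            d = [math.dist(xi, xj)
                 for j, (xj, yj) in enumerate(zip(X, y))
                 if yj == c and j != i]
            if d:
                mean_dist[c] = sum(d) / len(d)
        a = mean_dist.get(yi)  # cohesion: distance to own class
        others = [v for c, v in mean_dist.items() if c != yi]
        if a is None or not others:
            scores.append(0.0)  # singleton class or single-class data
            continue
        b = min(others)  # separation: distance to the nearest other class
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def filter_mislabeled(X, y, threshold=0.0):
    """Return indices of instances to keep, plus all scores.

    The threshold is a hypothetical choice made for this illustration.
    """
    scores = silhouette_per_instance(X, y)
    keep = [i for i, s in enumerate(scores) if s >= threshold]
    return keep, scores


# Toy data: two well-separated groups; the last point lies in the first
# group but carries the label of the second, i.e. it is mislabeled.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
     (0.05, 0.05)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

keep, scores = filter_mislabeled(X, y)
print(keep)
```

An instance whose label disagrees with its neighbourhood gets a negative silhouette value, because it sits closer to another class than to its own. A classifier would then be trained only on the instances listed in `keep`.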

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2018. p. 109-116
Keywords [en]
Class noise, Classification, Cluster validation measures, Label noise, Classification (of information), Intelligent systems, Learning algorithms, Statistics, Classification accuracy, Cluster validation, Clustering properties, Data repositories, Detection methods, Filtering algorithm, Learning problem, Clustering algorithms
Identifiers
URN: urn:nbn:se:bth-18023
DOI: 10.1109/IS.2018.8710495
ISI: 000469337900017
Scopus ID: 2-s2.0-85065973083
ISBN: 9781538670972 (print)
OAI: oai:DiVA.org:bth-18023
DiVA, id: diva2:1324906
Conference
9th International Conference on Intelligent Systems, IS 2018; Funchal - Madeira; Portugal; 25 September 2018 through 27
Available from: 2019-06-14. Created: 2019-06-14. Last updated: 2019-07-01. Bibliographically approved.

Author/editor
Boeva, Veselka; Lundberg, Lars; Kohstall, Jan