Cluster Validation Measures for Label Noise Filtering
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. ORCID iD: 0000-0003-3128-191x
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
TU of Sofia, BUL.
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
2018 (English) In: 9th International Conference on Intelligent Systems 2018: Theory, Research and Innovation in Applications, IS 2018 - Proceedings / [ed] JardimGoncalves, R; Mendonca, JP; Jotsov, V; Marques, M; Martins, J; Bierwolf, R, Institute of Electrical and Electronics Engineers Inc., 2018, p. 109-116. Conference paper, Published paper (Refereed)
Abstract [en]

Cluster validation measures are designed to find the partitioning that best fits the underlying data. In this paper, we show that these well-known and scientifically proven validation measures can also be used in a different context, i.e., for filtering mislabeled instances or class outliers prior to training in supervised learning problems. A technique, entitled CVI-based Outlier Filtering, is proposed in which mislabeled instances are identified and eliminated from the training set, and a classification hypothesis is then built from the set of remaining instances. The proposed approach assigns each instance several cluster validation scores representing its potential of being an outlier with respect to the clustering properties that the applied validation measures assess. We examine CVI-based Outlier Filtering and compare it against the LOF detection method on ten data sets from the UCI data repository using five well-known learning algorithms and three different cluster validation indices. In addition, we study two approaches for filtering mislabeled instances: local and global. Our results show that for most learning algorithms and data sets, the proposed CVI-based outlier filtering algorithm outperforms the baseline method (LOF). The greatest increase in classification accuracy was achieved by combining at least two of the cluster validation indices with global filtering of mislabeled instances. © 2018 IEEE.
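
The filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the three cluster validation indices, so the silhouette index is assumed here as one plausible choice, the class labels are treated as the cluster assignment, and the function names and the threshold of 0.0 are hypothetical. The local and global variants and the combination of several indices are not covered.

```python
from collections import defaultdict
from math import dist


def silhouette_scores(X, y):
    """Per-instance silhouette, treating each class label as a cluster.

    Assumption: silhouette stands in for the (unnamed) validation indices.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    scores = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        own = [j for j in by_class[yi] if j != i]
        if not own or len(by_class) < 2:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other instances carrying the same label
        a = sum(dist(xi, X[j]) for j in own) / len(own)
        # b: smallest mean distance to the instances of any other label
        b = min(
            sum(dist(xi, X[j]) for j in members) / len(members)
            for label, members in by_class.items()
            if label != yi
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def filter_suspected_mislabeled(X, y, threshold=0.0):
    """Keep instances whose score is at least `threshold` (hypothetical rule)."""
    scores = silhouette_scores(X, y)
    keep = [i for i, s in enumerate(scores) if s >= threshold]
    return [X[i] for i in keep], [y[i] for i in keep], keep


# Toy data: two well-separated groups; the last label is deliberately wrong.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
     (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
     (0.1, 0.1)]
y = [0, 0, 0, 1, 1, 1, 1]

X_clean, y_clean, kept = filter_suspected_mislabeled(X, y)
print(kept)
```

On this toy set the last instance lies inside the class 0 group but carries label 1, so its silhouette is strongly negative and it is dropped before any classifier would be trained on `X_clean` and `y_clean`.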

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2018. p. 109-116
Keywords [en]
Class noise, Classification, Cluster validation measures, Label noise, Classification (of information), Intelligent systems, Learning algorithms, Statistics, Classification accuracy, Cluster validation, Clustering properties, Data repositories, Detection methods, Filtering algorithm, Learning problem, Clustering algorithms
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-18023
DOI: 10.1109/IS.2018.8710495
ISI: 000469337900017
Scopus ID: 2-s2.0-85065973083
ISBN: 9781538670972 (print)
OAI: oai:DiVA.org:bth-18023
DiVA, id: diva2:1324906
Conference
9th International Conference on Intelligent Systems, IS 2018; Funchal - Madeira; Portugal; 25 September 2018 through 27
Available from: 2019-06-14 Created: 2019-06-14 Last updated: 2019-07-01 Bibliographically approved

By author/editor
Boeva, Veselka; Lundberg, Lars; Kohstall, Jan