Cluster-based Sample Selection for Document Image Binarization
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för datavetenskap.
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för datavetenskap. ORCID iD: 0000-0002-2161-7371
2019 (English). In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), Vol. 5, IEEE, 2019, pp. 47-52. Conference paper, published paper (refereed)
Abstract [en]

The current state of the art in document image binarization, in terms of performance, is to train artificial neural networks on pre-labelled ground truth data. As such, it faces the same issue as other, more conventional classification problems: it requires a large amount of training data. However, unlike those conventional classification problems, document image binarization requires the binarized ground truth data to be either manually crafted or estimated, which can be error-prone and time-consuming. This is where sample selection, the act of selecting training samples based on some method or metric, might help. By reducing the size of the training dataset in such a way that the binarization performance is not impacted, the time spent creating the ground truth is also reduced. This paper proposes a cluster-based sample selection method that uses image similarity metrics and the relative neighbourhood graph to reduce the underlying redundancy of the dataset. The method, implemented with affinity propagation and the structural similarity index, reduces the training dataset by 49.57% on average while reducing the binarization performance by only 0.55%.
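
The abstract names the structural similarity index (SSIM) and the relative neighbourhood graph but does not say how the paper combines them, so the following is only a rough sketch of those two building blocks, not the authors' method. It makes several assumptions: the function names (`global_ssim`, `ssim_distance_matrix`, `relative_neighbourhood_graph`) are hypothetical, SSIM is computed over a single window covering the whole patch rather than averaged over local windows, `1 - SSIM` is used as the dissimilarity, and the clustering step (affinity propagation) is left out entirely.

```python
from itertools import combinations


def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equally sized, flattened grayscale patches."""
    if len(x) != len(y) or not x:
        raise ValueError("patches must be non-empty and of equal size")
    n = len(x)
    # Standard SSIM stabilising constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return numerator / denominator


def ssim_distance_matrix(patches):
    """Symmetric matrix of 1 - SSIM values (assumed dissimilarity measure)."""
    n = len(patches)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = 1.0 - global_ssim(patches[i], patches[j])
        dist[i][j] = dist[j][i] = d
    return dist


def relative_neighbourhood_graph(dist):
    """Edges (i, j) for which no third point k is closer to both i and j
    than i and j are to each other."""
    n = len(dist)
    edges = []
    for i, j in combinations(range(n), 2):
        blocked = any(
            max(dist[i][k], dist[j][k]) < dist[i][j]
            for k in range(n)
            if k != i and k != j
        )
        if not blocked:
            edges.append((i, j))
    return edges


if __name__ == "__main__":
    # Three toy 2x2 patches (made-up data): two near-duplicates and one inverse.
    patches = [
        [0, 0, 255, 255],
        [0, 0, 255, 250],
        [255, 255, 0, 0],
    ]
    distances = ssim_distance_matrix(patches)
    print(relative_neighbourhood_graph(distances))
```

In a sketch like this, near-duplicate patches end up at a very small distance from each other, which is the kind of redundancy a sample selection step could then prune; which samples are actually kept is a design decision the abstract leaves open.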

Place, publisher, year, edition, pages
IEEE, 2019. pp. 47-52
Series
Proceedings of the International Conference on Document Analysis and Recognition, ISSN 1520-5363
Keywords [en]
document image binarization, sample selection, neural networks, computer vision
National subject category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-19355
DOI: 10.1109/ICDARW.2019.40080
ISI: 000518786800009
ISBN: 978-1-7281-5054-3 (print)
OAI: oai:DiVA.org:bth-19355
DiVA, id: diva2:1421150
Conference
15th IAPR International Conference on Document Analysis and Recognition (ICDAR) / 2nd Workshop of Machine Learning (WML), Sep 21-22, 2019, Sydney, Australia
Part of project
Bigdata@BTH - Scalable resource-efficient systems for big data analytics, KK-stiftelsen
Funder
KK-stiftelsen, 20140032
Note

open access

Available from: 2020-04-02. Created: 2020-04-02. Last updated: 2025-09-30. Bibliographically approved

Open Access in DiVA

fulltext (1050 kB), 393 downloads
File information
File name: FULLTEXT01.pdf
File size: 1050 kB
Checksum (SHA-512):
3757c88587b5c29d422334ec0fffead2d0c05ae6fd35dbdcc4bb6803f0aa17b2ca3c26307dc638d55df75e71a80ab16fa4478b58e4cb79ceeb7d87f7f97abf4b
Type: fulltext
MIME type: application/pdf

Person

Westphal, Florian

Total: 394 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.

Total: 226 hits