User Feedback and Uncertainty in Interactive Binarization
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. ORCID iD: 0000-0002-2161-7371
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. ORCID iD: 0000-0001-9947-1088
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. ORCID iD: 0000-0002-0535-1761
(English) Manuscript (preprint) (Other academic)
Abstract [en]

A major challenge in document image binarization is the large variety in the appearance of images from different document collections. This is especially challenging for parameterless, machine-learning-based binarization algorithms, which require additional ground truth training data to generalize or be fine-tuned to a new image collection. Reducing this costly labeling effort is relevant to companies and government institutions, which possess many different document image collections. One approach to this problem is interactive machine learning, which enables a user to guide the fine-tuning process by providing feedback on the produced binarization result.

In this paper, we evaluate the claim that user-guided training requires fewer labeled samples to fine-tune a basic model for binarization to a new image collection. Further, we propose a way to guide user feedback by visualizing the model’s labeling uncertainty and analyze the relationship between model uncertainty and binarization quality. Our experiments show that user feedback biases the model towards favoring foreground labels, which results in less erased text and thus better readability than when training samples are chosen randomly. Additionally, we find that model uncertainty serves as a useful guide for users and explain how the Dunning-Kruger effect prevents model uncertainty from being useful for automated sample selection.
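As a rough illustration of what a per-pixel uncertainty visualization could look like, the sketch below turns predicted foreground probabilities into a grayscale uncertainty image using binary entropy. This is an assumption made for illustration only: the abstract does not state which uncertainty measure the authors use, and the names `binary_entropy`, `uncertainty_map`, and `to_gray` are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a two-class label with foreground probability p.

    0.0 means the model is certain, 1.0 means it is maximally uncertain.
    Entropy is an assumed measure, not necessarily the one used in the paper.
    """
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def uncertainty_map(foreground_prob):
    """Map a 2D list of foreground probabilities to per-pixel entropies."""
    return [[binary_entropy(p) for p in row] for row in foreground_prob]


def to_gray(uncertainty):
    """Scale entropies in [0, 1] to 8-bit gray values for display to a user."""
    return [[round(255 * u) for u in row] for row in uncertainty]


if __name__ == "__main__":
    # Hypothetical model output for a 2x2 image patch.
    probs = [[0.0, 0.5], [1.0, 0.5]]
    print(to_gray(uncertainty_map(probs)))
```

Bright pixels in the resulting image mark places where the model is undecided, which is where a user's feedback might be most informative.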

National subject category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-16877; OAI: oai:DiVA.org:bth-16877; DiVA id: diva2:1239917
Available from: 2018-08-20. Created: 2018-08-20. Last updated: 2018-08-27. Bibliographically approved.
Part of thesis
1. Efficient Document Image Binarization using Heterogeneous Computing and Interactive Machine Learning
2018 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Large collections of historical document images have been collected by companies and government institutions for decades. More recently, these collections have been made available to a larger public via the Internet. However, to make accessing them truly useful, the contained images need to be made readable and searchable. One step in that direction is document image binarization, the separation of text foreground from page background. This separation makes the text shown in the document images easier to process for humans and image processing algorithms alike. While binarization algorithms that work reasonably well exist, it is not sufficient merely to separate foreground from background well. This separation also has to be achieved efficiently, both in terms of execution time and in terms of the training data used by machine-learning-based methods. This is necessary to make binarization not only theoretically possible, but also practically viable.
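To make the notion of separating text foreground from page background concrete, the sketch below applies Otsu's classic global threshold to a tiny grayscale image. This is a textbook baseline chosen purely for illustration, not the state-of-the-art method or the recurrent neural network studied in the thesis; the function names are hypothetical, and dark pixels are assumed to be ink.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of gray values in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """Label each pixel 1 (foreground, assumed dark ink) or 0 (background)."""
    t = otsu_threshold([v for row in image for v in row])
    return [[1 if v <= t else 0 for v in row] for row in image]


if __name__ == "__main__":
    page = [[10, 12, 200], [11, 205, 210]]
    print(binarize(page))
```

A single global threshold like this tends to break down on degraded historical pages with stains or uneven lighting, which is one reason learned methods are of interest.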

In this thesis, we explore different ways to achieve efficient binarization in terms of execution time by improving the implementation and the algorithm of a state-of-the-art binarization method. We find that parameter prediction, as well as mapping the algorithm onto the graphics processing unit (GPU), helps to improve its execution performance. Furthermore, we propose a binarization algorithm based on recurrent neural networks and evaluate the choice of its design parameters with respect to their impact on execution time and binarization quality. Here, we identify a trade-off between binarization quality and execution performance based on the algorithm’s footprint size and show that dynamically weighted training loss tends to improve the binarization quality. Lastly, we address the problem of training data efficiency by evaluating the use of interactive machine learning for reducing the amount of training data required by our recurrent-neural-network-based method. We show that user feedback can help to achieve better binarization quality with less training data and that visualized uncertainty helps to guide users to give more relevant feedback.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2018. p. 135
Series
Blekinge Institute of Technology Licentiate Dissertation Series, ISSN 1650-2140 ; 3
Keywords
image binarization, heterogeneous computing, recurrent neural networks, interactive machine learning, historical documents
National subject category
Computer Engineering; Computer Sciences; Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:bth-16797 (URN); 978-91-7295-355-0 (ISBN)
Presentation
2018-09-10, J1640, Valhallavägen 1, Karlskrona, 10:15 (English)
Project
Scalable resource-efficient systems for big data analytics
Funder
KK-stiftelsen, 20140032
Available from: 2018-08-27. Created: 2018-07-12. Last updated: 2018-11-14. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Person records

Westphal, Florian; Grahn, Håkan; Lavesson, Niklas
