User Feedback and Uncertainty in Interactive Binarization
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. ORCID iD: 0000-0002-2161-7371
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. ORCID iD: 0000-0001-9947-1088
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. ORCID iD: 0000-0002-0535-1761
(English) Manuscript (preprint) (Other academic)
Abstract [en]

A major challenge in document image binarization is the large variation in appearance between images from different document collections. This is especially challenging for parameterless, machine-learning-based binarization algorithms, which require additional ground-truth training data to generalize or to be fine-tuned to a new image collection. Reducing this costly labeling effort is relevant to companies and government institutions, which possess many different document image collections. One approach to this problem is interactive machine learning, which enables a user to guide the fine-tuning process by providing feedback on the produced binarization result.

In this paper, we evaluate the claim that user-guided training requires fewer labeled samples to fine-tune a basic binarization model to a new image collection. Further, we propose a way to guide user feedback by visualizing the model's labeling uncertainty, and we analyze the relationship between model uncertainty and binarization quality. Our experiments show that user feedback biases the model towards favoring foreground labels, which results in less erased text and thus better readability than when training samples are chosen randomly. Additionally, we find that model uncertainty serves as a useful guide for users, and we explain how the Dunning-Kruger effect prevents model uncertainty from being useful for automated sample selection.
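
The abstract does not state how labeling uncertainty is measured. As an illustration only, assuming the model outputs a per-pixel foreground probability and that uncertainty is taken as the binary entropy of that probability, the sketch below builds an uncertainty map that could be shown to a user as an overlay. It also ranks image patches by mean uncertainty, which is the kind of automated sample selection the abstract reports as unreliable, since a model can be confidently wrong. The names `pixel_uncertainty`, `uncertainty_map` and `rank_patches` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def pixel_uncertainty(p: float, eps: float = 1e-12) -> float:
    """Binary entropy (bits) of a foreground probability p.

    ASSUMPTION: entropy as the uncertainty measure; 0 = certain, 1 = maximally uncertain.
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log2(0)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def uncertainty_map(probs: Grid) -> Grid:
    """Per-pixel uncertainty, e.g. for rendering as a heat-map overlay for the user."""
    return [[pixel_uncertainty(p) for p in row] for row in probs]


def rank_patches(unc: Grid, patch: int) -> List[Tuple[float, int, int]]:
    """Rank non-overlapping patch x patch tiles by mean uncertainty, highest first.

    Returns (mean_uncertainty, top_row, left_col) tuples. Note that this naive
    automatic selection cannot see regions where the model is confidently wrong.
    """
    h, w = len(unc), len(unc[0])
    scores = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            vals = [unc[y + dy][x + dx] for dy in range(patch) for dx in range(patch)]
            scores.append((sum(vals) / len(vals), y, x))
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    # Toy 4x4 foreground-probability map: the top-left tile is undecided.
    probs = [
        [0.5, 0.5, 0.99, 0.99],
        [0.5, 0.5, 0.99, 0.99],
        [0.01, 0.01, 0.01, 0.01],
        [0.01, 0.01, 0.01, 0.01],
    ]
    for score, y, x in rank_patches(uncertainty_map(probs), patch=2):
        print(f"patch at ({y}, {x}): mean uncertainty {score:.3f}")
```

In an interactive setting, the map would guide where a user looks. The ranking is shown only to contrast with that use, not as a recommended selection strategy.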

National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-16877
OAI: oai:DiVA.org:bth-16877
DiVA, id: diva2:1239917
Available from: 2018-08-20; Created: 2018-08-20; Last updated: 2018-08-27; Bibliographically approved
In thesis
1. Efficient Document Image Binarization using Heterogeneous Computing and Interactive Machine Learning
2018 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Companies and government institutions have been assembling large collections of historical document images for decades. More recently, these collections have been made available to a wider public via the Internet. However, to make access to them truly useful, the images need to be made readable and searchable. One step in that direction is document image binarization, the separation of text foreground from page background. This separation makes the text in the document images easier to process, for humans and other image processing algorithms alike. While reasonably well-performing binarization algorithms exist, it is not sufficient merely to separate foreground from background well. The separation also has to be achieved efficiently, both in terms of execution time and in terms of the training data required by machine-learning-based methods. This is necessary to make binarization not only theoretically possible, but also practically viable.

In this thesis, we explore different ways to achieve efficient binarization in terms of execution time by improving the implementation and the algorithm of a state-of-the-art binarization method. We find that both parameter prediction and mapping the algorithm onto the graphics processing unit (GPU) help to improve its execution performance. Furthermore, we propose a binarization algorithm based on recurrent neural networks and evaluate its design parameters with respect to their impact on execution time and binarization quality. Here, we identify a trade-off between binarization quality and execution performance based on the algorithm's footprint size, and we show that a dynamically weighted training loss tends to improve binarization quality. Lastly, we address the problem of training data efficiency by evaluating the use of interactive machine learning to reduce the amount of training data required by our recurrent-neural-network-based method. We show that user feedback can help to achieve better binarization quality with less training data and that visualized uncertainty helps to guide users towards giving more relevant feedback.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2018. p. 135
Series
Blekinge Institute of Technology Licentiate Dissertation Series, ISSN 1650-2140 ; 3
Keywords
image binarization, heterogeneous computing, recurrent neural networks, interactive machine learning, historical documents
National Category
Computer Engineering; Computer Sciences; Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:bth-16797 (URN)
978-91-7295-355-0 (ISBN)
Presentation
2018-09-10, J1640, Valhallavägen 1, Karlskrona, 10:15 (English)
Projects
Scalable resource-efficient systems for big data analytics
Funder
Knowledge Foundation, 20140032
Available from: 2018-08-27; Created: 2018-07-12; Last updated: 2018-11-14; Bibliographically approved

Authority records

Westphal, Florian; Grahn, Håkan; Lavesson, Niklas
