Efficient Document Image Binarization using Heterogeneous Computing and Interactive Machine Learning
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. ORCID iD: 0000-0002-2161-7371
2018 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Large collections of historical document images have been collected by companies and government institutions for decades. More recently, these collections have been made available to a larger public via the Internet. However, to make accessing them truly useful, the contained images need to be made readable and searchable. One step in that direction is document image binarization, the separation of text foreground from page background. This separation makes the text shown in the document images easier to process, for humans and other image processing algorithms alike. While reasonably well-performing binarization algorithms exist, it is not sufficient merely to separate foreground and background well. This separation also has to be achieved efficiently, both in terms of execution time and in terms of the training data used by machine learning based methods. This is necessary to make binarization not only theoretically possible, but also practically viable.

In this thesis, we explore different ways to achieve efficient binarization in terms of execution time by improving both the implementation and the algorithm of a state-of-the-art binarization method. We find that parameter prediction, as well as mapping the algorithm onto the graphics processing unit (GPU), helps to improve its execution performance. Furthermore, we propose a binarization algorithm based on recurrent neural networks and evaluate how the choice of its design parameters affects execution time and binarization quality. Here, we identify a trade-off between binarization quality and execution performance that depends on the algorithm's footprint size, and show that a dynamically weighted training loss tends to improve binarization quality. Lastly, we address the problem of training data efficiency by evaluating the use of interactive machine learning to reduce the amount of training data required by our recurrent neural network based method. We show that user feedback can help to achieve better binarization quality with less training data and that visualized uncertainty helps to guide users to give more relevant feedback.
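
The abstract reports that a dynamically weighted training loss tends to improve binarization quality. As a minimal illustration of what a weighted loss means in this setting, the sketch below computes a per-pixel weighted binary cross-entropy in plain Python. The function name weighted_bce and the weight values are assumptions for illustration only; how the thesis derives and updates its weights is not described in this record and is not reproduced here.

```python
import math
from typing import Sequence


def weighted_bce(probs: Sequence[float], labels: Sequence[int],
                 weights: Sequence[float], eps: float = 1e-7) -> float:
    """Per-pixel weighted binary cross-entropy, normalised by the total weight.

    probs   -- predicted foreground probabilities in [0, 1], one per pixel
    labels  -- ground-truth labels, 1 = foreground (text), 0 = background
    weights -- hypothetical per-pixel weights supplied by the caller
               (NOT the thesis's dynamic weighting scheme)
    """
    if not (len(probs) == len(labels) == len(weights)):
        raise ValueError("probs, labels and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    loss = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        loss -= w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss / total_weight
```

With uniform weights the function reduces to the ordinary mean binary cross-entropy; raising a pixel's weight increases its influence on the loss, which is the lever a dynamic weighting scheme would adjust during training.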

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2018, p. 135
Series
Blekinge Institute of Technology Licentiate Dissertation Series, ISSN 1650-2140; 3
Keywords [en]
image binarization, heterogeneous computing, recurrent neural networks, interactive machine learning, historical documents
National Category
Computer Engineering; Computer Sciences; Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:bth-16797
ISBN: 978-91-7295-355-0 (print)
OAI: oai:DiVA.org:bth-16797
DiVA, id: diva2:1232594
Presentation
2018-09-10, J1640, Valhallavägen 1, Karlskrona, 10:15 (English)
Projects
Scalable resource-efficient systems for big data analytics
Funder
Knowledge Foundation, 20140032
Available from: 2018-08-27 Created: 2018-07-12 Last updated: 2018-11-14
Bibliographically approved
List of papers
1. Efficient document image binarization using heterogeneous computing and parameter tuning
2018 (English). In: International Journal on Document Analysis and Recognition, ISSN 1433-2833, E-ISSN 1433-2825, Vol. 21, no 1-2, p. 41-58. Article in journal (Refereed), Published
Abstract [en]

In the context of historical document analysis, image binarization is an important first step, which separates foreground from background despite common image degradations, such as faded ink, stains, or bleed-through. Fast binarization is of great significance when analyzing vast archives of document images, since even small inefficiencies can quickly accumulate to years of wasted execution time. Therefore, efficient binarization is especially relevant to companies and government institutions that want to analyze their large collections of document images. The main challenge is to speed up execution without affecting binarization performance. We modify a state-of-the-art binarization algorithm and achieve, on average, 3.5 times faster execution by correctly mapping this algorithm to a heterogeneous platform consisting of a CPU and a GPU. Our proposed parameter tuning algorithm additionally improves the execution time of parameter tuning by a factor of 1.7 compared to previous parameter tuning algorithms. We see that, for the chosen algorithm, machine learning-based parameter tuning improves execution performance more than heterogeneous computing does, when comparing absolute execution times. © 2018 The Author(s)

Place, publisher, year, edition, pages
Springer Verlag, 2018
Keywords
Automatic parameter tuning, Heterogeneous computing, Historical documents, Image binarization, Bins, History, Image analysis, Learning systems, Document image binarization, Government institutions, Heterogeneous platforms, Parameter tuning algorithm, Parameter estimation
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-15891 (URN)
10.1007/s10032-017-0293-7 (DOI)
000433193500003
2-s2.0-85041228615 (Scopus ID)
Available from: 2018-02-15 Created: 2018-02-15 Last updated: 2021-07-26
Bibliographically approved
2. Document Image Binarization Using Recurrent Neural Networks
2018 (English). In: Proceedings - 13th IAPR International Workshop on Document Analysis Systems, DAS 2018, IEEE, 2018, p. 263-268. Conference paper, Published paper (Refereed)
Abstract [en]

In the context of document image analysis, image binarization is an important preprocessing step for other document analysis algorithms, but it is also relevant in its own right, since it improves the readability of images of historical documents. While historical document image binarization is challenging due to common image degradations, such as bleed-through, faded ink or stains, achieving good binarization performance in a timely manner is a worthwhile goal to facilitate efficient information extraction from historical documents. In this paper, we propose a recurrent neural network based algorithm using Grid Long Short-Term Memory cells for image binarization, as well as a pseudo F-Measure based weighted loss function. We evaluate the binarization and execution performance of our algorithm for different choices of footprint size, scale factor and loss function. Our experiments show a significant trade-off between binarization time and quality for different footprint sizes. However, we see no statistically significant difference when using different scale factors and only limited differences for different loss functions. Lastly, we compare the binarization performance of our approach with the best performing algorithm in the 2016 handwritten document image binarization contest and show that both algorithms perform equally well.
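
The F-Measure is one of the standard quality metrics in document binarization contests. The sketch below computes the pixel-level F-Measure of a binarized image against its ground truth, both flattened to sequences with 1 marking a text pixel. The optional weight maps are a hypothetical hook showing how a pseudo-F-Measure-style variant could re-weight individual pixels; the exact weights used by the paper's loss or by contest evaluation tools are not reproduced, and f_measure is an illustrative name.

```python
from typing import Optional, Sequence


def f_measure(pred: Sequence[int], gt: Sequence[int],
              recall_weights: Optional[Sequence[float]] = None,
              precision_weights: Optional[Sequence[float]] = None) -> float:
    """F-Measure of a binarized image against ground truth (1 = text pixel).

    Without weights this is the standard pixel-level F-Measure. The weight
    maps are a hypothetical extension for pseudo-F-Measure-style variants.
    """
    n = len(pred)
    if len(gt) != n:
        raise ValueError("pred and gt must have equal length")
    rw = recall_weights if recall_weights is not None else [1.0] * n
    pw = precision_weights if precision_weights is not None else [1.0] * n
    # Recall: weighted share of ground-truth text pixels that were detected.
    tp_r = sum(w for p, g, w in zip(pred, gt, rw) if p and g)
    gt_mass = sum(w for g, w in zip(gt, rw) if g)
    # Precision: weighted share of predicted text pixels that are correct.
    tp_p = sum(w for p, g, w in zip(pred, gt, pw) if p and g)
    pred_mass = sum(w for p, w in zip(pred, pw) if p)
    recall = tp_r / gt_mass if gt_mass else 0.0
    precision = tp_p / pred_mass if pred_mass else 0.0
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)
```

For example, a prediction that finds both text pixels of the ground truth but also marks one background pixel has recall 1 and precision 2/3, giving an F-Measure of 0.8.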

Place, publisher, year, edition, pages
IEEE, 2018
Keywords
image binarization, recurrent neural networks, Grid LSTM, historical documents, Text analysis, Labeling, Recurrent neural networks, Heuristic algorithms, Training, Degradation, Ink
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:bth-16749 (URN)
10.1109/DAS.2018.71 (DOI)
000467070300045
978-1-5386-3346-5 (ISBN)
Conference
2018 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna
Funder
Knowledge Foundation, 20140032
Available from: 2018-07-06 Created: 2018-07-06 Last updated: 2021-07-26
Bibliographically approved
3. User Feedback and Uncertainty in Interactive Binarization
(English). Manuscript (preprint) (Other academic)
Abstract [en]

A major challenge in document image binarization is the large variety in appearance of images from different document collections. This is especially challenging for parameterless, machine learning based binarization algorithms, which require additional ground truth training data to generalize or fine-tune to a new image collection. Reducing this costly labeling effort is relevant to companies and government institutions, which possess many different document image collections. One approach to address this problem is interactive machine learning, which enables a user to guide the fine-tuning process by providing feedback on the produced binarization result.

In this paper, we evaluate the claim that user guided training requires fewer labeled samples to fine-tune a basic model for binarization to a new image collection. Further, we propose a way to guide user feedback by visualizing the model's labeling uncertainty and analyze the relationship between model uncertainty and binarization quality. Our experiments show that user feedback biases the model towards favoring foreground labels, which results in less erased text and thus better readability than when training samples are chosen randomly. Additionally, we find that model uncertainty serves as a useful guide for users and explain how the Dunning-Kruger effect prevents model uncertainty from being useful for automated sample selection.
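
The manuscript proposes visualizing the model's labeling uncertainty to guide user feedback. This record does not say how that uncertainty is computed; a common choice, assumed here, is the binary entropy of the predicted foreground probability, which is near 0 for confident predictions and maximal at 0.5. The sketch turns a 2-D grid of probabilities into 8-bit intensities that could be overlaid on the page image; binary_entropy and uncertainty_map are hypothetical names.

```python
import math
from typing import List, Sequence


def binary_entropy(p: float, eps: float = 1e-12) -> float:
    """Entropy in bits of a foreground probability p (maximal, 1.0, at p = 0.5).

    Entropy as the uncertainty measure is an assumption for this sketch.
    """
    p = min(max(p, eps), 1.0 - eps)  # avoid log2(0) for fully confident pixels
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def uncertainty_map(probs: Sequence[Sequence[float]]) -> List[List[int]]:
    """Convert a 2-D grid of foreground probabilities into 8-bit uncertainty
    intensities (0 = confident, 255 = maximally uncertain) for display."""
    return [[round(255 * binary_entropy(p)) for p in row] for row in probs]
```

For example, uncertainty_map([[0.5, 0.0]]) yields [[255, 0]]. In line with the abstract, such a map is meant to guide a human annotator towards relevant regions, not to select training samples automatically.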

National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-16877 (URN)
Available from: 2018-08-20 Created: 2018-08-20 Last updated: 2018-08-27
Bibliographically approved

Open Access in DiVA

fulltext (13805 kB), 478 downloads
File information
File name: FULLTEXT01.pdf
File size: 13805 kB
Checksum (SHA-512): c3a45d73e131b969380ea0b79bd4122451e27a345091ee9f6a7076c9c79e00c8679570f22cb985d22c52f5de1a9ca2507aa4607c300241b3587ee440673b181c
Type: fulltext
Mimetype: application/pdf

Authority records

Westphal, Florian
