Document Image Binarization Using Recurrent Neural Networks
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. ORCID iD: 0000-0002-2161-7371
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. ORCID iD: 0000-0002-0535-1761
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. ORCID iD: 0000-0001-9947-1088
2018 (English)
In: Proceedings - 13th IAPR International Workshop on Document Analysis Systems, DAS 2018, IEEE, 2018, p. 263-268
Conference paper, Published paper (Refereed)
Abstract [en]

In the context of document image analysis, image binarization is an important preprocessing step for other document analysis algorithms, but also relevant on its own by improving the readability of images of historical documents. While historical document image binarization is challenging due to common image degradations, such as bleedthrough, faded ink or stains, achieving good binarization performance in a timely manner is a worthwhile goal to facilitate efficient information extraction from historical documents. In this paper, we propose a recurrent neural network based algorithm using Grid Long Short-Term Memory cells for image binarization, as well as a pseudo F-Measure based weighted loss function. We evaluate the binarization and execution performance of our algorithm for different choices of footprint size, scale factor and loss function. Our experiments show a significant trade-off between binarization time and quality for different footprint sizes. However, we see no statistically significant difference when using different scale factors and only limited differences for different loss functions. Lastly, we compare the binarization performance of our approach with the best performing algorithm in the 2016 handwritten document image binarization contest and show that both algorithms perform equally well.
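
As a rough illustration of the two quantities the abstract refers to, the sketch below computes a plain F-measure over binary pixels and a per-pixel weighted cross-entropy. It is not the loss function from the paper: the pseudo F-Measure weighting scheme is not described in this record, so the `weights` argument is a hypothetical stand-in for whatever per-pixel weights a method supplies, and the function names are invented for this example.

```python
import math
from typing import Sequence


def f_measure(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Plain F-measure over flattened binary pixels (1 = text foreground)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    if tp == 0:
        # No correctly detected foreground pixel: define the score as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_bce(
    probs: Sequence[float],
    truth: Sequence[int],
    weights: Sequence[float],
    eps: float = 1e-7,
) -> float:
    """Binary cross-entropy where each pixel's term is scaled by its weight.

    `weights` is an assumption of this sketch; the result is normalised by
    the total weight so that uniform weights give the ordinary mean loss.
    """
    total = 0.0
    for p, t, w in zip(probs, truth, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / sum(weights)
```

Raising the weight of a pixel makes errors at that pixel count more heavily, which is the general idea behind a weighted loss; how the weights are derived is specific to each method.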

Place, publisher, year, edition, pages
IEEE, 2018. p. 263-268
Keywords [en]
image binarization, recurrent neural networks, Grid LSTM, historical documents, Text analysis, Labeling, Recurrent neural networks, Heuristic algorithms, Training, Degradation, Ink
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:bth-16749
DOI: 10.1109/DAS.2018.71
ISI: 000467070300045
ISBN: 978-1-5386-3346-5 (electronic)
OAI: oai:DiVA.org:bth-16749
DiVA, id: diva2:1231250
Conference
2018 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna
Part of project
Bigdata@BTH - Scalable resource-efficient systems for big data analytics, Knowledge Foundation
Funder
Knowledge Foundation, 20140032
Available from: 2018-07-06 Created: 2018-07-06 Last updated: 2021-07-26
Bibliographically approved
In thesis
1. Efficient Document Image Binarization using Heterogeneous Computing and Interactive Machine Learning
2018 (English)
Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Large collections of historical document images have been collected by companies and government institutions for decades. More recently, these collections have been made available to a larger public via the Internet. However, to make accessing them truly useful, the contained images need to be made readable and searchable. One step in that direction is document image binarization, the separation of text foreground from page background. This separation makes the text shown in the document images easier to process by humans and other image processing algorithms alike. While binarization algorithms that work reasonably well exist, it is not sufficient merely to separate foreground from background well. This separation also has to be achieved in an efficient manner, in terms of execution time, but also in terms of training data used by machine learning based methods. This is necessary to make binarization not only theoretically possible, but also practically viable.
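
To make the notion of separating text foreground from page background concrete, here is a minimal sketch using Otsu's global threshold, a classic baseline. It is not one of the methods studied in the thesis; the function names and the convention that dark pixels are foreground are assumptions made for this example.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the grey level t in 0..255 that maximises between-class variance.

    Pixels with value <= t form one class, the rest the other.
    """
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of grey levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no pixels in the dark class yet
        if w1 == 0:
            break     # all pixels are in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels: Sequence[int]) -> List[int]:
    """Label dark pixels (value <= threshold) as foreground 1, others as 0."""
    t = otsu_threshold(pixels)
    return [1 if v <= t else 0 for v in pixels]
```

A single global threshold like this tends to struggle with degradations such as stains or bleed-through, which is part of why learning based approaches are of interest for historical documents.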

In this thesis, we explore different ways to achieve efficient binarization in terms of execution time by improving the implementation and the algorithm of a state-of-the-art binarization method. We find that parameter prediction and mapping the algorithm onto the graphics processing unit (GPU) both help to improve its execution performance. Furthermore, we propose a binarization algorithm based on recurrent neural networks and evaluate the choice of its design parameters with respect to their impact on execution time and binarization quality. Here, we identify a trade-off between binarization quality and execution performance based on the algorithm’s footprint size and show that dynamically weighted training loss tends to improve the binarization quality. Lastly, we address the problem of training data efficiency by evaluating the use of interactive machine learning for reducing the required amount of training data for our recurrent neural network based method. We show that user feedback can help to achieve better binarization quality with less training data and that visualized uncertainty helps to guide users to give more relevant feedback.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2018. p. 135
Series
Blekinge Institute of Technology Licentiate Dissertation Series, ISSN 1650-2140 ; 3
Keywords
image binarization, heterogeneous computing, recurrent neural networks, interactive machine learning, historical documents
National Category
Computer Engineering Computer Sciences Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:bth-16797
ISBN: 978-91-7295-355-0
Presentation
2018-09-10, J1640, Valhallavägen 1, Karlskrona, 10:15 (English)
Projects
Scalable resource-efficient systems for big data analytics
Funder
Knowledge Foundation, 20140032
Available from: 2018-08-27 Created: 2018-07-12 Last updated: 2018-11-14
Bibliographically approved
2. Data and Time Efficient Historical Document Analysis
2020 (English)
Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Over the last decades, companies and government institutions have gathered vast collections of images of historical handwritten documents. In order to make these collections truly useful to the broader public, images suffering from degradations, such as faded ink, bleed through or stains, need to be made readable and the collections as a whole need to be made searchable. Readability can be achieved by separating text foreground from page background using document image binarization, while searchability by search string or by example image can be achieved through word spotting. Developing algorithms with reasonable binarization or word spotting performance is a difficult task. Additional challenges are to make these algorithms execute fast enough to process vast collections of images in a reasonable amount of time, and to enable them to learn from few labeled training samples. In this thesis, we explore heterogeneous computing, parameter prediction, and enhanced throughput as ways to reduce the execution time of document image binarization algorithms. We find that parameter prediction and mapping a heuristics based binarization algorithm to the GPU lead to a 1.7 and 3.5 times increase in execution performance, respectively. Furthermore, for a learning based binarization algorithm using recurrent neural networks, we identify the number of pixels processed at once as a way to trade off execution time against binarization quality. The achieved increase in throughput results in a 3.8 times faster overall execution time. Additionally, we explore guided machine learning (gML) as a possible approach to reduce the required amount of training data for learning based algorithms for binarization, character recognition and word spotting. We propose an initial gML system for binarization, which allows a user to improve an algorithm’s binarization quality by selecting suitable training samples.
Based on this system, we identify and pursue three different directions, viz., formulation of a clear definition of gML, identification of an efficient knowledge transfer mechanism from user to learner, and automation of sample selection. We explore the Learning Using Privileged Information paradigm as a possible knowledge transfer mechanism by using character graphs as privileged information for training a neural network based character recognizer. Furthermore, we show that, given a suitable word image representation, automatic sample selection can help to reduce the amount of training data required for word spotting by up to 69%.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2020. p. 202
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090 ; 5
National Category
Computer Engineering Computer Sciences Computer Vision and Robotics (Autonomous Systems)
Research subject
Computer Systems Engineering
Identifiers
URN: urn:nbn:se:bth-19529
ISBN: 978-91-7295-404-5
Public defence
2020-09-03, J1630, Valhallavägen 1, Karlskrona, 13:15 (English)
Funder
Knowledge Foundation, 20140032
Available from: 2020-05-29 Created: 2020-05-29 Last updated: 2020-12-14
Bibliographically approved

Open Access in DiVA

fulltext (710 kB), 501 downloads
File information
File name: FULLTEXT01.pdf
File size: 710 kB
Checksum (SHA-512): a52f4bdc3bfe0dac170ccfe11838c4b1994425d1cae48097960dc9296642f395df96f481ad55d2dc1cdbd3ca36c1c1999edf40e51365ce43653d8a67c27f9ae0
Type: fulltext
Mimetype: application/pdf
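
A downloaded copy of the full text can be checked against the SHA-512 checksum listed in the file information above. The following is a minimal sketch using Python's standard `hashlib`; the helper names and the assumption that the file was saved locally as `FULLTEXT01.pdf` are illustrative only.

```python
import hashlib
from typing import Iterable

# Checksum as listed in this record's file information.
EXPECTED_SHA512 = (
    "a52f4bdc3bfe0dac170ccfe11838c4b1994425d1cae48097960dc9296642f395"
    "df96f481ad55d2dc1cdbd3ca36c1c1999edf40e51365ce43653d8a67c27f9ae0"
)


def sha512_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hex SHA-512 digest of the concatenated byte chunks."""
    digest = hashlib.sha512()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so it is never fully held in memory."""
    with open(path, "rb") as handle:
        return sha512_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def matches_record(path: str) -> bool:
    """True if the file at `path` has the checksum listed in the record."""
    return sha512_of_file(path) == EXPECTED_SHA512
```

For example, `matches_record("FULLTEXT01.pdf")` would return `True` only if the local file is byte-for-byte identical to the one the record describes.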

Authority records

Westphal, Florian; Lavesson, Niklas; Grahn, Håkan
