Document Image Binarization Using Recurrent Neural Networks
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. ORCID iD: 0000-0002-2161-7371
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. ORCID iD: 0000-0002-0535-1761
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. ORCID iD: 0000-0001-9947-1088
2018 (English). In: Proceedings - 13th IAPR International Workshop on Document Analysis Systems, DAS 2018, IEEE, 2018, p. 263-268. Conference paper, Published paper (Refereed)
Abstract [en]

In the context of document image analysis, image binarization is an important preprocessing step for other document analysis algorithms, but it is also relevant on its own, since it improves the readability of images of historical documents. While historical document image binarization is challenging due to common image degradations, such as bleed-through, faded ink or stains, achieving good binarization performance in a timely manner is a worthwhile goal to facilitate efficient information extraction from historical documents. In this paper, we propose a recurrent neural network based algorithm using Grid Long Short-Term Memory cells for image binarization, as well as a pseudo F-Measure based weighted loss function. We evaluate the binarization and execution performance of our algorithm for different choices of footprint size, scale factor and loss function. Our experiments show a significant trade-off between binarization time and quality for different footprint sizes. However, we see no statistically significant difference when using different scale factors and only limited differences for different loss functions. Lastly, we compare the binarization performance of our approach with the best performing algorithm in the 2016 handwritten document image binarization contest and show that both algorithms perform equally well.

Place, publisher, year, edition, pages
IEEE, 2018. p. 263-268
Keywords [en]
image binarization, recurrent neural networks, Grid LSTM, historical documents, Text analysis, Labeling, Recurrent neural networks, Heuristic algorithms, Training, Degradation, Ink
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:bth-16749; DOI: 10.1109/DAS.2018.71; ISI: 000467070300045; ISBN: 978-1-5386-3346-5 (digital); OAI: oai:DiVA.org:bth-16749; DiVA, id: diva2:1231250
Conference
2018 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna
Funder
KK-stiftelsen, 20140032
Available from: 2018-07-06. Created: 2018-07-06. Last updated: 2019-06-28. Bibliographically approved
In thesis
1. Efficient Document Image Binarization using Heterogeneous Computing and Interactive Machine Learning
2018 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Large collections of historical document images have been collected by companies and government institutions for decades. More recently, these collections have been made available to a larger public via the Internet. However, to make accessing them truly useful, the contained images need to be made readable and searchable. One step in that direction is document image binarization, the separation of text foreground from page background. This separation makes the text shown in the document images easier to process for humans and image processing algorithms alike. While binarization algorithms that work reasonably well exist, it is not sufficient merely to separate foreground and background well. The separation also has to be achieved efficiently, both in terms of execution time and in terms of the training data used by machine learning based methods. This is necessary to make binarization not only theoretically possible, but also practically viable.

In this thesis, we explore different ways to achieve efficient binarization in terms of execution time by improving the implementation and the algorithm of a state-of-the-art binarization method. We find that both parameter prediction and mapping the algorithm onto the graphics processing unit (GPU) help to improve its execution performance. Furthermore, we propose a binarization algorithm based on recurrent neural networks and evaluate the choice of its design parameters with respect to their impact on execution time and binarization quality. Here, we identify a trade-off between binarization quality and execution performance based on the algorithm’s footprint size and show that dynamically weighted training loss tends to improve the binarization quality. Lastly, we address the problem of training data efficiency by evaluating the use of interactive machine learning for reducing the required amount of training data for our recurrent neural network based method. We show that user feedback can help to achieve better binarization quality with less training data and that visualized uncertainty helps to guide users to give more relevant feedback.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2018. p. 135
Series
Blekinge Institute of Technology Licentiate Dissertation Series, ISSN 1650-2140 ; 3
Keywords
image binarization, heterogeneous computing, recurrent neural networks, interactive machine learning, historical documents
National Category
Computer Engineering; Computer Sciences; Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:bth-16797 (URN); 978-91-7295-355-0 (ISBN)
Presentation
2018-09-10, J1640, Valhallavägen 1, Karlskrona, 10:15 (English)
Opponent
Supervisors
Project
Scalable resource-efficient systems for big data analytics
Funder
KK-stiftelsen, 20140032
Available from: 2018-08-27. Created: 2018-07-12. Last updated: 2018-11-14. Bibliographically approved

Open Access in DiVA

fulltext (710 kB), 36 downloads
File information
File name: FULLTEXT01.pdf; File size: 710 kB; Checksum: SHA-512
a52f4bdc3bfe0dac170ccfe11838c4b1994425d1cae48097960dc9296642f395df96f481ad55d2dc1cdbd3ca36c1c1999edf40e51365ce43653d8a67c27f9ae0
Type: fulltext; Mimetype: application/pdf

Other links

Publisher's full text

Authority records

Westphal, Florian; Lavesson, Niklas; Grahn, Håkan
