Representative Image Selection for Data Efficient Word Spotting
Westphal, Florian. Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0002-2161-7371
Grahn, Håkan. Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0001-9947-1088
Lavesson, Niklas. Jönköpings universitet. ORCID iD: 0000-0002-0535-1761
2020 (English). In: Lecture Notes in Computer Science / [ed] Bai X., Karatzas D., Lopresti D., Springer, 2020, Vol. 12116, p. 383-397. Conference paper, Published paper (Refereed)
Abstract [en]

This paper compares three different word image representations as a basis for label-free sample selection for word spotting in historical handwritten documents. These representations are a temporal pyramid representation based on pixel counts, a graph-based representation, and a pyramidal histogram of characters (PHOC) representation predicted by a PHOCNet trained on synthetic data. We show that the PHOC representation, if it is learned iteratively in an active-learning-like fashion, can help to reduce the number of required training samples by up to 69%, depending on the dataset. While this works for larger datasets containing about 1 700 images, for smaller datasets with 100 images we find that the temporal pyramid and the graph representation perform better.
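
The selection step described above can be illustrated with a minimal sketch: cluster the word-image feature vectors (for example, predicted PHOC-like embeddings) and label the sample nearest to each cluster centre first. The clustering method, cluster count, and nearest-to-centroid rule here are illustrative assumptions, not the selection strategies evaluated in the paper.

# Illustrative sketch only (not the paper's method): pick "representative"
# word images to label by clustering their feature vectors and taking the
# sample closest to each cluster centre.
import numpy as np
from sklearn.cluster import KMeans

def select_representatives(features, n_to_label):
    """Return indices of word images to label first.

    features: (N, D) array with one feature vector per word image,
              e.g. a predicted PHOC-like embedding.
    """
    km = KMeans(n_clusters=n_to_label, n_init=10, random_state=0).fit(features)
    picked = []
    for centre in km.cluster_centers_:
        # choose the real sample nearest to each centroid
        picked.append(int(np.argmin(np.linalg.norm(features - centre, axis=1))))
    return sorted(set(picked))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_features = rng.random((1000, 128))   # stand-in for real embeddings
    print(select_representatives(fake_features, 50)[:10])

In an active-learning-like loop, the embedding model would then be retrained on the newly labelled samples and the selection repeated.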

Place, publisher, year, edition, pages
Springer, 2020. Vol. 12116, p. 383-397
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349
Keywords [en]
word spotting, sample selection, graph representation, PHOCNet, active learning
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:bth-19528
DOI: 10.1007/978-3-030-57058-3_27
ISI: 000885905800027
ISBN: 9783030570576 (print)
OAI: oai:DiVA.org:bth-19528
DiVA, id: diva2:1433314
Conference
14th IAPR International Workshop on Document Analysis Systems, DAS 2020, Wuhan, China, 26 July 2020 through 29 July 2020
Part of project
Bigdata@BTH - Scalable resource-efficient systems for big data analytics, Knowledge Foundation
Funder
Knowledge Foundation, 20140032
Note

open access

Available from: 2020-05-29. Created: 2020-05-29. Last updated: 2023-03-24. Bibliographically approved.
In thesis
1. Data and Time Efficient Historical Document Analysis
2020 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Over the last decades, companies and government institutions have gathered vast collections of images of historical handwritten documents. In order to make these collections truly useful to the broader public, images suffering from degradations, such as faded ink, bleed-through, or stains, need to be made readable, and the collections as a whole need to be made searchable. Readability can be achieved by separating text foreground from page background using document image binarization, while searchability by search string or by example image can be achieved through word spotting. Developing algorithms with reasonable binarization or word spotting performance is a difficult task. Additional challenges are to make these algorithms execute fast enough to process vast collections of images in a reasonable amount of time, and to enable them to learn from few labeled training samples.

In this thesis, we explore heterogeneous computing, parameter prediction, and enhanced throughput as ways to reduce the execution time of document image binarization algorithms. We find that parameter prediction and mapping a heuristics-based binarization algorithm to the GPU lead to a 1.7 times and a 3.5 times increase in execution performance, respectively. Furthermore, for a learning-based binarization algorithm using recurrent neural networks, we identify the number of pixels processed at once as a way to trade off execution time against binarization quality. The achieved increase in throughput results in a 3.8 times faster overall execution time.

Additionally, we explore guided machine learning (gML) as a possible approach to reduce the required amount of training data for learning-based algorithms for binarization, character recognition, and word spotting. We propose an initial gML system for binarization, which allows a user to improve an algorithm's binarization quality by selecting suitable training samples. Based on this system, we identify and pursue three different directions, viz., formulation of a clear definition of gML, identification of an efficient knowledge transfer mechanism from user to learner, and automation of sample selection. We explore the Learning Using Privileged Information paradigm as a possible knowledge transfer mechanism by using character graphs as privileged information for training a neural-network-based character recognizer. Furthermore, we show that, given a suitable word image representation, automatic sample selection can help to reduce the amount of training data required for word spotting by up to 69%.
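
As a point of reference for the binarization step mentioned above, the following is a minimal sketch using plain Otsu thresholding on a grayscale page image. It is a standard baseline chosen for illustration, not the heuristics-based or recurrent-neural-network binarization methods developed in the thesis.

# Minimal binarization sketch (baseline only, not the thesis's methods):
# separate darker text foreground from lighter page background with a
# global Otsu threshold.
import numpy as np
from skimage.filters import threshold_otsu

def binarize(gray):
    """Return a boolean mask where True marks (darker) text foreground.

    gray: 2-D NumPy array with one intensity value per pixel.
    """
    t = threshold_otsu(gray)   # global threshold from the intensity histogram
    return gray < t            # ink is darker than the page background

if __name__ == "__main__":
    # toy "page": light background with a darker band standing in for text
    page = np.full((64, 64), 200, dtype=np.uint8)
    page[20:30, 5:60] = 40
    print(int(binarize(page).sum()), "foreground pixels")

Degradations such as faded ink or bleed-through are exactly where such a single global threshold breaks down, which is what motivates the more elaborate binarization approaches studied in the thesis.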

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2020. p. 202
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090 ; 5
National Category
Computer Engineering; Computer Sciences; Computer Vision and Robotics (Autonomous Systems)
Research subject
Computer Systems Engineering
Identifiers
URN: urn:nbn:se:bth-19529
ISBN: 978-91-7295-404-5
Public defence
2020-09-03, J1630, Valhallavägen 1, Karlskrona, 13:15 (English)
Funder
Knowledge Foundation, 20140032
Available from: 2020-05-29. Created: 2020-05-29. Last updated: 2020-12-14. Bibliographically approved.

Open Access in DiVA

fulltext (454 kB), 508 downloads
File information
File name: FULLTEXT01.pdf
File size: 454 kB
Checksum (SHA-512): e31f0fb5fceac2840a9a966a3d8e8b2766eaabca36519e8a51f2bf8e34aeee3f90fd69cfb73ee82f25c7f9094fd53c7955e2a00c00c1d16d88b695946f609499
Type: fulltext
Mimetype: application/pdf

Authority records

Westphal, Florian; Grahn, Håkan; Lavesson, Niklas
