Data and Time Efficient Historical Document Analysis
Westphal, Florian
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0002-2161-7371
2020 (English)
Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Over the last decades, companies and government institutions have gathered vast collections of images of historical handwritten documents. In order to make these collections truly useful to the broader public, images suffering from degradations, such as faded ink, bleed-through or stains, need to be made readable and the collections as a whole need to be made searchable. Readability can be achieved by separating text foreground from page background using document image binarization, while searchability by search string or by example image can be achieved through word spotting. Developing algorithms with reasonable binarization or word spotting performance is a difficult task. Additional challenges are to make these algorithms execute fast enough to process vast collections of images in a reasonable amount of time, and to enable them to learn from few labeled training samples. In this thesis, we explore heterogeneous computing, parameter prediction, and enhanced throughput as ways to reduce the execution time of document image binarization algorithms. We find that parameter prediction and mapping a heuristics-based binarization algorithm to the GPU lead to a 1.7 times and a 3.5 times increase in execution performance, respectively. Furthermore, for a learning-based binarization algorithm using recurrent neural networks, we identify the number of pixels processed at once as a way to trade off execution time against binarization quality. The achieved increase in throughput results in a 3.8 times faster overall execution time. Additionally, we explore guided machine learning (gML) as a possible approach to reduce the required amount of training data for learning-based algorithms for binarization, character recognition and word spotting. We propose an initial gML system for binarization, which allows a user to improve an algorithm’s binarization quality by selecting suitable training samples.
Based on this system, we identify and pursue three different directions, viz., formulation of a clear definition of gML, identification of an efficient knowledge transfer mechanism from user to learner, and automation of sample selection. We explore the Learning Using Privileged Information paradigm as a possible knowledge transfer mechanism by using character graphs as privileged information for training a neural network based character recognizer. Furthermore, we show that, given a suitable word image representation, automatic sample selection can help to reduce the amount of training data required for word spotting by up to 69%.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2020, p. 202
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090; 5
National Category
Computer Engineering; Computer Sciences; Computer Vision and Robotics (Autonomous Systems)
Research subject
Computer Systems Engineering
Identifiers
URN: urn:nbn:se:bth-19529
ISBN: 978-91-7295-404-5 (print)
OAI: oai:DiVA.org:bth-19529
DiVA, id: diva2:1433334
Public defence
2020-09-03, J1630, Valhallavägen 1, Karlskrona, 13:15 (English)
Funder
Knowledge Foundation, 20140032
Available from: 2020-05-29 Created: 2020-05-29 Last updated: 2020-12-14
Bibliographically approved
List of papers
1. Efficient document image binarization using heterogeneous computing and parameter tuning
2018 (English)
In: International Journal on Document Analysis and Recognition, ISSN 1433-2833, E-ISSN 1433-2825, Vol. 21, no 1-2, p. 41-58
Article in journal (Refereed), Published
Abstract [en]

In the context of historical document analysis, image binarization is a first important step, which separates foreground from background, despite common image degradations, such as faded ink, stains, or bleed-through. Fast binarization has great significance when analyzing vast archives of document images, since even small inefficiencies can quickly accumulate to years of wasted execution time. Therefore, efficient binarization is especially relevant to companies and government institutions, who want to analyze their large collections of document images. The main challenge with this is to speed up the execution performance without affecting the binarization performance. We modify a state-of-the-art binarization algorithm and achieve on average a 3.5 times faster execution performance by correctly mapping this algorithm to a heterogeneous platform, consisting of a CPU and a GPU. Our proposed parameter tuning algorithm additionally improves the execution time for parameter tuning by a factor of 1.7, compared to previous parameter tuning algorithms. We see that for the chosen algorithm, machine learning-based parameter tuning improves the execution performance more than heterogeneous computing, when comparing absolute execution times. © 2018 The Author(s)
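
To make the notion of binarization concrete, the following is a minimal sketch of global thresholding with Otsu's method on a grayscale image given as nested lists. It is not the state-of-the-art algorithm modified in the paper, and it does not model the CPU/GPU mapping or the parameter tuning; the function names `otsu_threshold` and `binarize` are illustrative assumptions.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class (ink), the rest the bright
    class (page background). Illustrative sketch only.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark = 0   # number of pixels with value <= t
    sum_dark = 0      # sum of grey levels of those pixels
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        weight_bright = total - weight_dark
        if weight_dark == 0:
            continue
        if weight_bright == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_bright = (sum_all - sum_dark) / weight_bright
        between_var = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows) to 1 = text foreground, 0 = background."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in image]


page = [[20, 30, 200],
        [210, 25, 220]]
print(binarize(page))
```

A single global threshold like this tends to fail on exactly the degradations the abstract lists (faded ink, stains, bleed-through), which is one reason more elaborate and more expensive algorithms are used in practice.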

Place, publisher, year, edition, pages
Springer Verlag, 2018
Keywords
Automatic parameter tuning, Heterogeneous computing, Historical documents, Image binarization, Bins, History, Image analysis, Learning systems, Document image binarization, Government institutions, Heterogeneous platforms, Parameter tuning algorithm, Parameter estimation
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-15891 (URN)
10.1007/s10032-017-0293-7 (DOI)
000433193500003 ()
2-s2.0-85041228615 (Scopus ID)
Available from: 2018-02-15 Created: 2018-02-15 Last updated: 2021-07-26
Bibliographically approved
2. User Feedback and Uncertainty in User Guided Binarization
2018 (English)
In: International Conference on Data Mining Workshops / [ed] Tong, H; Li, Z; Zhu, F; Yu, J, IEEE Computer Society, 2018, p. 403-410, article id 8637367
Conference paper, Published paper (Refereed)
Abstract [en]

In a child’s development, the child’s inherent ability to construct knowledge from new information is as important as explicit instructional guidance. Similarly, mechanisms to produce suitable learning representations, which can be transferred and allow integration of new information, are important for artificial learning systems. However, equally important are modes of instructional guidance, which allow the system to learn efficiently. Thus, the challenge for efficient learning is to identify suitable guidance strategies together with suitable learning mechanisms.

In this paper, we propose guided machine learning as a source for suitable guidance strategies. We distinguish between sample selection based and privileged information based strategies and evaluate three sample selection based strategies on a simple transfer learning task. The evaluated strategies are random sample selection, i.e., supervised learning, user based sample selection based on readability, and user based sample selection based on readability and uncertainty. We show that sampling based on readability and uncertainty tends to produce better learning results than the other two strategies. Furthermore, we evaluate the use of the learner’s uncertainty for self-directed learning and find that effects similar to the Dunning-Kruger effect prevent this use case. The learning task in this study is document image binarization, i.e., the separation of text foreground from page background. The source domain of the transfer consists of texts written on paper in Latin characters, while the target domain consists of texts written on palm leaves in Balinese script.
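
As an illustration of how a learner's uncertainty could be summarized per candidate sample, the sketch below ranks image patches by the mean binary entropy of their predicted foreground probabilities. This is a common uncertainty measure chosen here as an assumption; the abstract does not state which measure the paper uses, and the names `binary_entropy` and `rank_by_uncertainty` as well as the patch data are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def rank_by_uncertainty(patches):
    """Order patch names from most to least uncertain.

    patches maps a patch name to the learner's predicted foreground
    probabilities for the pixels of that patch (hypothetical input format).
    """
    scores = {
        name: sum(binary_entropy(p) for p in probs) / len(probs)
        for name, probs in patches.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


candidates = {
    "patch_a": [0.01, 0.99],  # confident predictions
    "patch_b": [0.50, 0.50],  # maximally uncertain
    "patch_c": [0.90, 0.20],  # somewhat uncertain
}
print(rank_by_uncertainty(candidates))
```

In a user-guided setting such a ranking would only be a hint shown to the user, who also judges readability; the abstract's finding about the Dunning-Kruger-like effect cautions against letting the learner select samples from its own uncertainty alone.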

Place, publisher, year, edition, pages
IEEE Computer Society, 2018
Keywords
guided machine learning, interactive machine learning, image binarization, historical documents
National Category
Computer Vision and Robotics (Autonomous Systems) Human Computer Interaction
Identifiers
urn:nbn:se:bth-17742 (URN)
10.1109/ICDMW.2018.00066 (DOI)
000465766800058 ()
978-1-5386-9288-2 (ISBN)
Conference
18th IEEE International Conference on Data Mining Workshops, ICDMW, Singapore; Singapore; 17 November 2018 through 20 November
Funder
Knowledge Foundation, 20140032
Note

© 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Available from: 2019-03-27 Created: 2019-03-27 Last updated: 2021-07-26
Bibliographically approved
3. Representative Image Selection for Data Efficient Word Spotting
2020 (English)
In: Lecture Notes in Computer Science / [ed] Bai X., Karatzas D., Lopresti D., Springer, 2020, Vol. 12116, p. 383-397
Conference paper, Published paper (Refereed)
Abstract [en]

This paper compares three different word image representations as a basis for label-free sample selection for word spotting in historical handwritten documents. These representations are a temporal pyramid representation based on pixel counts, a graph-based representation, and a pyramidal histogram of characters (PHOC) representation predicted by a PHOCNet trained on synthetic data. We show that the PHOC representation can help to reduce the amount of required training samples by up to 69%, depending on the dataset, if it is learned iteratively in an active-learning-like fashion. While this works for larger datasets containing about 1 700 images, for smaller datasets with 100 images, we find that the temporal pyramid and the graph representation perform better.
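
For readers unfamiliar with PHOC, the sketch below builds the target vector from a word's transcription, following the commonly described construction: at each pyramid level the word is split into equal regions, and a character is assigned to a region if at least half of the character's extent falls inside it. In the paper the representation is predicted from word images by a PHOCNet; this sketch only shows the encoding itself. The alphabet, the levels and the omission of bigram features are simplifying assumptions, and the name `phoc` is illustrative.

```python
import string

# Assumed alphabet: lowercase Latin letters and digits (36 symbols).
ALPHABET = string.ascii_lowercase + string.digits


def phoc(word, levels=(2, 3, 4, 5)):
    """Return a binary PHOC vector for a transcription.

    Integer arithmetic on a grid of len(word) * level cells avoids
    floating-point ties at the 50% overlap boundary.
    """
    word = word.lower()
    n = len(word)
    vector = []
    for level in levels:
        for region in range(level):
            # Region extent in scaled units: [region * n, (region + 1) * n]
            r_start, r_end = region * n, (region + 1) * n
            bits = [0] * len(ALPHABET)
            for i, ch in enumerate(word):
                if ch not in ALPHABET:
                    continue
                # Character extent in scaled units, of length `level`
                c_start, c_end = i * level, (i + 1) * level
                overlap = min(c_end, r_end) - max(c_start, r_start)
                if 2 * overlap >= level:  # at least half of the character
                    bits[ALPHABET.index(ch)] = 1
            vector.extend(bits)
    return vector


v = phoc("ab", levels=(2,))
print(len(v), sum(v))
```

With the assumed defaults the vector has (2 + 3 + 4 + 5) × 36 = 504 entries; words with similar spelling share many set bits, which is what makes the representation usable for comparing and selecting word images.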

Place, publisher, year, edition, pages
Springer, 2020
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349
Keywords
word spotting, sample selection, graph representation, PHOCNet, active learning
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:bth-19528 (URN)
10.1007/978-3-030-57058-3_27 (DOI)
000885905800027 ()
9783030570576 (ISBN)
Conference
14th IAPR International Workshop on Document Analysis Systems, DAS 2020, Wuhan, China, 26 July 2020 through 29 July 2020
Funder
Knowledge Foundation, 20140032
Note

open access

Available from: 2020-05-29 Created: 2020-05-29 Last updated: 2023-03-24
Bibliographically approved
4. Document Image Binarization Using Recurrent Neural Networks
2018 (English)
In: Proceedings - 13th IAPR International Workshop on Document Analysis Systems, DAS 2018, IEEE, 2018, p. 263-268
Conference paper, Published paper (Refereed)
Abstract [en]

In the context of document image analysis, image binarization is an important preprocessing step for other document analysis algorithms, but it is also relevant on its own because it improves the readability of images of historical documents. While historical document image binarization is challenging due to common image degradations, such as bleed-through, faded ink or stains, achieving good binarization performance in a timely manner is a worthwhile goal to facilitate efficient information extraction from historical documents. In this paper, we propose a recurrent neural network based algorithm using Grid Long Short-Term Memory cells for image binarization, as well as a pseudo F-Measure based weighted loss function. We evaluate the binarization and execution performance of our algorithm for different choices of footprint size, scale factor and loss function. Our experiments show a significant trade-off between binarization time and quality for different footprint sizes. However, we see no statistically significant difference when using different scale factors and only limited differences for different loss functions. Lastly, we compare the binarization performance of our approach with the best performing algorithm in the 2016 handwritten document image binarization contest and show that both algorithms perform equally well.
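
Binarization quality is typically reported with F-measure style metrics. The sketch below computes the plain F-measure between a predicted and a ground-truth binary image, with 1 marking text foreground. It is not the pseudo F-Measure referred to in the abstract, which weights pixels differently, and it is not the paper's loss function; the name `f_measure` and the example images are assumptions for illustration.

```python
def f_measure(predicted, truth):
    """Plain F-measure of foreground pixels (1 = text, 0 = background)."""
    tp = fp = fn = 0
    for p_row, t_row in zip(predicted, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1          # foreground correctly detected
            elif p and not t:
                fp += 1          # background marked as foreground
            elif t and not p:
                fn += 1          # foreground missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


predicted = [[1, 1, 0],
             [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(f_measure(predicted, truth))
```

Measuring this score alongside wall-clock time for several footprint sizes is one simple way to chart the kind of time versus quality trade-off the abstract describes.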

Place, publisher, year, edition, pages
IEEE, 2018
Keywords
image binarization, recurrent neural networks, Grid LSTM, historical documents, Text analysis, Labeling, Recurrent neural networks, Heuristic algorithms, Training, Degradation, Ink
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:bth-16749 (URN)
10.1109/DAS.2018.71 (DOI)
000467070300045 ()
978-1-5386-3346-5 (ISBN)
Conference
2018 13th IAPR International Workshop on Document Analysis Systems (DAS), Vienna
Funder
Knowledge Foundation, 20140032
Available from: 2018-07-06 Created: 2018-07-06 Last updated: 2021-07-26
Bibliographically approved
5. A Case for Guided Machine Learning
2019 (English)
In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) / [ed] Andreas Holzinger, Peter Kieseberg, A Min Tjoa and Edgar Weippl, Springer, 2019, Vol. 11713, p. 353-361
Conference paper, Published paper (Refereed)
Abstract [en]

Involving humans in the learning process of a machine learning algorithm can have many advantages, ranging from establishing trust in a particular model to added personalization capabilities to reduced labeling effort. While these approaches are commonly summarized under the term interactive machine learning (iML), no unambiguous definition of iML exists to clearly delimit this area of research. In this position paper, we discuss the shortcomings of current definitions of iML and propose and define the term guided machine learning (gML) as an alternative.

Place, publisher, year, edition, pages
Springer, 2019
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349
Keywords
guided machine learning, interactive machine learning, human-in-the-loop, definition
National Category
Human Computer Interaction Computer Sciences
Identifiers
urn:nbn:se:bth-18708 (URN)
10.1007/978-3-030-29726-8_22 (DOI)
000558148400022 ()
978-3-030-29726-8 (ISBN)
Conference
3rd IFIP Cross Domain Conference for Machine Learning and Knowledge Extraction, CD-MAKE 2019; Canterbury; United Kingdom; 26 August 2019 through 29 August
Funder
Knowledge Foundation, 20140032
Note

open access

Available from: 2019-09-27 Created: 2019-09-27 Last updated: 2022-05-06
Bibliographically approved
6. Learning character recognition with graph-based privileged information
2019 (English)
In: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, IEEE Computer Society, 2019, p. 1163-1168, article id 8978028
Conference paper, Published paper (Refereed)
Abstract [en]

This paper proposes a pre-training method for neural network-based character recognizers to reduce the required amount of training data, and thus the human labeling effort. The proposed method transfers knowledge about the similarities between graph representations of characters to the recognizer by training it to predict the graph edit distance. We show that convolutional neural networks trained with this method outperform traditional supervised learning if only ten or fewer labeled images per class are available. Furthermore, we show that our approach performs up to 33% better than a graph edit distance based recognition approach, even if only one labeled image per class is available. © 2019 IEEE.

Place, publisher, year, edition, pages
IEEE Computer Society, 2019
Keywords
Character recognition, Convolutional neural networks, Graph matching, Learning using privileged information, Convolution, Graphic methods, Graph edit distance, Graph matchings, Graph representation, Labeled images, Method transfers, Pre-training, Training data
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-19275 (URN)
10.1109/ICDAR.2019.00188 (DOI)
2-s2.0-85079896010 (Scopus ID)
9781728128610 (ISBN)
Conference
15th IAPR International Conference on Document Analysis and Recognition, ICDAR, Sydney, 20 September 2019 through 25 September 2019
Funder
Knowledge Foundation, 20140032
Note

open access

Available from: 2020-03-05 Created: 2020-03-05 Last updated: 2021-10-07
Bibliographically approved

Open Access in DiVA

fulltext (13001 kB), 1129 downloads
File information
File name: FULLTEXT02.pdf
File size: 13001 kB
Checksum (SHA-512): 989501462911627f788ca208f407f40496437941a3cd0a10f488763b6d57c3b0db1cac621d1418977b207335b6ce5327451c48a7087c12668f0714aa8b2a00ce
Type: fulltext
Mimetype: application/pdf

Authority records

Westphal, Florian
