Clustering of Image Search Results to Support Historical Document Recognition
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
2014 (English). Independent thesis, Advanced level (degree of Master, One Year). Student thesis.
Abstract [en]

Context. Image searching in historical handwritten documents is a challenging problem in computer vision and pattern recognition. The number of digitized documents grows every day, and the task of finding occurrences of a selected sub-image in a collection of documents is of special interest to historians and genealogists.

Objectives. This thesis develops a technique for image searching in historical documents. The technique has three phases. First, the document is segmented into sub-images, one per word. Second, each sub-image is described by a feature vector of measurable attributes of its content. Third, a clustering algorithm computes the distances between these vectors to decide which sub-images match the one selected by the user.

Methods. The research methodology is experimentation. A quasi-experiment is designed based on repeated measures over a single group of data. The image processing, feature selection, and clustering approach are the independent variables, whereas the accuracy measurements are the dependent variables. This design provides a measurement net based on a set of interrelated outcomes.

Results. The statistical analysis uses the F1 score to measure the accuracy of the experimental results. This measure evaluates the experiment in terms of the true positives, false positives, and false negatives detected. The average F-measure for the experiment is F1 = 0.59, whereas the actual performance of the method is a matching ratio of 66.4%.

Conclusions. This thesis provides a starting point for developing a search engine for historical document collections based on pattern recognition. The main research findings concern image enhancement and segmentation for degraded documents, and image matching based on feature definition and cluster analysis.
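The three-phase pipeline and the F1 evaluation described in the abstract can be sketched in Python. This is a minimal, hypothetical illustration, not the thesis's implementation: the feature set in `features` (aspect ratio, ink density, mean ink/background transitions per column), the Euclidean `distance`, and the `threshold`-based `match` rule, which stands in for the thesis's clustering algorithm, are all assumptions. Segmentation is taken as already done, so each word arrives as a binary grid.

```python
import math
from typing import List, Sequence, Set

# A segmented word image as a binary grid: 1 = ink, 0 = background.
# Segmentation (phase 1) is assumed to have been done already.
Image = List[List[int]]


def features(img: Image) -> List[float]:
    """Phase 2 (hypothetical feature set): aspect ratio, ink density,
    and mean number of ink/background transitions per column.
    Assumes a non-empty, rectangular image."""
    h, w = len(img), len(img[0])
    ink = sum(sum(row) for row in img)
    transitions = sum(
        1
        for x in range(w)
        for y in range(1, h)
        if img[y][x] != img[y - 1][x]
    )
    return [w / h, ink / (w * h), transitions / w]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match(query: Image, candidates: List[Image], threshold: float) -> Set[int]:
    """Phase 3 (simplified): group with the query every candidate whose
    feature vector lies within `threshold` of the query's vector.
    This distance-threshold rule stands in for a full clustering algorithm."""
    q = features(query)
    return {i for i, c in enumerate(candidates) if distance(q, features(c)) <= threshold}


def f1_score(predicted: Set[int], relevant: Set[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN), from true/false positives and false negatives."""
    tp = len(predicted & relevant)
    fp = len(predicted - relevant)
    fn = len(relevant - predicted)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)
```

For example, `f1_score({0, 1, 2}, {0, 1, 3})` has two true positives, one false positive, and one false negative, giving 2·2 / (2·2 + 1 + 1) ≈ 0.67. In practice the features would usually be normalized before computing distances so that no single attribute dominates the comparison.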

Place, publisher, year, edition, pages
2014, p. 42
Keywords [en]
Historical documents, Computer vision, Features extraction, Clustering
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-5577
Local ID: oai:bth.se:arkivex8ECD5F0EFF929D9AC1257D820044097C
OAI: oai:DiVA.org:bth-5577
DiVA, id: diva2:832962
Uppsok
Technology
Available from: 2015-04-22
Created: 2014-10-31
Last updated: 2018-01-11
Bibliographically approved

Open Access in DiVA

Full text: FULLTEXT01.pdf (application/pdf, 7322 kB)
Checksum (SHA-512): 75b5f7114164ef550ba44af40f3f798a3fad151fe5a45ede1ed8cbc32a913da60a3169866432625eaa7131a91b822ec68bcdc676d2fbd5d1cfc09129845ede6f
