Subimage matching in historical documents using SIFT keypoints and clustering
2015 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Context: This thesis tests subimage matching in historical handwritten documents using SIFT (Scale-Invariant Feature Transform) keypoints. SIFT features are invariant to scale and rotation and have gained a lot of interest in the research community. The historical documents used in this thesis originate from the 16th century onward. The following steps were executed: binarization, word segmentation, feature identification and clustering. The binarization step converts the images into binary images. The word segmentation step separates the words into individual subimages. In the feature identification step, SIFT keypoints were found and descriptors were computed. The last step was to cluster the images based on the distances between the sets of image features identified.

Objectives: The main objectives are to find a good configuration for the binarization step, implement a good word segmentation, identify image features, and lastly cluster the images based on their similarity. Subimages are matched to one another rather than labeled with predicted content, simply because the data used is unlabeled.

Methods: Implementation was the main methodology, combined with experimentation. Measurements were taken throughout the development, and the accuracy of the word segmentation and of the clustering was measured.

Results: The word segmentation achieved an average accuracy of 89% correct segmentation, which is comparable to other word segmentation results. The clustering, however, matched 0% correctly.

Conclusions: The conclusion drawn from this study is that SIFT keypoints are not very well suited for this type of problem, which involves a lot of handwritten text. The descriptors were not discriminative enough, and different keypoints were found in different images of the same handwritten text, which led to the bad clustering results.
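As an illustration of the final clustering step described in the abstract, the following minimal Python sketch groups images by the distance between their descriptor sets. This record does not state which distance measure or clustering algorithm the thesis used, so the symmetric mean nearest-neighbour distance, the single-linkage threshold grouping, and the names set_distance, cluster and threshold are all assumptions made for illustration. The toy 2-D vectors stand in for real 128-dimensional SIFT descriptors, and every descriptor set is assumed to be non-empty.

```python
from math import dist


def set_distance(a, b):
    """Symmetric mean nearest-neighbour distance between two descriptor sets.

    Assumed measure for illustration only; both sets must be non-empty.
    """
    def one_way(src, dst):
        # For each descriptor in src, take the Euclidean distance to its
        # closest descriptor in dst, then average over src.
        return sum(min(dist(p, q) for q in dst) for p in src) / len(src)

    return 0.5 * (one_way(a, b) + one_way(b, a))


def cluster(descriptor_sets, threshold):
    """Group image indices whose set distance is below threshold.

    Hypothetical single-linkage grouping implemented with union-find.
    """
    parent = list(range(len(descriptor_sets)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(descriptor_sets)):
        for j in range(i + 1, len(descriptor_sets)):
            if set_distance(descriptor_sets[i], descriptor_sets[j]) < threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(descriptor_sets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data: images 0 and 1 have near-identical descriptors, image 2 does not.
toy_sets = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 0.1), (1.0, 0.1)],
    [(10.0, 10.0), (11.0, 10.0)],
]
toy_clusters = cluster(toy_sets, threshold=1.0)
```

With this kind of grouping, the failure mode reported in the conclusions would show up as large set distances between images of the same word, because different keypoints are detected in each image.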

Place, publisher, year, edition, pages
2015. p. 36
Keywords [en]
Handwritten, Image matching, SIFT, Segmentation
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-10417
OAI: oai:DiVA.org:bth-10417
DiVA, id: diva2:839793
Subject / course
DV2566 Master's Thesis (120 credits) in Computer Science
Educational program
DVACS Master of Science Programme in Computer Science
Supervisors
Examiners
Available from: 2015-08-05. Created: 2015-07-05. Last updated: 2018-01-11. Bibliographically approved.

Open Access in DiVA

fulltext (7035 kB), 1484 downloads
File information
File name: FULLTEXT02.pdf
File size: 7035 kB
Checksum (SHA-512): da73d607a54cc987621a329b744dccc0b94cd158524da4156ebd7d2985727c1531d95cf98c60e83c9f208e2df10c1f58e4aba4cae9f54e5a306574fba9f44129
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Åberg, Hampus
Computer Sciences

Total: 1484 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 491 hits