Unsupervised Text Binarization in Handwritten Historical Documents Using k-Means Clustering
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering. ORCID iD: 0000-0001-7536-3349
2017 (English). In: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016 / [ed] Bi Y., Kapoor S., Bhatia R., Springer, 2017, Vol. 16, p. 23-32. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we propose a novel technique for unsupervised text binarization in handwritten historical documents using k-means clustering. Text binarization faces many challenges, such as noise, faint characters and bleed-through, which must be overcome to increase the correct detection rate. To address these problems, a preprocessing strategy is first used to enhance the contrast and improve faint characters, and a Gaussian Mixture Model (GMM) is used to ignore the noise and other artifacts in the handwritten historical documents. The enhanced image is then normalized for use in the postprocessing part of the proposed method. The binarized handwritten image is obtained by partitioning the normalized pixel values of the handwritten image into two clusters using k-means clustering with k = 2, and then assigning each normalized pixel to one of the two clusters according to the minimum Euclidean distance between the normalized pixel's intensity and the mean normalized pixel value of each cluster. Experimental results verify the effectiveness of the proposed approach.
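
The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it omits the contrast enhancement and GMM preprocessing, and it assumes min-max normalization, initial cluster means at the two intensity extremes, and that the darker cluster corresponds to text. The function names `normalize`, `assign` and `kmeans_binarize` are hypothetical, and the input is assumed to be a non-empty flat list of grayscale intensities.

```python
def normalize(pixels):
    """Min-max scale intensities to [0, 1] (assumed normalization)."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]


def assign(values, dark, light):
    """Label each value 1 if it is nearest to the dark mean, else 0.

    In one dimension the Euclidean distance is the absolute difference.
    """
    return [1 if abs(v - dark) <= abs(v - light) else 0 for v in values]


def kmeans_binarize(pixels, max_iter=100):
    """Two-cluster (k = 2) k-means on normalized intensities.

    Returns a list of 0/1 labels, where 1 marks the darker cluster
    (assumed here to be the text).
    """
    values = normalize(pixels)
    # Assumed initialization: start the two means at the extremes.
    dark, light = min(values), max(values)
    for _ in range(max_iter):
        labels = assign(values, dark, light)
        dark_members = [v for v, lab in zip(values, labels) if lab == 1]
        light_members = [v for v, lab in zip(values, labels) if lab == 0]
        # Update step: each mean becomes the average of its members.
        new_dark = sum(dark_members) / len(dark_members) if dark_members else dark
        new_light = sum(light_members) / len(light_members) if light_members else light
        if new_dark == dark and new_light == light:
            break  # means stopped moving, so the assignment is stable
        dark, light = new_dark, new_light
    return assign(values, dark, light)
```

For a real page image, the two-dimensional pixel array would be flattened before the call and the returned labels reshaped to the image dimensions afterwards.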

Place, publisher, year, edition, pages
Springer, 2017. Vol. 16, p. 23-32
Series
Lecture Notes in Networks and Systems, ISSN 2367-3370
Keywords [en]
Handwritten text binarization, Image processing, k-means clustering, Document images
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-15540
ISBN: 978-3-319-56990-1 (print)
OAI: oai:DiVA.org:bth-15540
DiVA, id: diva2:1159621
Conference
SAI Intelligent Systems Conference 2016 (IntelliSys 2016), London
Projects
Scalable resource efficient systems for big data analytics
Available from: 2017-11-23 Created: 2017-11-23 Last updated: 2018-01-13 Bibliographically approved

Open Access in DiVA

The full text will be freely available from 2018-08-23 09:20

Authority records BETA

Kusetogullari, Hüseyin
