Unsupervised Text Binarization in Handwritten Historical Documents Using k-Means Clustering
Kusetogullari, Hüseyin
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
ORCID iD: 0000-0001-7536-3349
2018 (English). In: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016, Vol. 2 / [ed] Bi, Y.; Kapoor, S.; Bhatia, R., Springer International Publishing AG, 2018, p. 23-32
Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we propose a novel technique for unsupervised text binarization in handwritten historical documents using k-means clustering. Text binarization faces many challenges, such as noise, faint characters and bleed-through, which must be overcome to increase the correct detection rate. To address these problems, a preprocessing strategy is first used to enhance the contrast and improve faint characters, and a Gaussian Mixture Model (GMM) is used to ignore the noise and other artifacts in the handwritten historical documents. The enhanced image is then normalized for use in the postprocessing part of the proposed method. The binarized handwritten image is obtained by partitioning the normalized pixel values of the handwritten image into two clusters using k-means clustering with k = 2, and then assigning each normalized pixel to one of the two clusters according to the minimum Euclidean distance between the normalized pixel intensity and the mean normalized pixel value of each cluster. Experimental results verify the effectiveness of the proposed approach.
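The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `kmeans_binarize`, the min-max normalization, the initialization of the two cluster means at the darkest and brightest values, and the assumption that text is darker than the background are all choices made here for illustration. The contrast enhancement and GMM preprocessing are omitted because the abstract does not specify them.

```python
import numpy as np


def kmeans_binarize(image, max_iter=100, tol=1e-6):
    """Split a grayscale image into two clusters with 1-D k-means (k = 2).

    Returns a boolean mask (True = darker cluster, assumed here to be text)
    and the two cluster means in normalized [0, 1] units.
    """
    pixels = np.asarray(image, dtype=np.float64).ravel()
    lo, hi = pixels.min(), pixels.max()
    if hi == lo:
        # A flat image has nothing to separate.
        return np.zeros(np.shape(image), dtype=bool), np.array([0.0, 0.0])

    # Min-max normalization to [0, 1] (an assumed normalization scheme).
    norm = (pixels - lo) / (hi - lo)

    # Assumed initialization: one mean at the darkest value, one at the brightest.
    centers = np.array([0.0, 1.0])

    for _ in range(max_iter):
        # In one dimension the Euclidean distance is the absolute difference.
        labels = np.abs(norm[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = centers.copy()
        for k in range(2):
            members = norm[labels == k]
            if members.size:
                new_centers[k] = members.mean()
        converged = np.abs(new_centers - centers).max() < tol
        centers = new_centers
        if converged:
            break

    # Final assignment of each pixel to the nearest cluster mean.
    labels = np.abs(norm[:, None] - centers[None, :]).argmin(axis=1)
    text_mask = (labels == centers.argmin()).reshape(np.shape(image))
    return text_mask, centers


if __name__ == "__main__":
    # Synthetic "page": light background with a dark 2x2 stroke.
    page = np.full((4, 4), 200, dtype=np.uint8)
    page[1:3, 1:3] = 30
    mask, means = kmeans_binarize(page)
    print(mask.astype(int))
    print(means)
```

With k = 2 on scalar intensities, assigning each pixel to the nearest cluster mean is equivalent to thresholding at the midpoint of the two means, which is why the method yields a binary image directly.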

Place, publisher, year, edition, pages
Springer International Publishing AG, 2018. p. 23-32
Series
Lecture Notes in Networks and Systems, ISSN 2367-3370 ; 16
Keywords [en]
Handwritten text binarization, Image processing, k-means clustering, Document images
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-17280
DOI: 10.1007/978-3-319-56991-8_3
ISI: 000448662500003
ISBN: 978-3-319-56991-8 (print)
OAI: oai:DiVA.org:bth-17280
DiVA, id: diva2:1263360
Conference
SAI Annual Conference on Areas of Intelligent Systems and Artificial Intelligence and their Applications to the Real World (IntelliSys), SEP 21-22, 2016, London, ENGLAND
Part of project
Bigdata@BTH - Scalable resource-efficient systems for big data analytics, Knowledge Foundation
Available from: 2018-11-15 Created: 2018-11-15 Last updated: 2022-05-25
Bibliographically approved

Open Access in DiVA

fulltext (2623 kB), 269 downloads
File information
File name: FULLTEXT01.pdf
File size: 2623 kB
Checksum (SHA-512):
2a7e3b9cb9a7fdad1c4eb30b8b537f85242a9b86b21cfa294eeb061b71b5fa5f63b61876515547f3904fb19a1ae4ba0d37d26ed9dfd07b83fb029da44dc76149
Type: fulltext
Mimetype: application/pdf

Authority records

Kusetogullari, Hüseyin
