SHIBR-The Swedish Historical Birth Records: a semi-annotated dataset
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0002-4390-411x
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0001-7536-3349
Peltarion AB, SWE.
Independent Researcher, SWE.
2021 (English). In: Neural Computing & Applications, ISSN 0941-0643, E-ISSN 1433-3058, Vol. 33, no 22, p. 15863-15875. Article in journal (Refereed). Published
Abstract [en]

This paper presents a digital image dataset of historical handwritten birth records stored in the archives of several parishes across Sweden, together with the corresponding metadata that supports the evaluation of document analysis algorithms' performance. The dataset is called SHIBR (the Swedish Historical Birth Records). The contribution of this paper is twofold. First, we believe it is the first and the largest Swedish dataset of its kind provided as open access (15,000 high-resolution colour images of the era between 1800 and 1840). We also perform some data mining of the dataset to uncover some statistics and facts that might be of interest and use to genealogists. Second, we provide a comprehensive survey of contemporary datasets in the field that are open to the public along with a compact review of word spotting techniques. The word transcription file contains 17 columns of information pertaining to each image (e.g., child's first name, birth date, date of baptism, father's first/last name, mother's first/last name, death records, town, job title of the father/mother, etc.). Moreover, we evaluate some deep learning models, pre-trained on two other renowned datasets, for word spotting in SHIBR. However, our dataset proved challenging due to the unique handwriting style. Therefore, the dataset could also be used for competitions dedicated to a large set of document analysis problems, including word spotting.

Place, publisher, year, edition, pages
Springer London, 2021. Vol. 33, no 22, p. 15863-15875
Keywords [en]
Historical data of birth records, Handwritten documents, Public dataset, Word spotting
National Category
Public Health, Global Health, Social Medicine and Epidemiology; Computer Sciences
Identifiers
URN: urn:nbn:se:bth-22072
DOI: 10.1007/s00521-021-06207-z
ISI: 000667130400001
OAI: oai:DiVA.org:bth-22072
DiVA, id: diva2:1590269
Part of project
Bigdata@BTH - Scalable resource-efficient systems for big data analytics, Knowledge Foundation
Funder
Knowledge Foundation, 20140032
The Swedish Foundation for International Cooperation in Research and Higher Education (STINT), AF2020-8892
Note

open access

Available from: 2021-09-02. Created: 2021-09-02. Last updated: 2022-05-04. Bibliographically approved

Open Access in DiVA

fulltext (1213 kB), 142 downloads
File information
File name: FULLTEXT01.pdf
File size: 1213 kB
Checksum (SHA-512): a588c36ecf84b289e2ba6d718fc79170ef5f70e1e87aa8638754f128f7eb3f484f00c6ef1f33ad3b3c80643785c23bdc6fa8c94b1db460821b3458230f9a1d03
Type: fulltext
Mimetype: application/pdf

Authority records

Cheddad, Abbas; Kusetogullari, Hüseyin
