2021 (English) In: Neural Computing & Applications, ISSN 0941-0643, E-ISSN 1433-3058, Vol. 33, no 22, p. 15863-15875. Article in journal (Refereed) Published
Abstract [en]
This paper presents a digital image dataset of historical handwritten birth records stored in the archives of several parishes across Sweden, together with the corresponding metadata that supports the evaluation of document analysis algorithms' performance. The dataset is called SHIBR (the Swedish Historical Birth Records). The contribution of this paper is twofold. First, we believe it is the first and the largest Swedish dataset of its kind provided as open access (15,000 high-resolution colour images of the era between 1800 and 1840). We also perform some data mining of the dataset to uncover some statistics and facts that might be of interest and use to genealogists. Second, we provide a comprehensive survey of contemporary datasets in the field that are open to the public along with a compact review of word spotting techniques. The word transcription file contains 17 columns of information pertaining to each image (e.g., child's first name, birth date, date of baptism, father's first/last name, mother's first/last name, death records, town, job title of the father/mother, etc.). Moreover, we evaluate some deep learning models, pre-trained on two other renowned datasets, for word spotting in SHIBR. However, our dataset proved challenging due to the unique handwriting style. Therefore, the dataset could also be used for competitions dedicated to a large set of document analysis problems, including word spotting.
Place, publisher, year, edition, pages
Springer London, 2021
Keywords
Historical data of birth records; Handwritten documents; Public dataset; Word spotting
National Category
Public Health, Global Health, Social Medicine and Epidemiology; Computer Sciences
Identifiers
urn:nbn:se:bth-22072 (URN); 10.1007/s00521-021-06207-z (DOI); 000667130400001 ()
Funder
Knowledge Foundation, 20140032; The Swedish Foundation for International Cooperation in Research and Higher Education (STINT), AF2020-8892
Note
open access
2021-09-02 2021-09-02 2022-05-04 Bibliographically approved