[{"_id":"project:9548","_type":"project","abstract":{"sv":"Att bevara och dela tillgång till vårt dokumentära arv, som representerar ett levande och kollektivt minne av våra samhällen, är mycket viktigt. Digitala arkiv av handskrivna formulär har för närvarande inga genomförbara eller praktiska sätt att söka efter åtkomst. Syfte: Projektet syftar till att undersöka och implementera ett lämpligt IKT-baserat system för att ge verklig tillgång till de digitaliserade arkivdokumenten och ger en växande gemenskap av användare automatiserade verktyg för igenkänning, transkription och indexering av handskrivna arkivdokument. Genomförande: Huvudidén med projektet är att tillhandahålla AI-baserade lösningar för att göra digitala historiska handskrivna dokument mer tillgängliga. Utmaningen ligger i det faktum att digitaliserade dokument är av mycket komplex struktur och uppvisar varierande skrivstilar på grund av olika författare eller åldrar bland andra frågor. Projektet kommer att driva innovation inom historisk handskriven dokumentanalys och erkännande och kommer att utveckla innovativa verktyg för att förbättra kapaciteten för historisk dokumentbehandling och hämtning.","en":"Preserving and sharing access to our documentary heritage, which represents a living and collective memory of our communities, is very important. 
Digital archives of handwritten forms currently have no feasible or practical means of searching for access. Aim of the project: The project aims to investigate and implement a suitable ICT-based system to provide real access to the digitized archival documents and provide a growing community of users with automated tools for the recognition, transcription and indexing of handwritten archival documents. Implementation of the project: The main idea of the project is to provide AI-based solutions to make digital historical handwritten documents more accessible. The challenge lies in the fact that digitized documents are of very complex structure and exhibit varying writing styles due to different authors or ages among other issues. The project will drive innovation in historical handwritten document analysis and recognition and will develop innovative tools to improve the capabilities of historical document processing and retrieval."},"project_id":"AF2020-8892","identifier_short":"AF2020-8892","local_ids":{"bth":["BTH-6.1.1-0122-2020"]},"dates":{"start_date":"2021-05-01","end_date":"2022-04-29"},"organizations":[{"funding":[{"_id":128,"id":"802400-3512","sv":"Stiftelsen för internationalisering av högre utbildning och forskning (STINT)","en":"The Swedish Foundation for International Cooperation in Research and Higher Education (STINT)"}]},{"coordinating":[{"_id":16500,"id":"202100-4011","sv":"Blekinge Tekniska Högskola","en":"Blekinge Institute of Technology"}]}],"people":[{"project_leaders":[{"_id":"authority-person:44612","orcid":"0000-0002-4390-411X","name":"Cheddad, Abbas","role":"principal_investigator","affiliation":[{"_id":881506,"sv":"Institutionen för datavetenskap","en":"Department of Computer Science","parent":[{"_id":16801,"sv":"Fakulteten för datavetenskaper","en":"Faculty of Computing","parent":[{"_id":16500,"id":"202100-4011","sv":"Blekinge Tekniska Högskola","en":"Blekinge Institute of 
Technology"}]}]}]}]},{"other_personnel":[]}],"tags":[{"_id":11510,"id":"10201","sv":"Datavetenskap (datalogi)","en":"Computer Sciences"}],"titles":{"en":"DocPRESERV – Preserving & Processing Historical Document Images with Artificial Intelligence"},"type_of_awards":{"sv":"Internationellt samarbete","en":"International cooperation"},"publications":[{"id":"diva2:1988257","type":"article-journal","status":"Published","issued":{"date-parts":[[2026]]},"title":"ST-KeyS : Self-supervised Transformer for Keyword Spotting in historical handwritten documents","language":"eng","author":[{"family":"Khamekhem Jemni","given":"Sana","affiliation":[{"name":"Digital Research Center of Sfax,Tunisia"}]},{"family":"Ammar","given":"Sourour","affiliation":[{"name":"Digital Research Center of Sfax,Tunisia"}]},{"family":"Souibgui","given":"Mohamed Ali","affiliation":[{"name":"Universitat Autònoma de Barcelona, Spain"}]},{"family":"Kessentini","given":"Yousri","affiliation":[{"name":"Digital Research Center of Sfax,Tunisia"}]},{"family":"Cheddad","given":"Abbas","ORCID":"0000-0002-4390-411X","localId":"abc","affiliation":[{"id":"881506","name":"Blekinge Tekniska Högskola, Institutionen för datavetenskap"}]}],"abstract":"Keyword spotting (KWS) in historical documents is an important tool for the initial exploration of digitized collections. Nowadays, the most efficient KWS methods rely on machine learning techniques, which typically require a large amount of annotated training data. However, in the case of historical manuscripts, there is a lack of annotated corpora for training. To handle the data scarcity issue, we investigate the merits of self-supervised learning to extract useful representations of the input data without relying on human annotations and then use these representations in the downstream task. We propose ST-KeyS, a masked auto-encoder model based on vision transformers where the pretraining stage is based on the mask-and-predict paradigm without the need for labeled data. 
In the fine-tuning stage, the pre-trained encoder is integrated into a fine-tuned Siamese neural network model to improve feature embedding from the input images. We further improve the image representation using pyramidal histogram of characters (PHOC) embedding to create and exploit an intermediate representation of images based on text attributes. The proposed approach outperforms state-of-the-art methods trained on the same datasets in an exhaustive experimental evaluation of five widely used benchmark datasets (Botany, Alvermann Konzilsprotokolle, George Washington, Esposalles, and RIMES). ","DOI":"10.1016/j.patcog.2025.112036","ScopusId":"2-s2.0-105009722690","NBN":"urn:nbn:se:bth-28471","volume":"170","number":"112036","container-title":"Pattern Recognition","ISSN":"1873-5142","keyword":"Keyword spotting; Masked autoencoders; PHOC embedding; Self-supervised learning; Siamese neural networks; Visual transformers; Character recognition; History; Image representation; Labeled data; Learning algorithms; Learning systems; Neural networks; Supervised learning; Auto encoders; Embeddings; Handwritten document; Historical documents; Masked autoencoder; Neural-networks; Pyramidal histogram of character embedding; Siamese neural network; Visual transformer; Signal encoding","publisher":"Elsevier","published":[{"raw":"2025-08-11T13:03:00.000+02:00"}],"created":[{"raw":"2025-08-11T13:03:18.172+02:00"}],"updated":[{"raw":"2025-09-30T12:02:42.901+02:00"}],"URL":"https://urn.kb.se/resolve?urn=urn:nbn:se:bth-28471"},{"id":"diva2:1599504","type":"paper-conference","issued":{"date-parts":[[2021]]},"title":"End-to-End Approach for Recognition of Historical Digit Strings","language":"eng","author":[{"family":"Zhao","given":"Mengqiao","affiliation":[{"name":"student"},{"id":"881506","name":"Blekinge Tekniska Högskola, Institutionen för datavetenskap"}]},{"family":"Hochuli","given":"Andre Gustavo","affiliation":[{"name":"Pontifical Catholic University of Parana (PPGIa/PUCPR), 
BRA"}]},{"family":"Cheddad","given":"Abbas","ORCID":"0000-0002-4390-411x","localId":"abc","affiliation":[{"id":"881506","name":"Blekinge Tekniska Högskola, Institutionen för datavetenskap"}]}],"abstract":"The plethora of digitalised historical document datasets released in recent years has rekindled interest in advancing the field of handwriting pattern recognition. In the same vein, a recently published data set, known as ARDIS, presents handwritten digits manually cropped from 15.000 scanned documents of Swedish churches’ books that exhibit various handwriting styles. To this end, we propose an end-to-end segmentation-free deep learning approach to handle this challenging ancient handwriting style of dates present in the ARDIS dataset (4-digits long strings). We show that with slight modifications in the VGG-16 deep model, the framework can achieve a recognition rate of 93.2%, resulting in a feasible solution free of heuristic methods, segmentation, and fusion methods. Moreover, the proposed approach outperforms the well-known CRNN method (a model widely applied in handwriting recognition tasks). 
© 2021, Springer Nature Switzerland AG.","ISBN":"9783030863333","DOI":"10.1007/978-3-030-86334-0_39","ScopusId":"2-s2.0-85115317825","NBN":"urn:nbn:se:bth-22169","page":"595-609","container-title":"Lecture Notes in Computer Science","event":"16th International Conference on Document Analysis and Recognition, ICDAR 2021, Lausanne, Online, 5 September 2021 - 10 September 2021","keyword":"Handwriting digit string recognition; Historical document processing; Segmentation-free; Character recognition; Copying; Deep learning; History; Data set; Document datasets; End to end; Handwriting Styles; Handwritten digit; Historical documents; Swedishs; Heuristic methods","publisher":"Springer Science and Business Media Deutschland GmbH","note":"[ed] Lladós J., Lopresti D., Uchida S.","published":[{"raw":"2021-10-01T09:22:00.000+02:00"}],"created":[{"raw":"2021-10-01T09:22:47.931+02:00"}],"updated":[{"raw":"2025-09-30T13:02:15.297+02:00"}],"URL":"https://urn.kb.se/resolve?urn=urn:nbn:se:bth-22169"},{"id":"diva2:1590269","type":"article-journal","status":"Published","issued":{"date-parts":[[2021]]},"title":"SHIBR-The Swedish Historical Birth Records : a semi-annotated dataset","language":"eng","author":[{"family":"Cheddad","given":"Abbas","ORCID":"0000-0002-4390-411x","localId":"abc","affiliation":[{"id":"881506","name":"Blekinge Tekniska Högskola, Institutionen för datavetenskap"}]},{"family":"Kusetogullari","given":"Hüseyin","ORCID":"0000-0001-7536-3349","localId":"hku","affiliation":[{"id":"881506","name":"Blekinge Tekniska Högskola, Institutionen för datavetenskap"}]},{"family":"Hilmkil","given":"Agrin","affiliation":[{"name":"Peltarion AB,SWE"}]},{"family":"Sundin","given":"Lena","affiliation":[{"name":"Independent Researcher, SWE"}]},{"family":"Yavariabdi","given":"Amir","affiliation":[{"name":"KTO Karatay Univ, TUR"}]},{"family":"Aouache","given":"Mustapha","affiliation":[{"name":"Dev Technol Avancees CDTA, Div Telecom, 
DZA"}]},{"family":"Hall","given":"Johan","affiliation":[{"name":"Arkiv Digital AD AB, SWE"}]}],"abstract":"This paper presents a digital image dataset of historical handwritten birth records stored in the archives of several parishes across Sweden, together with the corresponding metadata that supports the evaluation of document analysis algorithms' performance. The dataset is called SHIBR (the Swedish Historical Birth Records). The contribution of this paper is twofold. First, we believe it is the first and the largest Swedish dataset of its kind provided as open access (15,000 high-resolution colour images of the era between 1800 and 1840). We also perform some data mining of the dataset to uncover some statistics and facts that might be of interest and use to genealogists. Second, we provide a comprehensive survey of contemporary datasets in the field that are open to the public along with a compact review of word spotting techniques. The word transcription file contains 17 columns of information pertaining to each image (e.g., child's first name, birth date, date of baptism, father's first/last name, mother's first/last name, death records, town, job title of the father/mother, etc.). Moreover, we evaluate some deep learning models, pre-trained on two other renowned datasets, for word spotting in SHIBR. However, our dataset proved challenging due to the unique handwriting style. 
Therefore, the dataset could also be used for competitions dedicated to a large set of document analysis problems, including word spotting.","DOI":"10.1007/s00521-021-06207-z","NBN":"urn:nbn:se:bth-22072","issue":"22","volume":"33","page":"15863-15875","container-title":"Neural Computing & Applications","ISSN":"1433-3058","keyword":"Historical data of birth records; Handwritten documents; Public dataset; Word spotting","publisher":"Springer London","note":"open access","published":[{"raw":"2021-09-02T10:11:00.000+02:00"}],"created":[{"raw":"2021-09-02T10:11:13.042+02:00"}],"updated":[{"raw":"2025-09-30T13:03:05.483+02:00"}],"URL":"https://urn.kb.se/resolve?urn=urn:nbn:se:bth-22072"}],"links":[{"type":"pid","link":"https://bth.diva-portal.org/smash/api/project/swecris/project:9548"}]}]