Project

Project type/Form of grant
Grant to research environment
Title [en]
Bigdata@BTH- Scalable resource-efficient systems for big data analytics
Abstract [en]
Data will be generated at an ever-increasing rate for the foreseeable future. Added value and cost savings can be obtained by analyzing big data streams. The analysis of large data sets requires scalable and high-performance computer systems. In order to stay competitive and to reduce consumption of energy and other resources, the next generation systems for scalable big data analytics need to be more resource-efficient. The research profile, Scalable resource-efficient systems for big data analytics, combines existing expertise in machine learning, data mining, and computer engineering to create new knowledge in the area of scalable resource-efficient systems for big data analytics. The value of the new knowledge will be demonstrated and evaluated in two application areas (decision support systems and image processing).
The needs and interests of our 9 industrial partners are grouped into industrial challenges. Based on these challenges and in cooperation with our partners we have defined initial sub-projects grouped into four research themes:
Research theme A: Big data analytics for decision support
Research theme B: Big data analytics for image processing
Research theme C: Core technologies (machine learning)
Research theme D: Foundations and enabling technologies
This research profile is in the center of the university’s vision to be a globally attractive knowledge community within applied information technology and innovation for sustainable growth.
Publications (10 of 118)
Yavariabdi, A., Kusetogullari, H., Celik, T., Thummanapally, S., Rijwan, S. & Hall, J. (2022). CArDIS: A Swedish Historical Handwritten Character and Word Dataset. IEEE Access, 10, 55338-55349
2022 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 10, p. 55338-55349. Article in journal (Refereed), Published
Abstract [en]

This paper introduces a new publicly available image-based Swedish historical handwritten character and word dataset named Character Arkiv Digital Sweden (CArDIS) (https://cardisdataset.github.io/CARDIS/). The samples in CArDIS are collected from 64,084 Swedish historical documents written by several anonymous priests between 1800 and 1900. The dataset contains 116,000 Swedish alphabet images in RGB color space with 29 classes, whereas the word dataset contains 30,000 image samples of ten popular Swedish names as well as 1,000 region names in Sweden. To examine the performance of different machine learning classifiers on the CArDIS dataset, three different experiments are conducted. In the first experiment, classifiers such as Support Vector Machine (SVM), Artificial Neural Networks (ANN), k-Nearest Neighbor (k-NN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Random Forest (RF) are trained on existing character datasets, namely Extended Modified National Institute of Standards and Technology (EMNIST), IAM and CVL, and tested on the CArDIS dataset. In the second and third experiments, the same classifiers as well as two pre-trained VGG-16 and VGG-19 classifiers are trained and tested on the CArDIS character and word datasets. The experiments show that the machine learning methods trained on existing handwritten character datasets struggle to recognize characters efficiently on the CArDIS dataset, proving that characters in CArDIS contain unique features and characteristics. Moreover, in the last two experiments, the deep learning-based classifiers provide the best recognition rates.

Place, publisher, year, edition, pages
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2022
Keywords
Character recognition, Optical character recognition software, Feature extraction, Hidden Markov models, Handwriting recognition, Machine learning, Image recognition, Character and word recognition, machine learning methods, optical character recognition (OCR), old handwritten style, Swedish handwritten character dataset, Swedish handwritten word dataset
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-23171 (URN); 10.1109/ACCESS.2022.3175197 (DOI); 000804633200001 ()
Funder
Knowledge Foundation, 20140032
Note

open access

Available from: 2022-06-16 Created: 2022-06-16 Last updated: 2022-06-16 Bibliographically approved
Nordahl, C., Boeva, V., Grahn, H. & Netz Persson, M. (2022). EvolveCluster: an evolutionary clustering algorithm for streaming data. Evolving Systems (4), 603-623
2022 (English). In: Evolving Systems, ISSN 1868-6478, E-ISSN 1868-6486, no 4, p. 603-623. Article in journal (Refereed), Published
Abstract [en]

Data has become an integral part of our society in the past years, arriving faster and in larger quantities than before. Traditional clustering algorithms rely on the availability of entire datasets to model them correctly and efficiently. Such requirements are not possible in the data stream clustering scenario, where data arrives and needs to be analyzed continuously. This paper proposes a novel evolutionary clustering algorithm, entitled EvolveCluster, capable of modeling evolving data streams. We compare EvolveCluster against two other evolutionary clustering algorithms, PivotBiCluster and Split-Merge Evolutionary Clustering, by conducting experiments on three different datasets. Furthermore, we perform additional experiments on EvolveCluster to further evaluate its capabilities on clustering evolving data streams. Our results show that EvolveCluster manages to capture evolving data stream behaviors and adapts accordingly.

Place, publisher, year, edition, pages
SPRINGER HEIDELBERG, 2022
Keywords
Evolving data stream; Clustering; Data stream clustering
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-22395 (URN); 10.1007/s12530-021-09408-y (DOI); 000717906700001 (); 2-s2.0-85119001929 (Scopus ID)
Funder
Knowledge Foundation, 20140032
Note

open access

Available from: 2021-11-26 Created: 2021-11-26 Last updated: 2023-11-03 Bibliographically approved
Devagiri, V. M., Boeva, V. & Abghari, S. (2021). A Multi-view Clustering Approach for Analysis of Streaming Data. In: Maglogiannis I., Macintyre J., Iliadis L. (Ed.), IFIP Advances in Information and Communication Technology. Paper presented at 12.5 International Conference on Artificial Intelligence Applications and Innovations, AIAI 2021, Virtual, Online, 25 June 2021 - 27 June 2021 (pp. 169-183). Springer Science and Business Media Deutschland GmbH
2021 (English). In: IFIP Advances in Information and Communication Technology / [ed] Maglogiannis I., Macintyre J., Iliadis L., Springer Science and Business Media Deutschland GmbH, 2021, p. 169-183. Conference paper, Published paper (Refereed)
Abstract [en]

Data available today in smart monitoring applications such as smart buildings, machine health monitoring, smart healthcare, etc., is not centralized and is usually supplied by a number of different devices (sensors, mobile devices and edge nodes). As a result, the data has a heterogeneous nature and provides different perspectives (views) on the studied phenomenon. This makes the monitoring task very challenging, requiring machine learning and data mining models that are not only able to continuously integrate and analyze multi-view streaming data, but are also capable of adapting to concept drift scenarios of newly arriving data. This study presents a multi-view clustering approach that can be applied for monitoring and analysis of streaming data scenarios. The approach allows for parallel monitoring of the individual view clustering models and mining view correlations in the integrated (global) clustering models. The global model built at each data chunk is a formal concept lattice generated by a formal context consisting of closed patterns representing the most typical correlations among the views. The proposed approach is evaluated on two different data sets. The obtained results demonstrate that it is suitable for modelling and monitoring multi-view streaming phenomena by providing means for continuous analysis and pattern mining. © 2021, IFIP International Federation for Information Processing.

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2021
Series
IFIP Advances in Information and Communication Technology, ISSN 1868-4238; 627
Keywords
Closed patterns, Formal concept analysis, Multi-instance learning, Multi-view clustering, Streaming data, Artificial intelligence, Data mining, Intelligent buildings, mHealth, Monitoring, Continuous analysis, Data mining models, Formal concept lattices, Machine health monitoring, Monitoring and analysis, Monitoring tasks, Smart monitoring, Cluster analysis
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-22023 (URN); 10.1007/978-3-030-79150-6_14 (DOI); 2-s2.0-85111810320 (Scopus ID); 9783030791490 (ISBN)
Conference
12.5 International Conference on Artificial Intelligence Applications and Innovations, AIAI 2021, Virtual, Online, 25 June 2021 - 27 June 2021
Funder
Knowledge Foundation, 20140032
Available from: 2021-08-20 Created: 2021-08-20 Last updated: 2024-04-09 Bibliographically approved
Petersson, S., Grahn, H. & Rasmusson, J. (2021). Blind Correction of Lateral Chromatic Aberration in Raw Bayer Data. IEEE Access, 9
2021 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 9. Article in journal (Refereed), Published
Abstract [en]

Chromatic aberration is an error that occurs in color images because camera lenses refract light of different wavelengths at different angles. The common approach today to correct the error is to use a lookup table for each camera-lens combination, e.g., as in Adobe Photoshop Lightroom or DxO Optics Pro. In this paper, we propose a method that corrects the chromatic aberration error without any prior knowledge of the camera-lens combination, and performs the correction directly on the Bayer data, i.e., before the raw image data is interpolated to an RGB image. We evaluate our method in comparison to DxO Optics Pro, a state-of-the-art tool based on lookup tables, using 25 test images and the variance of the color differences (VCD) metric. The results show that our blind method achieves error correction performance similar to that of DxO Optics Pro, but without prior knowledge of the camera-lens setup.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2021
Keywords
Blind correction, Cameras, Chromatic aberration, Color, GPGPU, Image color analysis, Image edge detection, Image enhancement, Interpolation, Lenses, Optics, Structural instability, Camera lenses, Error correction, Table lookup, Adobe Photoshop, Color difference, Correction performance, Lens combination, Prior knowledge, Raw image data, State of the art, Aberrations
National Category
Media Engineering
Identifiers
urn:nbn:se:bth-22007 (URN); 10.1109/ACCESS.2021.3096201 (DOI); 000675195700001 (); 2-s2.0-85110877841 (Scopus ID)
Funder
Knowledge Foundation, 20140032
Note

open access

Available from: 2021-08-12 Created: 2021-08-12 Last updated: 2021-09-02 Bibliographically approved
García Martín, E., Lavesson, N., Grahn, H., Casalicchio, E. & Boeva, V. (2021). Energy-Aware Very Fast Decision Tree. International Journal of Data Science and Analytics, 11(2), 105-126
2021 (English). In: International Journal of Data Science and Analytics, ISSN 2364-415X, Vol. 11, no 2, p. 105-126. Article in journal (Refereed), Published
Abstract [en]

Recently machine learning researchers are designing algorithms that can run in embedded and mobile devices, which introduces additional constraints compared to traditional algorithm design approaches. One of these constraints is energy consumption, which directly translates to battery capacity for these devices. Streaming algorithms, such as the Very Fast Decision Tree (VFDT), are designed to run in such devices due to their high velocity and low memory requirements. However, they have not been designed with an energy efficiency focus. This paper addresses this challenge by presenting the nmin adaptation method, which reduces the energy consumption of the VFDT algorithm with only minor effects on accuracy. nmin adaptation allows the algorithm to grow faster in those branches where there is more confidence to create a split, and delays the split on the less confident branches. This removes unnecessary computations related to checking for splits but maintains similar levels of accuracy. We have conducted extensive experiments on 29 public datasets, showing that the VFDT with nmin adaptation consumes up to 31% less energy than the original VFDT, and up to 96% less energy than the CVFDT (VFDT adapted for concept drift scenarios), trading off up to 1.7 percent of accuracy.
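
The split check that nmin adaptation targets can be sketched as follows. This is a minimal illustration, assuming the standard Hoeffding bound that VFDT uses to decide whether the best split attribute is reliably better than the runner-up. The function names, the tie threshold `tau`, and the rule used to pick the next `nmin` are assumptions made for illustration and are not quoted from the paper.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the true mean of a variable
    with the given range lies within this distance of the mean of n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def instances_needed(value_range: float, delta: float, gap: float) -> int:
    """Smallest n for which hoeffding_bound(value_range, delta, n) <= gap."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * gap ** 2))


def next_nmin(value_range: float, delta: float, best_gap: float, tau: float) -> int:
    """Hypothetical adaptation rule (an assumption, not the paper's exact rule).

    Instead of re-checking for a split every fixed nmin instances, wait until
    enough instances have arrived for the bound to shrink to the observed gap
    between the two best attributes, or to the tie threshold tau when the gap
    is smaller than tau.
    """
    return instances_needed(value_range, delta, max(best_gap, tau))


if __name__ == "__main__":
    # A confident branch (large gap) is re-checked much sooner than an
    # uncertain one (small gap), which avoids wasted split evaluations.
    print(next_nmin(1.0, 1e-7, best_gap=0.2, tau=0.05))
    print(next_nmin(1.0, 1e-7, best_gap=0.01, tau=0.05))
```

Under these assumptions, a leaf with a clear winner needs only a few hundred instances before the next check, while a near-tie waits for several thousand, which matches the behaviour described in the abstract.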

Place, publisher, year, edition, pages
Springer Science+Business Media B.V., 2021
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-19150 (URN); 10.1007/s41060-021-00246-4 (DOI); 000631559600001 (); 2-s2.0-85102938796 (Scopus ID)
Funder
Knowledge Foundation, 20140032
Note

open access

Available from: 2020-01-23 Created: 2020-01-23 Last updated: 2021-07-30 Bibliographically approved
Borg, A., Ahlstrand, J. & Boldt, M. (2021). Improving Corporate Support by Predicting Customer e-Mail Response Time: Experimental Evaluation and a Practical Use Case. In: Filipe J., Śmiałek M., Brodsky A., Hammoudi S. (Ed.), Enterprise Information Systems. Paper presented at 22nd International Conference on Enterprise Information Systems, ICEIS 2020, Virtual, Online, 5 May through 7 May (pp. 100-121). Springer Science and Business Media Deutschland GmbH
2021 (English). In: Enterprise Information Systems / [ed] Filipe J., Śmiałek M., Brodsky A., Hammoudi S., Springer Science and Business Media Deutschland GmbH, 2021, p. 100-121. Conference paper, Published paper (Refereed)
Abstract [en]

Customer satisfaction is an important aspect of any corporation's customer support process. One important factor is keeping the time customers wait for a reply at acceptable levels. By utilizing learning models based on the Random Forest algorithm, the extent to which it is possible to predict e-Mail time-to-respond (TTR) is investigated. This is investigated both for customers and for customer support agents, the former focusing on how long it takes until customers reply, and the latter focusing on how long it takes until a customer receives an answer. The models are trained on a data set consisting of 51,682 customer support e-Mails. The e-Mails cover various topics from a large telecom operator. The models are able to predict the time-to-respond for customer support agents with an AUC of 0.90, and for customers with an AUC of 0.85. These results indicate that it is possible to predict the TTR for both groups. The approach was also implemented in an initial trial in a live environment. How the predictions can be applied to improve communication efficiency, e.g. by anticipating the staff needs in customer support, is discussed in more detail in the paper. Further, insights gained from an initial implementation are provided. © 2021, Springer Nature Switzerland AG.
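
For readers unfamiliar with the AUC figures quoted above, the metric can be computed directly from predicted scores. The sketch below uses the pairwise ranking formulation of AUC for a binary problem; the labels and scores are invented for illustration, and the binary framing is an assumption, since the abstract does not say how the response-time classes were defined.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive example receives a
    higher score than a randomly chosen negative one (ties count as half).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical data: 1 = reply arrives within the target time, 0 = it does not.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    print(auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so values of 0.85 to 0.90 mean that most positive examples are scored above most negative ones.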

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2021
Series
Lecture Notes in Business Information Processing, ISSN 1865-1348, E-ISSN 1865-1356 ; 417
Keywords
Decision support, e-Mail time-to-respond, machine learning, Prediction, Random forest, Decision trees, Electronic mail, Forecasting, Information systems, Information use, Sales, Communication efficiency, Customer support, Customer support process, Experimental evaluation, Learning models, Practical use, Random forest algorithm, Telecom operators, Customer satisfaction
National Category
Computer Sciences; Business Administration
Identifiers
urn:nbn:se:bth-22342 (URN); 10.1007/978-3-030-75418-1_6 (DOI); 2-s2.0-85106400443 (Scopus ID); 9783030754174 (ISBN)
Conference
22nd International Conference on Enterprise Information Systems, ICEIS 2020, Virtual, Online, 5 May through 7 May
Funder
Knowledge Foundation, 20140032
Available from: 2021-11-11 Created: 2021-11-11 Last updated: 2022-12-02 Bibliographically approved
Cheddad, A. (2021). Machine Learning in Healthcare: Breast Cancer and Diabetes Cases. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Paper presented at AVI 2020 Workshop on Road Mapping Infrastructures for Artificial Intelligence Supporting Advanced Visual Big Data Analysis, AVI-BDA 2020 and 2nd Italian Workshop on Visualization and Visual Analytics, ITAVIS 2020, Ischia; Italy, 29 September 2020 through 29 September 2020 (pp. 125-135). Springer Science and Business Media Deutschland GmbH, 12585
2021 (English). In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Science and Business Media Deutschland GmbH, 2021, Vol. 12585, p. 125-135. Conference paper, Published paper (Refereed)
Abstract [en]

This paper provides insights into a workflow of different applications of machine learning coupled with image analysis in the healthcare sector which we have undertaken. As case studies, we use personalized breast cancer screenings and diabetes research (i.e., Beta-cell mass quantification in mice and diabetic retinopathy analysis). Our tools play a pivotal role in the evidence-based process for personalized medicine and/or in monitoring the progression of diabetes as a chronic disease, to help better understand its development and the way to combat it. Although this multidisciplinary collaboration provides only a succinct description of these research nodes, relevant references are furnished for further details. © 2021, Springer Nature Switzerland AG.

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2021
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349; 12585
Keywords
Applied machine learning, Breast cancer, Diabetes, Medical image analysis, Data visualization, Diseases, Eye protection, Health care, Mammals, Visualization, Breast cancer screening, Chronic disease, Diabetes research, Diabetic retinopathy, Evidence-based, Healthcare sectors, Multi-disciplinary collaborations, Machine learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-21298 (URN); 10.1007/978-3-030-68007-7_8 (DOI); 2-s2.0-85102617923 (Scopus ID); 9783030680060 (ISBN)
Conference
AVI 2020 Workshop on Road Mapping Infrastructures for Artificial Intelligence Supporting Advanced Visual Big Data Analysis, AVI-BDA 2020 and 2nd Italian Workshop on Visualization and Visual Analytics, ITAVIS 2020, Ischia; Italy, 29 September 2020 through 29 September 2020
Available from: 2021-03-26 Created: 2021-03-26 Last updated: 2022-05-06 Bibliographically approved
Cheddad, A., Kusetogullari, H., Hilmkil, A., Sundin, L., Yavariabdi, A., Aouache, M. & Hall, J. (2021). SHIBR-The Swedish Historical Birth Records: a semi-annotated dataset. Neural Computing & Applications, 33(22), 15863-15875
2021 (English). In: Neural Computing & Applications, ISSN 0941-0643, E-ISSN 1433-3058, Vol. 33, no 22, p. 15863-15875. Article in journal (Refereed), Published
Abstract [en]

This paper presents a digital image dataset of historical handwritten birth records stored in the archives of several parishes across Sweden, together with the corresponding metadata that supports the evaluation of document analysis algorithms' performance. The dataset is called SHIBR (the Swedish Historical Birth Records). The contribution of this paper is twofold. First, we believe it is the first and the largest Swedish dataset of its kind provided as open access (15,000 high-resolution colour images of the era between 1800 and 1840). We also perform some data mining of the dataset to uncover some statistics and facts that might be of interest and use to genealogists. Second, we provide a comprehensive survey of contemporary datasets in the field that are open to the public along with a compact review of word spotting techniques. The word transcription file contains 17 columns of information pertaining to each image (e.g., child's first name, birth date, date of baptism, father's first/last name, mother's first/last name, death records, town, job title of the father/mother, etc.). Moreover, we evaluate some deep learning models, pre-trained on two other renowned datasets, for word spotting in SHIBR. However, our dataset proved challenging due to the unique handwriting style. Therefore, the dataset could also be used for competitions dedicated to a large set of document analysis problems, including word spotting.

Place, publisher, year, edition, pages
Springer London, 2021
Keywords
Historical data of birth records, Handwritten documents, Public dataset, Word spotting
National Category
Public Health, Global Health, Social Medicine and Epidemiology; Computer Sciences
Identifiers
urn:nbn:se:bth-22072 (URN); 10.1007/s00521-021-06207-z (DOI); 000667130400001 ()
Funder
Knowledge Foundation, 20140032
The Swedish Foundation for International Cooperation in Research and Higher Education (STINT), AF2020-8892
Note

open access

Available from: 2021-09-02 Created: 2021-09-02 Last updated: 2022-05-04 Bibliographically approved
Sidorova, J., Karlsson, S., Rosander, O., Berthier, M. & Moreno-Torres, I. (2021). Towards disorder-independent automatic assessment of emotional competence in neurological patients with a classical emotion recognition system: application in foreign accent syndrome. IEEE Transactions on Affective Computing, 12(4), 962-973
2021 (English). In: IEEE Transactions on Affective Computing, E-ISSN 1949-3045, Vol. 12, no 4, p. 962-973. Article in journal (Refereed), Published
Abstract [en]

Emotive speech is a non-invasive and cost-effective biomarker in a wide spectrum of neurological disorders with computational systems built to automate the diagnosis. In order to explore the possibilities for the automation of a routine speech analysis in the presence of hard-to-learn pathology patterns, we propose a framework to assess the level of competence in paralinguistic communication. Initially, the assessment relies on a perceptual experiment completed by human listeners, and a model called the Aggregated Ear is proposed that draws a conclusion about the level of competence demonstrated by the patient. Then, the automation of the Aggregated Ear has been undertaken and resulted in a computational model that summarizes the portfolio of speech evidence on the patient. The summarizing system has a classical emotion recognition system as its central component. The code and the medical data are available from the corresponding author on request.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2021
Keywords
biomarker, Computational modeling, computational paralinguistics, Ear, Emotion recognition, foreign accent syndrome, health care, Neurological diseases, Pathology, Portfolios, Cost effectiveness, Diagnosis, Neurology, Speech recognition, Automatic assessment, Central component, Computational model, Computational system, Neurological disorders, Neurological patient, Summarizing systems, Speech communication
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:bth-20310 (URN); 10.1109/TAFFC.2019.2908365 (DOI); 000722000100011 (); 2-s2.0-85089297054 (Scopus ID)
Funder
Knowledge Foundation, 20140032
Note

Open access

Available from: 2020-08-25 Created: 2020-08-25 Last updated: 2023-12-05 Bibliographically approved
Abghari, S., Boeva, V., Brage, J. & Grahn, H. (2020). A Higher Order Mining Approach for the Analysis of Real-World Datasets. Energies, 13(21), Article ID 5781.
2020 (English). In: Energies, E-ISSN 1996-1073, Vol. 13, no 21, article id 5781. Article in journal (Refereed), Published
Abstract [en]

In this study, we propose a higher order mining approach that can be used for the analysis of real-world datasets. The approach can be used to monitor and identify the deviating operational behaviour of the studied phenomenon in the absence of prior knowledge about the data. The proposed approach consists of several different data analysis techniques, such as sequential pattern mining, clustering analysis, consensus clustering and the minimum spanning tree (MST). Initially, a clustering analysis is performed on the extracted patterns to model the behavioural modes of the studied phenomenon for a given time interval. The generated clustering models, which correspond to every two consecutive time intervals, can further be assessed to determine changes in the monitored behaviour. In cases in which significant differences are observed, further analysis is performed by integrating the generated models into a consensus clustering and applying an MST to identify deviating behaviours. The validity and potential of the proposed approach are demonstrated on a real-world dataset originating from a network of district heating (DH) substations. The obtained results show that our approach is capable of detecting deviating and sub-optimal behaviours of DH substations.
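
The minimum spanning tree step mentioned above can be illustrated with a generic construction. The sketch below is Prim's algorithm over a symmetric distance matrix; it is not the authors' implementation, and the distance values and the idea of inspecting the longest tree edge are assumptions made purely for illustration.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense, symmetric distance matrix.

    Returns the tree as a list of (parent, child, weight) edges, in the order
    in which nodes are attached, starting from node 0.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Attach the node outside the tree that is cheapest to reach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, dist[parent[u]][u]))
        # Update the cheapest connection of every remaining node.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


if __name__ == "__main__":
    # Hypothetical pairwise distances between four cluster representatives.
    dist = [
        [0, 1, 4, 9],
        [1, 0, 2, 8],
        [4, 2, 0, 7],
        [9, 8, 7, 0],
    ]
    tree = minimum_spanning_tree(dist)
    print(tree)
    # The longest edge marks the item that sits farthest from the rest.
    print(max(tree, key=lambda e: e[2]))
```

In this toy input, node 3 is joined to the tree by an edge far longer than the others, which is the kind of structural signal an MST makes easy to read off.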

Place, publisher, year, edition, pages
MDPI, 2020
Keywords
outlier detection, fault detection, higher order mining, clustering analysis, minimum spanning tree, data mining, district heating substations
National Category
Energy Systems
Identifiers
urn:nbn:se:bth-20453 (URN); 10.3390/en13215781 (DOI); 000588863900001 ()
Funder
Knowledge Foundation, 20140032
Note

open access

Available from: 2020-09-22 Created: 2020-09-22 Last updated: 2023-08-28 Bibliographically approved
Principal Investigator
Grahn, Håkan
Coordinating organisation
Blekinge Institute of Technology
Funder
Period
2014-09-01 - 2020-12-31
National Category
Computer Sciences
Identifiers
DiVA, id: project:2351
Project, id: 20140032

Link to external project page

Bigdata@BTH