Domain Adaptation Through Cluster Integration and Correlation
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0003-3371-5347
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0003-3128-191x
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0002-3010-8798
2022 (English). In: IEEE International Conference on Data Mining Workshops, ICDMW / [ed] Candan K.S., Dinh T.N., Thai My.T., Washio T., IEEE Computer Society, 2022, p. 119-126. Conference paper, Published paper (Refereed)
Abstract [en]

Domain shift is a common problem in many real-world applications using machine learning models. Most of the existing solutions are based on supervised and deep-learning models. This paper proposes a novel clustering algorithm capable of producing an adapted and/or integrated clustering model for the considered domains. Source and target domains are represented by clustering models such that each cluster of a domain models a specific scenario of the studied phenomenon by defining a range of allowable values for each attribute in a given data vector. The proposed domain integration algorithm works in two steps: (i) cross-labeling and (ii) integration. First, each clustering model is applied to label the cluster representatives of the other model. These labels are used to determine the correlations between the two models and thereby identify the clusters common to both domains, which must be integrated in the second step. Different features of the proposed algorithm are studied and evaluated on a publicly available human activity recognition (HAR) data set and on real-world data from a smart logistics use case provided by an industrial partner. The goal of the experiments on the HAR data set is to showcase the algorithm's potential in automatic data labeling. The experiments on the smart logistics use case evaluate and compare the performance of the integrated model and the two adapted models in different domains. © 2022 IEEE.
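
The two steps named in the abstract, cross-labeling and integration, can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: each cluster is reduced to a single point representative (the abstract describes clusters as ranges of allowable attribute values), labels are assigned by the nearest-representative rule, two clusters count as common only when the two labelings agree with each other, and common clusters are merged by averaging. The function names `cross_label` and `integrate` are hypothetical.

```python
from math import dist


def cross_label(reps, other_reps):
    """Label each representative in `reps` with the index of the nearest
    representative in `other_reps` (nearest-representative rule; an assumption)."""
    return [
        min(range(len(other_reps)), key=lambda j: dist(r, other_reps[j]))
        for r in reps
    ]


def integrate(source_reps, target_reps):
    """Return (integrated_reps, common_pairs) for two lists of representatives."""
    # Step (i): cross-labeling, applied in both directions.
    src_labels = cross_label(source_reps, target_reps)
    tgt_labels = cross_label(target_reps, source_reps)

    # Assumed correlation rule: source cluster i and target cluster j are
    # common when each one's representative is labeled with the other.
    common = [(i, j) for i, j in enumerate(src_labels) if tgt_labels[j] == i]
    common_src = {i for i, _ in common}
    common_tgt = {j for _, j in common}

    # Step (ii): integration. Common pairs are merged by averaging their
    # representatives (an assumption); all other clusters are kept as they are.
    merged = [
        tuple((a + b) / 2 for a, b in zip(source_reps[i], target_reps[j]))
        for i, j in common
    ]
    src_only = [r for i, r in enumerate(source_reps) if i not in common_src]
    tgt_only = [r for j, r in enumerate(target_reps) if j not in common_tgt]
    return merged + src_only + tgt_only, common


# Toy data: one scenario shared by both domains, one specific to each.
source = [(0.0, 0.0), (5.0, 5.0)]
target = [(0.2, 0.0), (10.0, 10.0)]
integrated, common = integrate(source, target)
print(common)           # index pairs (source, target) of the common clusters
print(len(integrated))  # number of clusters in the integrated model
```

In this toy example the integrated model keeps the domain-specific clusters alongside the merged one, which mirrors the abstract's distinction between an integrated model and models adapted to a single domain.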

Place, publisher, year, edition, pages
IEEE Computer Society, 2022. p. 119-126
Series
IEEE International Conference on Data Mining Workshops, ICDMW, ISSN 2375-9232, E-ISSN 2375-9259 ; 2022
Keywords [en]
Cluster analysis, Clustering algorithms, Deep learning, Learning systems, Clustering model, Clustering techniques, Data set, Domain adaptation, Human activity recognition, Learning models, Machine learning models, Novel clustering, Real-world, Target domain, Data integration
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-24336
DOI: 10.1109/ICDMW58026.2022.00025
ISI: 000971492200017
Scopus ID: 2-s2.0-85148440164
ISBN: 9798350346091 (print)
OAI: oai:DiVA.org:bth-24336
DiVA, id: diva2:1741064
Conference
22nd IEEE International Conference on Data Mining Workshops, ICDMW 2022, Orlando, 28 November through 1 December 2022
Available from: 2023-03-03. Created: 2023-03-03. Last updated: 2024-04-09. Bibliographically approved
In thesis
1. Mining Evolving and Heterogeneous Data: Cluster-based Analysis Techniques
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

A large amount of data is generated from fields like IoT, smart monitoring applications, etc., raising demand for suitable data analysis and mining techniques. Data produced through such systems have many distinct characteristics, like continuous generation, evolving nature, multi-source origin, and heterogeneity, and in addition are usually not annotated. Clustering is an unsupervised learning technique used to group and analyze unlabeled data. Conventional clustering algorithms are unsuitable for dealing with data with the mentioned characteristics due to memory and computational constraints and their inability to handle the heterogeneous and evolving nature of the data. Therefore, novel clustering approaches are needed to analyze and interpret such challenging data.

This thesis focuses on building and studying advanced clustering algorithms that can address the main challenges of today's real-world data: its evolving and heterogeneous nature. An evolving clustering approach capable of continuously updating the generated clustering solution in the presence of new data is initially proposed, which is later extended to address the challenges of multi-view data applications. Multi-view or multi-source data presents the studied phenomenon or system from different perspectives (views) and can reveal interesting knowledge that is invisible when only one view is considered and analyzed. This has motivated us to continue exploring data from different perspectives in several other studies of this thesis. Domain shift is another common problem when data is obtained from various devices or locations, leading to a drop in the performance of machine learning models if they are not adapted to the current domain (device, location, etc.). The thesis explores the domain adaptation problem in a resource-constrained way using cluster integration techniques. A new hybrid clustering technique for analyzing heterogeneous data is also proposed. It produces homogeneous groups, facilitating continuous monitoring and fault detection.

The algorithms and techniques proposed in this thesis are evaluated on various data sets, including real-world data from industrial partners in domains like smart building systems, smart logistics, and performance monitoring of industrial assets. The obtained results demonstrated the robustness of the algorithms for modeling, analyzing, and mining evolving data streams and/or heterogeneous data. They can adequately adapt single and multi-view clustering models by continuously integrating newly arriving data.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2024
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090 ; 2024:06
Keywords
Domain Adaptation, Evolving Clustering, Heterogeneous Data, Multi-View Clustering, Streaming Data
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:bth-26098 (URN); 978-91-7295-479-3 (ISBN)
Public defence
2024-05-22, J1630, Campus Gräsvik, Karlskrona, 09:00 (English)
Available from: 2024-04-10. Created: 2024-04-09. Last updated: 2024-04-22. Bibliographically approved

Authority records

Devagiri, Vishnu Manasa; Boeva, Veselka; Abghari, Shahrooz