A Graph-based Multi-view Clustering Approach for Continuous Pattern Mining
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för datavetenskap. ORCID iD: 0000-0002-0476-4177
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för datavetenskap. ORCID iD: 0000-0003-3371-5347
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för datavetenskap. ORCID iD: 0000-0003-3128-191X
2022 (English). In: Recent Advancements in Multi-View Data Analytics / [ed] Witold Pedrycz and Shyi-Ming Chen, Springer Science+Business Media B.V., 2022, pp. 201-237. Chapter in book, part of anthology (Refereed)
Abstract [en]

Today’s smart monitoring applications need machine learning models and data mining algorithms that are capable of analysing and mining the temporal component of data streams. These models and algorithms also ought to take into account the multi-source nature of the sensor data by being able to conduct multi-view analysis. In this study, we address these challenges by introducing a novel multi-view data stream clustering approach, entitled MST-MVS clustering, that can be applied in different smart monitoring applications for continuous pattern mining and data labelling. The proposed approach is based on the Minimum Spanning Tree (MST) clustering algorithm, which is applied to build local clustering models in parallel on the different views in each chunk of data. The MST-MVS clustering transfers knowledge learnt in the current data chunk to the next chunk in the form of artificial nodes used by the MST clustering algorithm. These artificial nodes are identified by analysing multi-view patterns extracted in each data chunk in the form of an integrated (global) clustering model. We further show how the extracted patterns can be used for post-labelling of the chunk’s data by introducing a dedicated labelling technique, entitled Pattern-labelling. We study and evaluate the MST-MVS clustering algorithm under different experimental scenarios on synthetic and real-world data. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
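
Illustrative sketch (not from the chapter): the abstract names MST clustering as the building block applied to each view of a data chunk. The code below shows only the generic, textbook form of that idea under stated assumptions: Euclidean distances, a user-supplied number of clusters k, and a cut of the k-1 longest tree edges. The function names mst_edges and mst_clusters are hypothetical, and the sketch does not reproduce the artificial nodes, the multi-view integration, or the Pattern-labelling step of MST-MVS clustering.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the minimum spanning tree as a list of (distance, i, j) edges.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        # Relax distances of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points, k):
    """Assumed cut rule: drop the k-1 longest MST edges, label the components."""
    edges = sorted(mst_edges(points))  # ascending by edge length
    kept = edges[: max(len(edges) - (k - 1), 0)]

    # Union-find over the kept edges yields the connected components.
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


if __name__ == "__main__":
    chunk = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(mst_clusters(chunk, k=2))
```

In a streaming setting, a routine of this kind would be run once per view and per chunk; how the per-view results are merged and carried over to the next chunk is specific to the chapter and is not sketched here.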

Place, publisher, year, edition, pages
Springer Science+Business Media B.V., 2022. pp. 201-237
Series
Studies in Big Data, ISSN 2197-6503, E-ISSN 2197-6511 ; 106
Keywords [en]
data stream, clustering analysis, pattern mining, minimum spanning tree
National subject category
Computer Science
Identifiers
URN: urn:nbn:se:bth-22261; DOI: 10.1007/978-3-030-95239-6_8; Scopus ID: 2-s2.0-85130970889; ISBN: 978-3-030-95239-6 (digital); OAI: oai:DiVA.org:bth-22261; DiVA, id: diva2:1607662
Available from: 2021-11-01; Created: 2021-11-01; Last updated: 2025-09-30; Bibliographically approved
Part of thesis
1. Clustering Techniques for Mining and Analysis of Evolving Data
2021 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

The amount of data generated is on the rise due to increased demand in fields like IoT, smart monitoring applications, etc. Data generated through such systems have many distinct characteristics, such as continuous generation, evolving and multi-source nature, and heterogeneity. In addition, the real-world data generated in these fields is largely unlabelled. Clustering is an unsupervised learning technique used to group, analyze and interpret unlabelled data. Conventional clustering algorithms are not suitable for dealing with data having the previously mentioned characteristics due to memory and computational constraints, their inability to handle concept drift, and the distributed location of the data. Therefore, novel clustering approaches capable of analyzing and interpreting evolving and/or multi-source streaming data are needed.

The thesis is focused on building evolutionary clustering algorithms for data that evolves over time. We have initially proposed an evolutionary clustering approach, entitled Split-Merge Clustering (Paper I), capable of continuously updating the generated clustering solution in the presence of new data. Through the progression of the work, new challenges have been studied and addressed. Namely, the Split-Merge Clustering algorithm has been enhanced in Paper II with new capabilities to deal with the challenges of multi-view data applications. Multi-view or multi-source data presents the studied phenomenon/system from different perspectives (views), and can reveal interesting knowledge that is not visible when only one view is considered and analyzed. This has motivated us to continue in this direction by designing two other novel multi-view data stream clustering algorithms. The algorithm proposed in Paper III improves the performance and interpretability of the algorithm proposed in Paper II. Paper IV introduces a minimum spanning tree based multi-view clustering algorithm capable of transferring knowledge between consecutive data chunks, and it is also enriched with a post-clustering pattern-labeling procedure.

The proposed and studied evolutionary clustering algorithms are evaluated on various data sets. The obtained results have demonstrated the robustness of the algorithms for modeling, analyzing, and mining evolving data streams. They are able to adequately adapt single and multi-view clustering models by continuously integrating newly arriving data. 

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2021
Series
Blekinge Institute of Technology Licentiate Dissertation Series, ISSN 1650-2140 ; 2021:09
Keywords
Clustering analysis, Concept drift, Evolutionary clustering, Machine learning, Streaming data
National subject category
Computer Science
Research subject
Computer Science
Identifiers
urn:nbn:se:bth-22262 (URN); 978-91-7295-432-8 (ISBN)
Presentation
2021-12-13, J1630, Blekinge Tekniska Högskola SE-371 79, Karlskrona, 13:00 (English)
Opponent
Supervisors
Available from: 2021-11-02; Created: 2021-11-01; Last updated: 2025-09-30; Bibliographically approved
2. Mining Evolving and Heterogeneous Data: Cluster-based Analysis Techniques
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

A large amount of data is generated in fields like IoT, smart monitoring applications, etc., raising demand for suitable data analysis and mining techniques. Data produced through such systems have many distinct characteristics, like continuous generation, evolving nature, multi-source origin, and heterogeneity, and in addition are usually not annotated. Clustering is an unsupervised learning technique used to group and analyze unlabeled data. Conventional clustering algorithms are unsuitable for dealing with data with the mentioned characteristics due to memory and computational constraints and their inability to handle the heterogeneous and evolving nature of the data. Therefore, novel clustering approaches are needed to analyze and interpret such challenging data.

This thesis focuses on building and studying advanced clustering algorithms that can address the main challenges of today's real-world data: its evolving and heterogeneous nature. An evolving clustering approach capable of continuously updating the generated clustering solution in the presence of new data is proposed first and later extended to address the challenges of multi-view data applications. Multi-view or multi-source data presents the studied phenomenon or system from different perspectives (views) and can reveal interesting knowledge that is invisible when only one view is considered and analyzed. This has motivated us to continue exploring data from different perspectives in several other studies of this thesis. Domain shift is another common problem when data is obtained from various devices or locations, leading to a drop in the performance of machine learning models if they are not adapted to the current domain (device, location, etc.). The thesis explores the domain adaptation problem in a resource-constrained way using cluster integration techniques. A new hybrid clustering technique for analyzing heterogeneous data is also proposed. It produces homogeneous groups, facilitating continuous monitoring and fault detection.

The algorithms and techniques proposed in this thesis are evaluated on various data sets, including real-world data from industrial partners in domains like smart building systems, smart logistics, and performance monitoring of industrial assets. The obtained results demonstrated the robustness of the algorithms for modeling, analyzing, and mining evolving data streams and/or heterogeneous data. They can adequately adapt single and multi-view clustering models by continuously integrating newly arriving data.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2024
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090 ; 2024:06
Keywords
Domain Adaptation, Evolving Clustering, Heterogeneous Data, Multi-View Clustering, Streaming Data
National subject category
Computer Science
Research subject
Computer Science
Identifiers
urn:nbn:se:bth-26098 (URN); 978-91-7295-479-3 (ISBN)
Public defence
2024-05-22, J1630, Campus Gräsvik, Karlskrona, 09:00 (English)
Opponent
Supervisors
Available from: 2024-04-10; Created: 2024-04-09; Last updated: 2025-09-30; Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authors

Åleskog, Christoffer; Devagiri, Vishnu Manasa; Boeva, Veselka
