Split-merge evolutionary clustering for multi-view streaming data
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0003-3371-5347
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0003-3128-191X
EluciDATA Lab, BEL.
2020 (English). In: Procedia Computer Science / [ed] Cristani M., Toro C., Zanni-Merk C., Howlett R.J., Jain L.C., Elsevier, 2020, Vol. 176, p. 460-469. Conference paper, Published paper (Refereed)
Abstract [en]

In this study, we propose a new multi-view stream clustering approach, called MV Split-Merge Clustering. The proposed approach extends an existing split-merge evolutionary clustering algorithm (entitled Split-Merge Clustering) to multi-view data applications. The extended version can be used to integrate data from multiple views in a streaming manner and discover the cluster structure of each data chunk. MV Split-Merge Clustering can be applied to group distinct chunks of multi-view streaming data so that a global integrated clustering model is built on each data chunk. At each time window, an updated clustering solution (local model) is first produced on each view of the current data chunk by applying the Split-Merge Clustering algorithm. Formal Concept Analysis is then used to integrate information from the multiple views (local clustering models) and generate a global model (formal concept lattice) that reveals the correlations among the clusters of the local models. The proposed MV Split-Merge Clustering has been initially evaluated on a publicly available data set. Our results show that the approach is able to identify a clustering structure and relationships among the different views comparable to those produced in a batch scenario. © 2020 The Authors. Published by Elsevier B.V.
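
The integration step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the per-view cluster labels below are hard-coded toy values standing in for the local models that Split-Merge Clustering would produce on one data chunk, and the helper names (`build_context`, `extent`, `intent`, `formal_concepts`) are hypothetical. The sketch builds a formal context in which the objects are data points and the attributes are (view, cluster) pairs, then enumerates the formal concepts by closing every attribute subset, which is only feasible for very small inputs.

```python
from itertools import combinations


def build_context(view_labels):
    """Map each data point to the set of (view, cluster) attributes it has."""
    context = {}
    for view, labels in view_labels.items():
        for point, cluster in enumerate(labels):
            context.setdefault(point, set()).add((view, cluster))
    return context


def extent(context, attrs):
    """Return the data points that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(context, objs, all_attrs):
    """Return the attributes shared by every data point in objs."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every attribute subset.

    Brute force: exponential in the number of attributes (toy inputs only).
    """
    all_attrs = sorted(set().union(*context.values()))
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, r):
            objs = extent(context, set(subset))
            concepts.add((objs, intent(context, objs, all_attrs)))
    return concepts


# Assumed toy local models: cluster label of each of four points in two views.
view_labels = {"v1": [0, 0, 1, 1], "v2": [0, 0, 0, 1]}
concepts = formal_concepts(build_context(view_labels))

for objs, attrs in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(objs), sorted(attrs))
```

Each printed pair is one node of the concept lattice: a group of data points together with the clusters, across views, that all of them belong to. For example, points 0 and 1 share cluster 0 in both toy views, so they form a concept whose intent spans both views.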

Place, publisher, year, edition, pages
Elsevier, 2020. Vol. 176, p. 460-469
Series
Procedia Computer Science, E-ISSN 1877-0509
Keywords [en]
Clustering algorithms, Data stream mining, Evolutionary clustering, Multi-View clustering, Online learning, Cluster analysis, Evolutionary algorithms, Formal concept analysis, Information analysis, Knowledge based systems, Cluster structure, Clustering model, Clustering solutions, Extended versions, Formal concept lattices, Multi-view datum, Stream clustering
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-20632
DOI: 10.1016/j.procs.2020.08.048
Scopus ID: 2-s2.0-85093357055
OAI: oai:DiVA.org:bth-20632
DiVA, id: diva2:1486511
Conference
24th KES International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, KES 2020, Virtual Online, 16 September 2020 through 18 September 2020
Part of project
Bigdata@BTH - Scalable resource-efficient systems for big data analytics, Knowledge Foundation
Funder
Knowledge Foundation, 20140032
Note

open access

Available from: 2020-11-02. Created: 2020-11-02. Last updated: 2024-04-09. Bibliographically approved
In thesis
1. Clustering Techniques for Mining and Analysis of Evolving Data
2021 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

The amount of data generated is on the rise due to increased demand in fields like IoT, smart monitoring applications, etc. Data generated through such systems have many distinct characteristics, such as continuous generation, an evolving and multi-source nature, and heterogeneity. In addition, the real-world data generated in these fields is largely unlabelled. Clustering is an unsupervised learning technique used to group, analyze and interpret unlabelled data. Conventional clustering algorithms are not suitable for data with these characteristics due to memory and computational constraints, their inability to handle concept drift, and the distributed location of the data. Therefore, novel clustering approaches capable of analyzing and interpreting evolving and/or multi-source streaming data are needed.

The thesis focuses on building evolutionary clustering algorithms for data that evolves over time. We have initially proposed an evolutionary clustering approach, entitled Split-Merge Clustering (Paper I), capable of continuously updating the generated clustering solution in the presence of new data. As the work progressed, new challenges were studied and addressed. Namely, the Split-Merge Clustering algorithm has been enhanced in Paper II with new capabilities to deal with the challenges of multi-view data applications. Multi-view or multi-source data presents the studied phenomenon/system from different perspectives (views), and can reveal interesting knowledge that is not visible when only one view is considered and analyzed. This has motivated us to continue in this direction by designing two other novel multi-view data stream clustering algorithms. The algorithm proposed in Paper III improves the performance and interpretability of the algorithm proposed in Paper II. Paper IV introduces a minimum spanning tree based multi-view clustering algorithm capable of transferring knowledge between consecutive data chunks; it is also enriched with a post-clustering pattern-labeling procedure.

The proposed and studied evolutionary clustering algorithms are evaluated on various data sets. The obtained results have demonstrated the robustness of the algorithms for modeling, analyzing, and mining evolving data streams. They are able to adequately adapt single and multi-view clustering models by continuously integrating newly arriving data. 

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2021
Series
Blekinge Institute of Technology Licentiate Dissertation Series, ISSN 1650-2140 ; 2021:09
Keywords
Clustering analysis, Concept drift, Evolutionary clustering, Machine learning, Streaming data
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:bth-22262 (URN)
978-91-7295-432-8 (ISBN)
Presentation
2021-12-13, J1630, Blekinge Tekniska Högskola SE-371 79, Karlskrona, 13:00 (English)
Available from: 2021-11-02. Created: 2021-11-01. Last updated: 2021-11-19. Bibliographically approved
2. Mining Evolving and Heterogeneous Data: Cluster-based Analysis Techniques
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

A large amount of data is generated in fields like IoT, smart monitoring applications, etc., raising demand for suitable data analysis and mining techniques. Data produced through such systems have many distinct characteristics, like continuous generation, evolving nature, multi-source origin, and heterogeneity, and in addition are usually not annotated. Clustering is an unsupervised learning technique used to group and analyze unlabeled data. Conventional clustering algorithms are unsuitable for data with these characteristics due to memory and computational constraints and their inability to handle the heterogeneous and evolving nature of the data. Therefore, novel clustering approaches are needed to analyze and interpret such challenging data.

This thesis focuses on building and studying advanced clustering algorithms that can address the main challenges of today's real-world data: its evolving and heterogeneous nature. An evolving clustering approach capable of continuously updating the generated clustering solution in the presence of new data is initially proposed, and is later extended to address the challenges of multi-view data applications. Multi-view or multi-source data presents the studied phenomenon or system from different perspectives (views) and can reveal interesting knowledge that is invisible when only one view is considered and analyzed. This has motivated us to continue exploring data from different perspectives in several other studies of this thesis. Domain shift is another common problem when data is obtained from various devices or locations, leading to a drop in the performance of machine learning models if they are not adapted to the current domain (device, location, etc.). The thesis explores the domain adaptation problem in a resource-constrained way using cluster integration techniques. A new hybrid clustering technique for analyzing heterogeneous data is also proposed. It produces homogeneous groups, facilitating continuous monitoring and fault detection.

The algorithms and techniques proposed in this thesis are evaluated on various data sets, including real-world data from industrial partners in domains like smart building systems, smart logistics, and performance monitoring of industrial assets. The obtained results demonstrated the robustness of the algorithms for modeling, analyzing, and mining evolving data streams and/or heterogeneous data. They can adequately adapt single and multi-view clustering models by continuously integrating newly arriving data.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2024
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090 ; 2024:06
Keywords
Domain Adaptation, Evolving Clustering, Heterogeneous Data, Multi-View Clustering, Streaming Data
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:bth-26098 (URN)
978-91-7295-479-3 (ISBN)
Public defence
2024-05-22, J1630, Campus Gräsvik, Karlskrona, 09:00 (English)
Available from: 2024-04-10. Created: 2024-04-09. Last updated: 2024-04-22. Bibliographically approved

Open Access in DiVA

fulltext (512 kB), 521 downloads
File information
File name: FULLTEXT01.pdf
File size: 512 kB
Checksum SHA-512: 98a5498edc3459cc75d2eca53a7fbcef24f4665743e6a3a1922ad37724b0a090b074eb423dff8a1ffb896e27e1abc2115830222695922f7aa1799f97f099c2cc
Type: fulltext
Mimetype: application/pdf


Authority records

Devagiri, Vishnu Manasa; Boeva, Veselka
