Mining Evolving and Heterogeneous Data: Cluster-based Analysis Techniques
Devagiri, Vishnu Manasa
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0003-3371-5347
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

A large amount of data is generated in fields such as IoT and smart monitoring applications, raising demand for suitable data analysis and mining techniques. Data produced by such systems have many distinct characteristics, such as continuous generation, evolving nature, multi-source origin, and heterogeneity, and in addition are usually not annotated. Clustering is an unsupervised learning technique used to group and analyze unlabeled data. Conventional clustering algorithms are unsuitable for data with these characteristics because of memory and computational constraints and their inability to handle the heterogeneous and evolving nature of the data. Therefore, novel clustering approaches are needed to analyze and interpret such challenging data.

This thesis focuses on building and studying advanced clustering algorithms that can address the main challenges of today's real-world data: its evolving and heterogeneous nature. An evolving clustering approach capable of continuously updating the generated clustering solution in the presence of new data is proposed first and later extended to address the challenges of multi-view data applications. Multi-view or multi-source data presents the studied phenomenon or system from different perspectives (views) and can reveal interesting knowledge that is invisible when only one view is considered and analyzed. This has motivated us to continue exploring data from different perspectives in several other studies of this thesis. Domain shift is another common problem when data is obtained from various devices or locations, leading to a drop in the performance of machine learning models if they are not adapted to the current domain (device, location, etc.). The thesis explores the domain adaptation problem in a resource-constrained way using cluster integration techniques. A new hybrid clustering technique for analyzing heterogeneous data is also proposed. It produces homogeneous groups, facilitating continuous monitoring and fault detection.

The algorithms and techniques proposed in this thesis are evaluated on various data sets, including real-world data from industrial partners in domains like smart building systems, smart logistics, and performance monitoring of industrial assets. The obtained results demonstrated the robustness of the algorithms for modeling, analyzing, and mining evolving data streams and/or heterogeneous data. They can adequately adapt single and multi-view clustering models by continuously integrating newly arriving data.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2024.
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090 ; 2024:06
Keywords [en]
Domain Adaptation, Evolving Clustering, Heterogeneous Data, Multi-View Clustering, Streaming Data
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:bth-26098; ISBN: 978-91-7295-479-3 (print); OAI: oai:DiVA.org:bth-26098; DiVA, id: diva2:1850201
Public defence
2024-05-22, J1630, Campus Gräsvik, Karlskrona, 09:00 (English)
Opponent
Supervisors
Available from: 2024-04-10. Created: 2024-04-09. Last updated: 2024-04-22. Bibliographically approved.
List of papers
1. Bipartite Split-Merge Evolutionary Clustering
2019 (English). In: Lect. Notes Comput. Sci., Springer, 2019, p. 204-223. Conference paper, Published paper (Refereed)
Abstract [en]

We propose a split-merge framework for evolutionary clustering. The proposed clustering technique, entitled Split-Merge Evolutionary Clustering, is intended to be more robust to concept drift scenarios by providing the flexibility to consider a portion of the data at each step and derive clusters from it, which are subsequently used to update the existing clustering solution. The proposed framework is built around the idea of modeling two clustering solutions as a bipartite graph, which guides the update of the existing clustering solution by merging some clusters with ones from the newly constructed clustering while others are transformed by splitting their elements among several new clusters. We have evaluated and compared the discussed evolutionary clustering technique with two other state-of-the-art algorithms: a bipartite correlation clustering (PivotBiCluster) and an incremental evolving clustering (Dynamic split-and-merge). © Springer Nature Switzerland AG 2019.

Place, publisher, year, edition, pages
Springer, 2019
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349
Keywords
Bipartite clustering, Data mining, Dynamic clustering, Evolutionary clustering, Split-merge framework, Unsupervised learning, Artificial intelligence, Bipartite correlation clustering, Clustering solutions, Clustering techniques, State-of-the-art algorithms, Cluster analysis
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-19127 (URN); 10.1007/978-3-030-37494-5_11 (DOI); 000722592200011 (); 2-s2.0-85077496461 (Scopus ID); 9783030374938 (ISBN)
Conference
11th International Conference on Agents and Artificial Intelligence, ICAART; Prague; Czech Republic; 19 February 2019 through 21 February
Available from: 2020-01-23. Created: 2020-01-23. Last updated: 2024-04-09. Bibliographically approved.
2. Split-merge evolutionary clustering for multi-view streaming data
2020 (English). In: Procedia Computer Science / [ed] Cristani M., Toro C., Zanni-Merk C., Howlett R.J., Jain L.C., Elsevier, 2020, Vol. 176, p. 460-469. Conference paper, Published paper (Refereed)
Abstract [en]

In this study, we propose a new multi-view stream clustering approach, called MV Split-Merge Clustering. The proposed approach is an extension of an existing split-merge evolutionary clustering algorithm (entitled Split-Merge Clustering) to multi-view data applications. The extended version can be used to integrate data from multiple views in a streaming manner and discover cluster structure for each data chunk. The MV Split-Merge Clustering can be applied for grouping distinct chunks of multi-view streaming data so that a global integrated clustering model is built on each data chunk. At each time window, an updated clustering solution (local model) is initially produced on each view of the current data chunk by applying the Split-Merge Clustering algorithm. Formal Concept Analysis is then used in order to integrate information from the multiple views (local clustering models) and generate a global model (formal concept lattice) that reveals the correlations among the clusters of the local models. The proposed MV Split-Merge Clustering has been initially evaluated on a publicly available data set. Our results show that the approach is able to identify a clustering structure and relationships among the different views comparable to those produced in a batch scenario. © 2020 The Authors. Published by Elsevier B.V.
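As a rough illustration of the integration step described in this abstract, the following Python sketch enumerates the formal concepts of a small formal context in which each data point is described by the cluster label it received in every view. The function name, the `view:cluster` label format, and the toy data are assumptions made for this illustration and are not taken from the paper; the closure-by-intersection procedure is a textbook way of listing concepts, not the authors' implementation.

```python
def formal_concepts(context):
    """List all (extent, intent) pairs of a formal context.

    `context` maps each object to the set of attributes it has. Here the
    attributes are hypothetical "view:cluster" labels.
    """
    all_attributes = frozenset().union(*context.values())

    # Every concept intent is an intersection of object intents; the full
    # attribute set stands for the empty intersection.
    intents = {all_attributes}
    for attributes in context.values():
        intents |= {frozenset(attributes) & intent for intent in intents}

    concepts = []
    for intent in intents:
        # The extent collects every object that carries all attributes of the intent.
        extent = frozenset(
            obj for obj, attributes in context.items() if intent <= attributes
        )
        concepts.append((extent, intent))
    return concepts


# Toy context: three data points clustered independently in two views.
toy_context = {
    "x1": {"v1:c0", "v2:c0"},
    "x2": {"v1:c0", "v2:c1"},
    "x3": {"v1:c1", "v2:c1"},
}

for extent, intent in sorted(formal_concepts(toy_context), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by extent inclusion gives the concept lattice; a concept such as ({x1, x2}, {v1:c0}) says that x1 and x2 share a cluster in the first view even though the second view separates them.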

Place, publisher, year, edition, pages
Elsevier, 2020
Series
Procedia Computer Science, E-ISSN 1877-0509
Keywords
Clustering algorithms, Data stream mining, Evolutionary clustering, Multi-View clustering, Online learning, Cluster analysis, Evolutionary algorithms, Formal concept analysis, Information analysis, Knowledge based systems, Cluster structure, Clustering model, Clustering solutions, Extended versions, Formal concept lattices, Multi-view datum, Stream clustering
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-20632 (URN); 10.1016/j.procs.2020.08.048 (DOI); 2-s2.0-85093357055 (Scopus ID)
Conference
24th KES International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, KES 2020, Virtual Online, 16 September 2020 through 18 September 2020
Funder
Knowledge Foundation, 20140032
Note

open access

Available from: 2020-11-02. Created: 2020-11-02. Last updated: 2024-04-09. Bibliographically approved.
3. A Multi-view Clustering Approach for Analysis of Streaming Data
2021 (English). In: IFIP Advances in Information and Communication Technology / [ed] Maglogiannis I., Macintyre J., Iliadis L., Springer Science and Business Media Deutschland GmbH, 2021, p. 169-183. Conference paper, Published paper (Refereed)
Abstract [en]

Data available today in smart monitoring applications such as smart buildings, machine health monitoring, and smart healthcare is not centralized and is usually supplied by a number of different devices (sensors, mobile devices and edge nodes). As a result, the data is heterogeneous in nature and provides different perspectives (views) on the studied phenomenon. This makes the monitoring task very challenging, requiring machine learning and data mining models that are not only able to continuously integrate and analyze multi-view streaming data, but are also capable of adapting to concept drift scenarios of newly arriving data. This study presents a multi-view clustering approach that can be applied for monitoring and analysis of streaming data scenarios. The approach allows for parallel monitoring of the individual view clustering models and mining view correlations in the integrated (global) clustering models. The global model built at each data chunk is a formal concept lattice generated by a formal context consisting of closed patterns representing the most typical correlations among the views. The proposed approach is evaluated on two different data sets. The obtained results demonstrate that it is suitable for modelling and monitoring multi-view streaming phenomena by providing means for continuous analysis and pattern mining. © 2021, IFIP International Federation for Information Processing.

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2021
Series
IFIP Advances in Information and Communication Technology, ISSN 1868-4238 ; 627
Keywords
Closed patterns, Formal concept analysis, Multi-instance learning, Multi-view clustering, Streaming data, Artificial intelligence, Data mining, Intelligent buildings, mHealth, Monitoring, Continuous analysis, Data mining models, Formal concept lattices, Machine health monitoring, Monitoring and analysis, Monitoring tasks, Smart monitoring, Cluster analysis
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-22023 (URN); 10.1007/978-3-030-79150-6_14 (DOI); 2-s2.0-85111810320 (Scopus ID); 9783030791490 (ISBN)
Conference
17th IFIP WG 12.5 International Conference on Artificial Intelligence Applications and Innovations, AIAI 2021, Virtual, Online, 25 June 2021 - 27 June 2021
Funder
Knowledge Foundation, 20140032
Available from: 2021-08-20. Created: 2021-08-20. Last updated: 2024-04-09. Bibliographically approved.
4. Multi-view data analysis techniques for monitoring smart building systems
2021 (English). In: Sensors, E-ISSN 1424-8220, Vol. 21, no 20, article id 6775. Article in journal (Refereed), Published
Abstract [en]

In smart buildings, many different systems work in coordination to accomplish their tasks. In this process, the sensors associated with these systems collect large amounts of data generated in a streaming fashion, which is prone to concept drift. Such data are heterogeneous due to the wide range of sensors collecting information about different characteristics of the monitored systems. All of this makes the monitoring task very challenging. Traditional clustering algorithms are not well equipped to address these challenges. In this work, we study the use of the MV Multi-Instance Clustering algorithm for multi-view analysis and mining of smart building systems’ sensor data. It is demonstrated how this algorithm can be used to perform contextual as well as integrated analysis of the systems. Various scenarios in which the algorithm can be used to analyze the data generated by the systems of a smart building are examined and discussed in this study. In addition, it is shown how the extracted knowledge can be visualized to detect trends in the systems’ behavior and how it can aid domain experts in the systems’ maintenance. In the experiments conducted, the proposed approach successfully detected the deviating behaviors known to have previously occurred and also identified some new deviations during the monitored period. Based on the results obtained from the experiments, it can be concluded that the proposed algorithm can be used for monitoring, analyzing, and detecting deviating behaviors of the systems in a smart building domain. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.

Place, publisher, year, edition, pages
MDPI, 2021
Keywords
Closed patterns, Evolutionary clustering, Formal concept analysis, Multi-instance learning, Multi-view clustering, Smart buildings, Streaming data, Buildings, Clustering algorithms, Building systems, Closed pattern, Concept drifts, Data analysis techniques, Large amounts of data, Multi-views
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-22225 (URN); 10.3390/s21206775 (DOI); 000716120000001 (); 2-s2.0-85116801515 (Scopus ID)
Note

open access

Available from: 2021-10-22. Created: 2021-10-22. Last updated: 2024-04-09. Bibliographically approved.
5. A Graph-based Multi-view Clustering Approach for Continuous Pattern Mining
2022 (English). In: Recent Advancements in Multi-View Data Analytics / [ed] Witold Pedrycz and Shyi-Ming Chen, Springer Science+Business Media B.V., 2022, p. 201-237. Chapter in book (Refereed)
Abstract [en]

Today’s smart monitoring applications need machine learning models and data mining algorithms that are capable of analysing and mining the temporal component of data streams. These models and algorithms also ought to take into account the multi-source nature of the sensor data by being able to conduct multi-view analysis. In this study, we address these challenges by introducing a novel multi-view data stream clustering approach, entitled MST-MVS clustering, that can be applied in different smart monitoring applications for continuous pattern mining and data labelling. This proposed approach is based on the Minimum Spanning Tree (MST) clustering algorithm. This algorithm is applied for parallel building of local clustering models on different views in each chunk of data. The MST-MVS clustering transfers knowledge learnt in the current data chunk to the next chunk in the form of artificial nodes used by the MST clustering algorithm. These artificial nodes are identified by analyzing multi-view patterns extracted at each data chunk in the form of an integrated (global) clustering model. We further show how the extracted patterns can be used for post-labelling of the chunk’s data by introducing a dedicated labelling technique, entitled Pattern-labelling. We study and evaluate the MST-MVS clustering algorithm under different experimental scenarios on synthetic and real-world data. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
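For readers unfamiliar with MST-based clustering, the building block named in this abstract can be sketched as follows: build a minimum spanning tree over the points, then drop the tree edges that are too long, so that each remaining connected component forms a cluster. This is a generic textbook variant, not the MST-MVS algorithm itself. The fixed cut threshold, the Euclidean distance, and all function names are assumptions made for this illustration, and the artificial nodes that carry knowledge between chunks are not modelled.

```python
import math
from itertools import combinations


def find(parent, i):
    """Return the root of i in a union-find forest, compressing the path."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def mst_edges(points):
    """Kruskal's algorithm on the complete Euclidean graph over the points."""
    parent = list(range(len(points)))
    candidates = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    tree = []
    for weight, i, j in candidates:
        root_i, root_j = find(parent, i), find(parent, j)
        if root_i != root_j:  # the edge joins two separate components
            parent[root_i] = root_j
            tree.append((weight, i, j))
    return tree


def mst_clusters(points, cut):
    """Cluster by removing MST edges longer than `cut` (an assumed threshold)."""
    parent = list(range(len(points)))
    for weight, i, j in mst_edges(points):
        if weight <= cut:
            parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


chunk = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
print(mst_clusters(chunk, cut=2.0))
```

On this toy chunk the single long tree edge between the two groups of points is removed, leaving the first three points in one cluster and the last two in another.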

Place, publisher, year, edition, pages
Springer Science+Business Media B.V., 2022
Series
Studies in Big Data, ISSN 2197-6503, E-ISSN 2197-6511 ; 106
Keywords
data stream, clustering analysis, pattern mining, minimum spanning tree
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-22261 (URN); 10.1007/978-3-030-95239-6_8 (DOI); 2-s2.0-85130970889 (Scopus ID); 978-3-030-95239-6 (ISBN)
Available from: 2021-11-01. Created: 2021-11-01. Last updated: 2024-08-16. Bibliographically approved.
6. Domain Adaptation Through Cluster Integration and Correlation
2022 (English). In: IEEE International Conference on Data Mining Workshops, ICDMW / [ed] Candan K.S., Dinh T.N., Thai My.T., Washio T., IEEE Computer Society, 2022, p. 119-126. Conference paper, Published paper (Refereed)
Abstract [en]

Domain shift is a common problem in many real-world applications using machine learning models. Most of the existing solutions are based on supervised and deep-learning models. This paper proposes a novel clustering algorithm capable of producing an adapted and/or integrated clustering model for the considered domains. Source and target domains are represented by clustering models such that each cluster of a domain models a specific scenario of the studied phenomenon by defining a range of allowable values for each attribute in a given data vector. The proposed domain integration algorithm works in two steps: (i) cross-labeling and (ii) integration. Initially, each clustering model is applied to label the cluster representatives of the other model. These labels are used to determine the correlations between the two models and to identify the clusters common to both domains, which are integrated in the second step. Different features of the proposed algorithm are studied and evaluated on a publicly available human activity recognition (HAR) data set and real-world data from a smart logistics use case provided by an industrial partner. The goal of the experiment on the HAR data set is to showcase the algorithm's potential in automatic data labeling, while the experiments on the smart logistics use case evaluate and compare the performance of the integrated model and the two adapted models in different domains. © 2022 IEEE.

Place, publisher, year, edition, pages
IEEE Computer Society, 2022
Series
IEEE International Conference on Data Mining Workshops, ICDMW, ISSN 2375-9232, E-ISSN 2375-9259 ; 2022
Keywords
Cluster analysis, Clustering algorithms, Deep learning, Learning systems, Clustering model, Clustering techniques, Data set, Domain adaptation, Human activity recognition, Learning models, Machine learning models, Novel clustering, Real-world, Target domain, Data integration
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-24336 (URN); 10.1109/ICDMW58026.2022.00025 (DOI); 000971492200017 (); 2-s2.0-85148440164 (Scopus ID); 9798350346091 (ISBN)
Conference
22nd IEEE International Conference on Data Mining Workshops, ICDMW 2022, Orlando, 28 November through 1 December 2022
Available from: 2023-03-03. Created: 2023-03-03. Last updated: 2024-04-09. Bibliographically approved.
7. A Domain Adaptation Technique through Cluster Boundary Integration
2025 (English). In: Evolving Systems, ISSN 1868-6478, E-ISSN 1868-6486, Vol. 16, no 1, article id 14. Article in journal (Refereed), Published
Abstract [en]

Many machine learning models deployed on smart or edge devices experience a phase where there is a drop in their performance due to the arrival of data from new domains. This paper proposes a novel unsupervised domain adaptation algorithm called DIBCA++ to deal with such situations. The algorithm uses only the clusters’ mean, standard deviation, and size, which makes the proposed algorithm modest in terms of the required storage and computation. The study also presents the explainability aspect of the algorithm. DIBCA++ is compared with its predecessor, DIBCA, and its applicability and performance are studied and evaluated in two real-world scenarios. One is coping with the Global Navigation Satellite System activation problem from the smart logistics domain, while the other identifies different activities a person performs and deals with a human activity recognition task. Both scenarios involve time series data phenomena, i.e., DIBCA++ also contributes towards addressing the current gap regarding domain adaptation solutions for time series data. Based on the experimental results, DIBCA++ has improved performance compared to DIBCA. The DIBCA++ has performed better in all human activity recognition task experiments and 82.5% of experimental scenarios on the smart logistics use case. The results also showcase the need and benefit of personalizing the models using DIBCA++, along with the ability to transfer new knowledge between domains, leading to improved performance. The adapted source and target models have performed better in 70% and 80% of cases in an experimental scenario conducted on smart logistics. 
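The abstract states that the algorithm works from each cluster's mean, standard deviation, and size only. As a hedged illustration of why such summaries are enough to combine clusters without revisiting the raw data, the sketch below merges two one-dimensional cluster summaries using the standard pooled-moment formulas. It is not DIBCA++: the `ClusterSummary` class, the `merge` function, and the use of the population standard deviation are assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSummary:
    """Per-attribute summary of a cluster (hypothetical structure)."""
    mean: float
    std: float  # population standard deviation
    size: int


def merge(a, b):
    """Combine two summaries as if their underlying points had been pooled."""
    size = a.size + b.size
    mean = (a.size * a.mean + b.size * b.mean) / size
    # Pooled variance: within-cluster spread plus the shift of each mean
    # relative to the combined mean.
    variance = (
        a.size * (a.std ** 2 + (a.mean - mean) ** 2)
        + b.size * (b.std ** 2 + (b.mean - mean) ** 2)
    ) / size
    return ClusterSummary(mean=mean, std=math.sqrt(variance), size=size)


# Summaries of the point sets [1, 2, 3] and [10, 12].
source = ClusterSummary(mean=2.0, std=math.sqrt(2 / 3), size=3)
target = ClusterSummary(mean=11.0, std=1.0, size=2)
print(merge(source, target))
```

The merged summary matches the mean and population standard deviation of the pooled points [1, 2, 3, 10, 12], which shows that three numbers per attribute are enough to update a cluster on a device with limited storage.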

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Cluster integration, Clustering techniques, Domain adaptation
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-26090 (URN); 10.1007/s12530-024-09635-z (DOI); 001363397000001 (); 2-s2.0-85210317128 (Scopus ID)
Funder
Knowledge Foundation, 20220068
Available from: 2024-04-09. Created: 2024-04-09. Last updated: 2024-12-10. Bibliographically approved.
8. Putting Sense into Incomplete Heterogeneous Data with Hypergraph Clustering Analysis
2024 (English). In: Advances in Intelligent Data Analysis XXII, PT II, IDA 2024 / [ed] Ioanna Miliou, Nico Piatkowski, Panagiotis Papapetrou, Springer Science+Business Media B.V., 2024, p. 119-130. Conference paper, Published paper (Refereed)
Abstract [en]

Many industrial scenarios are concerned with the exploration of high-dimensional heterogeneous data sets that originate from diverse sources and are often incomplete, i.e., contain a substantial amount of missing values. This paper proposes a novel unsupervised method that efficiently facilitates the exploration and analysis of such data sets. In an exploratory workflow, the methodology combines multi-layer data analysis with shared nearest neighbor similarity and hypergraph clustering. It produces overlapping homogeneous clusters, i.e., the assets within each cluster are assumed to exhibit comparable behavior. These clusters can be used for computing relevant KPIs per cluster for the purpose of performance analysis and comparison. More concretely, such KPIs have the potential to aid domain experts in monitoring and understanding asset performance and, subsequently, enable the identification of outliers and the timely detection of performance degradation.
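Shared nearest neighbor (SNN) similarity, one ingredient named in this abstract, can be illustrated with a short sketch: two items are similar to the extent that their k-nearest-neighbor lists overlap. The handling of missing values shown here (comparing only attributes observed in both rows and averaging over them) is an assumption chosen for illustration, as are the function names and the toy data; the paper's own workflow and its hypergraph clustering step are not reproduced.

```python
import math


def partial_distance(a, b):
    """Root-mean-square difference over attributes observed in both rows.

    Missing values are encoded as None (an assumption of this sketch).
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # nothing to compare on
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_sets(rows, k):
    """For each row, the index set of its k nearest other rows."""
    neighbours = []
    for i, row in enumerate(rows):
        ranked = sorted(
            (partial_distance(row, other), j)
            for j, other in enumerate(rows)
            if j != i
        )
        neighbours.append({j for _, j in ranked[:k]})
    return neighbours


def snn_similarity(rows, k):
    """Matrix whose entry (i, j) counts the neighbours that rows i and j share."""
    nn = knn_sets(rows, k)
    n = len(rows)
    return [[len(nn[i] & nn[j]) for j in range(n)] for i in range(n)]


assets = [
    (0.0, 0.0), (0.1, None), (0.2, 0.1),
    (5.0, 5.0), (5.1, None), (5.2, 5.1),
]
similarity = snn_similarity(assets, k=2)
print(similarity[0][1], similarity[0][3])
```

Because the measure depends on neighbor ranks rather than raw distances, rows with missing attributes can still be compared with complete rows, provided they share at least one observed attribute.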

Place, publisher, year, edition, pages
Springer Science+Business Media B.V., 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14642
Keywords
Clustering, Heterogeneous data, Missing values, Hypergraph, Shared nearest neighbor similarity
National Category
Computer Sciences
Identifiers
urn:nbn:se:bth-26089 (URN); 10.1007/978-3-031-58553-1_10 (DOI); 001295920900010 (); 2-s2.0-85192191384 (Scopus ID); 9783031585555 (ISBN)
Conference
22nd International Symposium on Intelligent Data Analysis (IDA), Stockholm, Apr 24-26, 2024
Funder
Knowledge Foundation, 20220068
Available from: 2024-04-09. Created: 2024-04-09. Last updated: 2024-12-03. Bibliographically approved.


Authority records

Devagiri, Vishnu Manasa
