Putting Sense into Incomplete Heterogeneous Data with Hypergraph Clustering Analysis
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0003-3371-5347
Sirris, EluciDATA Lab, Belgium.
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science. ORCID iD: 0000-0003-3128-191X
Sirris, EluciDATA Lab, Belgium.
2024 (English). In: Advances in Intelligent Data Analysis XXII, PT II, IDA 2024 / [ed] Ioanna Miliou, Nico Piatkowski, Panagiotis Papapetrou, Springer Science+Business Media B.V., 2024, p. 119-130. Conference paper, Published paper (Refereed)
Abstract [en]

Many industrial scenarios involve exploring high-dimensional heterogeneous data sets that originate from diverse sources and are often incomplete, i.e., contain a substantial amount of missing values. This paper proposes a novel unsupervised method that efficiently facilitates the exploration and analysis of such data sets. The methodology combines multi-layer data analysis with shared nearest neighbor similarity and hypergraph clustering in an exploratory workflow. It produces overlapping homogeneous clusters, i.e., clusters whose assets are assumed to exhibit comparable behavior. These clusters can be used to compute relevant KPIs per cluster for performance analysis and comparison. More concretely, such KPIs have the potential to aid domain experts in monitoring and understanding asset performance and, subsequently, to enable the identification of outliers and the timely detection of performance degradation.
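To make the terms in the abstract concrete, here is a minimal sketch of shared nearest neighbor (SNN) similarity on incomplete data, together with a simple neighborhood hypergraph. It is not the paper's algorithm, whose details are not given in this record. The function names, the partial-distance treatment of missing values (encoded as `None`), and the "one hyperedge per row" construction are all assumptions chosen for illustration.

```python
import math

def partial_distance(a, b):
    """Euclidean distance over features observed (not None) in both rows,
    rescaled by the share of jointly observed features (an assumption)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common feature: treat the rows as maximally far apart
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))

def knn_sets(rows, k):
    """For each row, the set of indices of its k nearest other rows."""
    neighbors = []
    for i, a in enumerate(rows):
        others = (j for j in range(len(rows)) if j != i)
        # Ties are broken by index so the result is deterministic.
        order = sorted(others, key=lambda j: (partial_distance(a, rows[j]), j))
        neighbors.append(set(order[:k]))
    return neighbors

def snn_similarity(rows, k):
    """SNN similarity: number of k-nearest neighbors two rows have in common."""
    nn = knn_sets(rows, k)
    n = len(rows)
    return [[k if i == j else len(nn[i] & nn[j]) for j in range(n)]
            for i in range(n)]

def neighborhood_hyperedges(rows, k):
    """One hyperedge per row: the row plus its k nearest neighbors.
    Hyperedges may overlap, so a row can belong to several groups."""
    return [nn | {i} for i, nn in enumerate(knn_sets(rows, k))]

# Hypothetical toy data: two groups of assets, some readings missing (None).
rows = [[0.0, 0.0], [0.1, None], [0.2, 0.1],
        [5.0, 5.0], [5.1, None], [4.9, 5.2]]
similarity = snn_similarity(rows, k=2)
hyperedges = neighborhood_hyperedges(rows, k=2)
```

Because every row contributes its own hyperedge, a row can appear in several hyperedges, which is one simple way to obtain overlapping groups. A per-cluster KPI would then be computed over the members of each group.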

Place, publisher, year, edition, pages
Springer Science+Business Media B.V., 2024. p. 119-130
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14642
Keywords [en]
Clustering, Heterogeneous data, Missing values, Hypergraph, Shared nearest neighbor similarity
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-26089
DOI: 10.1007/978-3-031-58553-1_10
ISI: 001295920900010
Scopus ID: 2-s2.0-85192191384
ISBN: 9783031585555 (print)
OAI: oai:DiVA.org:bth-26089
DiVA, id: diva2:1850079
Conference
22nd International Symposium on Intelligent Data Analysis (IDA), Stockholm, Apr 24-26, 2024
Part of project
HINTS - Human-Centered Intelligent Realities
Funder
Knowledge Foundation, 20220068
Available from: 2024-04-09 Created: 2024-04-09 Last updated: 2024-12-03. Bibliographically approved
In thesis
1. Mining Evolving and Heterogeneous Data: Cluster-based Analysis Techniques
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

A large amount of data is generated in fields such as IoT and smart monitoring applications, raising demand for suitable data analysis and mining techniques. Data produced by such systems have many distinct characteristics, like continuous generation, evolving nature, multi-source origin, and heterogeneity, and in addition are usually not annotated. Clustering is an unsupervised learning technique used to group and analyze unlabeled data. Conventional clustering algorithms are unsuitable for data with these characteristics because of memory and computational constraints and their inability to handle the heterogeneous and evolving nature of the data. Therefore, novel clustering approaches are needed to analyze and interpret such challenging data.

This thesis focuses on building and studying advanced clustering algorithms that can address the main challenges of today's real-world data: its evolving and heterogeneous nature. An evolving clustering approach capable of continuously updating the generated clustering solution in the presence of new data is initially proposed, and is later extended to address the challenges of multi-view data applications. Multi-view or multi-source data present the studied phenomenon or system from different perspectives (views) and can reveal interesting knowledge that is invisible when only one view is considered and analyzed. This has motivated us to continue exploring data from different perspectives in several other studies of this thesis. Domain shift is another common problem when data are obtained from various devices or locations, leading to a drop in the performance of machine learning models if they are not adapted to the current domain (device, location, etc.). The thesis explores the domain adaptation problem in a resource-constrained way using cluster integration techniques. A new hybrid clustering technique for analyzing heterogeneous data is also proposed. It produces homogeneous groups, facilitating continuous monitoring and fault detection.
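The idea of continuously updating a clustering solution as new data arrive can be illustrated with a minimal sketch. It is not the evolving clustering algorithm proposed in the thesis. It is a generic online scheme with running-mean centroids in which a new cluster is opened when a point is farther than a threshold from every existing centroid. The function name `update_clusters` and the `radius` parameter are assumptions made for this example.

```python
import math

def update_clusters(centroids, counts, point, radius):
    """Fold one new point into an existing clustering, in place.

    The point joins the nearest centroid if it lies within `radius`;
    that centroid then moves to the running mean of its members.
    Otherwise the point starts a new cluster. Returns the cluster index.
    """
    best, best_distance = None, math.inf
    for index, centroid in enumerate(centroids):
        distance = math.dist(centroid, point)
        if distance < best_distance:
            best, best_distance = index, distance

    if best is None or best_distance > radius:
        centroids.append(list(point))  # open a new cluster at the point
        counts.append(1)
        return len(centroids) - 1

    counts[best] += 1
    n = counts[best]
    # Incremental mean: c_new = c_old + (x - c_old) / n
    centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], point)]
    return best

# Hypothetical stream of two-dimensional readings.
centroids, counts = [], []
stream = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0)]
labels = [update_clusters(centroids, counts, p, radius=1.0) for p in stream]
```

Only the centroids and member counts are stored, so memory use does not grow with the length of the stream. That matches the kind of memory and computational constraint the abstract mentions for conventional algorithms.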

The algorithms and techniques proposed in this thesis are evaluated on various data sets, including real-world data from industrial partners in domains like smart building systems, smart logistics, and performance monitoring of industrial assets. The obtained results demonstrated the robustness of the algorithms for modeling, analyzing, and mining evolving data streams and/or heterogeneous data. They can adequately adapt single and multi-view clustering models by continuously integrating newly arriving data.

Place, publisher, year, edition, pages
Karlskrona: Blekinge Tekniska Högskola, 2024
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090 ; 2024:06
Keywords
Domain Adaptation, Evolving Clustering, Heterogeneous Data, Multi-View Clustering, Streaming Data
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:bth-26098 (URN)
978-91-7295-479-3 (ISBN)
Public defence
2024-05-22, J1630, Campus Gräsvik, Karlskrona, 09:00 (English)
Opponent
Supervisors
Available from: 2024-04-10 Created: 2024-04-09 Last updated: 2024-04-22. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Devagiri, Vishnu Manasa; Boeva, Veselka
