Graph-based Multi-view Clustering for Continuous Pattern Mining
2021 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Background. In many smart monitoring applications, such as smart healthcare, smart buildings and autonomous cars, data are collected from multiple sources and contain information about different perspectives (views) of the monitored phenomenon, physical object, or system. In addition, in many of these applications relevant labelled data are scarce or entirely unavailable. Motivated by this, in this thesis we propose a novel algorithm for multi-view stream clustering. The algorithm can be applied to continuous pattern mining and labelling of streaming data.
Objectives. The main objective of this thesis is to develop and implement a novel multi-view stream clustering algorithm. In addition, the potential of the proposed algorithm is studied and evaluated on two datasets, one synthetic and one real-world. The conducted experiments compare the new algorithm's performance with that of a single-view clustering algorithm and of a variant that does not transfer knowledge between data chunks. Finally, the obtained results are analysed, discussed and interpreted.
Methods. We first study state-of-the-art multi-view (stream) clustering algorithms. We then develop our multi-view clustering algorithm for streaming data by adding a knowledge-transfer feature. We present and explain the developed algorithm in detail, motivating each choice made during the design phase. Finally, we present and motivate the algorithm configuration, the experimental setup, and the datasets chosen for the experiments.
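The abstract does not specify the algorithm's internals, so the following is only a minimal, generic sketch of what transferring knowledge between chunks can look like in multi-view stream clustering. It is not the thesis's graph-based method: as a stand-in, it assumes early fusion (concatenating the views of each chunk) and plain k-means, and it treats the centroids learned on one chunk as the knowledge passed on to the next. The function names `farthest_first`, `kmeans` and `cluster_stream` and the `transfer` flag are hypothetical; `transfer=False` mimics the kind of no-transfer baseline the abstract mentions.

```python
import numpy as np


def farthest_first(X, k):
    """Deterministic seeding: start at sample 0, then repeatedly add the
    sample farthest from the centroids chosen so far."""
    idx = [0]
    for _ in range(1, k):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return X[idx].astype(float)


def kmeans(X, centroids, n_iter=20):
    """Lloyd's k-means, started from the given centroids."""
    centroids = centroids.astype(float)  # astype returns a copy
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid (squared Euclidean).
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its members; an empty cluster
        # keeps its previous centroid.
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids


def cluster_stream(chunks, k, transfer=True):
    """Cluster a stream of chunks, each a list of views of the same samples.

    Assumption (not the thesis's method): views are fused by concatenation.
    With transfer=True the centroids of one chunk seed the next chunk;
    with transfer=False every chunk is clustered from scratch.
    """
    centroids = None
    all_labels = []
    for views in chunks:
        X = np.hstack(views)  # shape (n_samples, sum of view dimensions)
        if centroids is None or not transfer:
            centroids = farthest_first(X, k)
        labels, centroids = kmeans(X, centroids)
        all_labels.append(labels)
    return all_labels
```

Carrying the centroids forward keeps cluster identities aligned from one chunk to the next, which is what makes the labels usable for continuous pattern mining; clustering each chunk from scratch gives no such guarantee.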
Results. Different configurations of the proposed algorithm have been studied and evaluated under different experimental scenarios on the two datasets, synthetic and real-world. The proposed multi-view clustering algorithm demonstrated higher performance on the synthetic dataset than on the real-world dataset, which is mainly due to the relatively poor quality of the real-world data used.
Conclusions. The proposed algorithm performed better on the synthetic dataset than on the real-world dataset. It can generate high-quality clustering solutions with respect to the evaluation metrics used. In addition, the knowledge-transfer feature was shown to have a positive effect on the algorithm's performance. Future work will further study the proposed algorithm on richer and more suitable datasets, e.g., data collected from numerous sensors monitoring a given phenomenon.
Place, publisher, year, edition, pages
2021, p. 48
Keywords [en]
Machine Learning, Unsupervised Learning, Multi-view Clustering, Data Stream Mining, Pattern Mining
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-21850
OAI: oai:DiVA.org:bth-21850
DiVA, id: diva2:1574167
Subject / course
DV2572 Master's Thesis in Computer Science
Educational program
DVACO Master's program in computer science, 120.0 HE credits
Available from: 2021-06-30 Created: 2021-06-28 Last updated: 2021-07-01 Bibliographically approved