On Evaluation of Data Stream Clustering Algorithms: A Survey
2025 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 13, p. 139524-139546
Article, review/survey (Refereed) Published
Abstract [en]
Data stream mining is a research area that has grown enormously in recent years. The main challenge is extracting knowledge in real-time from a possibly unbounded data stream. Clustering, a process in which groupings within the data are identified, is a valuable technique to extract and identify underlying structures of the data. An open question in stream clustering is how to evaluate the proposed algorithms. In this survey, we review the literature in the domain to identify common methodologies, datasets, and evaluation measures used to evaluate the algorithms. We provide a short summary of the stream clustering algorithms in the literature, but our primary focus lies in the survey of cluster validation relevant to the evaluation of data stream clustering algorithms. We begin our literature review with the inception of incremental clustering, namely with the introduction of the balanced iterative reducing and clustering using hierarchies (BIRCH) algorithm. We identify that the evaluation methodologies primarily focus on performance, and that aspects such as cluster quality are rarely considered. Performance has been the focal point of all evaluations, both in terms of computational performance and accuracy, since the inception of clustering data streams. We also identify that issues in the conventional clustering domain are also present in data stream clustering. However, minor additions to the evaluation methods can improve both the applicability and usefulness of the algorithms.
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025. Vol. 13, p. 139524-139546
Keywords [en]
Cluster analysis, cluster validation indices, cluster validation measures, clustering, data stream clustering, data stream mining, data streams, evaluation, review, streaming data, Clustering algorithms, Data mining, Iterative methods, Quality control, Cluster validation, Cluster validation index, Cluster validation measure, Clusterings, Data stream, Data streams mining, Validation index, Reviews
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-28542
DOI: 10.1109/ACCESS.2025.3596435
ISI: 001550799400033
Scopus ID: 2-s2.0-105013054673
OAI: oai:DiVA.org:bth-28542
DiVA, id: diva2:1993712
Part of project
HINTS - Human-Centered Intelligent Realities
Funder
Knowledge Foundation, 20220068
2025-09-01 2025-09-01 2025-09-30 Bibliographically approved