Unsupervised Machine Learning: An Investigation of Clustering Algorithms on a Small Dataset
2018 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]
Context: As machine learning grows in popularity, examining its shortcomings helps show how widely it can be applied. Is it possible to apply clustering to a small dataset?
Objectives: This thesis consists of a literature study, a survey and an experiment. It investigates how two unsupervised machine learning algorithms, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) and K-means, perform on a dataset gathered from a survey.
Methods: A survey was conducted to show statistically what most respondents chose. Clustering was then applied to the survey data to check whether the clusters show the same patterns as the respondents' most common choices.
Results: It was possible to identify patterns with clustering algorithms using a small dataset. The literature study shows examples in which both algorithms have been used successfully.
Conclusions: It is possible to see patterns using DBSCAN and K-means on a small dataset. The size of the dataset is not the only aspect to take into consideration; feature and parameter selection are both important as well, since the algorithms need to be tuned and customized to the data.
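To illustrate the kind of experiment the abstract describes, the following is a minimal sketch of K-means and DBSCAN applied to a very small dataset. It is not the thesis's implementation: the `responses` data, the two rating features, the function names and the parameter values (`k`, `eps`, `min_pts`) are all hypothetical and chosen only for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm. Simplification: the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nbrs)
    return labels


# Hypothetical survey answers: two ratings per respondent on a 1-5 scale.
responses = [(1, 1), (5, 5), (1, 2), (2, 1), (2, 2), (5, 4), (4, 5), (4, 4), (2, 5)]

km_labels = kmeans(responses, k=2)
db_labels = dbscan(responses, eps=1.5, min_pts=3)
db_labels_tight = dbscan(responses, eps=0.5, min_pts=3)

print(km_labels)        # [0, 1, 0, 0, 0, 1, 1, 1, 1]
print(db_labels)        # [0, 1, 0, 0, 0, 1, 1, 1, -1]
print(db_labels_tight)  # [-1, -1, -1, -1, -1, -1, -1, -1, -1]
```

In this toy data both algorithms recover the same two groups, but K-means forces the isolated respondent `(2, 5)` into a cluster while DBSCAN marks it as noise. Shrinking `eps` to 0.5 makes DBSCAN label every point as noise, which is one concrete way parameter selection can matter as much as dataset size.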
Place, publisher, year, edition, pages
2018. p. 39
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:bth-16300; OAI: oai:DiVA.org:bth-16300; DiVA, id: diva2:1213516
Subject / course
PA1445 Kandidatkurs i Programvaruteknik
Educational program
PAGIP International Software Engineering; PAGPT Software Engineering
Supervisors
Examiners
2018-06-05, 2018-06-04, 2018-06-05. Bibliographically approved