Clustering of Driver Data based on Driving Patterns
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
2019 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Data analysis methods are needed to make sense of the ever-growing volume of high-dimensional data. Cluster analysis partitions data into disjoint groups such that items in the same group are similar while items in different groups are dissimilar. This thesis aims to identify natural groups, or clusters, of drivers from data describing their driving styles. To find such groups, combinations of dimensionality reduction and clustering algorithms are evaluated. The dimensionality reduction algorithms used are Principal Component Analysis (PCA) and t-distributed stochastic neighbour embedding (t-SNE). The clustering algorithms, K-means Clustering and Hierarchical Clustering, were selected on the basis of a literature review. Four combinations are evaluated: PCA with K-means, PCA with Hierarchical Clustering, t-SNE with K-means, and t-SNE with Hierarchical Clustering. The evaluation was carried out on a Volvo Cars dataset of drivers characterised by their driving styles. The dataset is first normalized, and a Markov chain of driving styles is then computed. The resulting Markov chain dataset is very high-dimensional, so the dimensionality reduction algorithms are applied to it. The reduced dataset is used as input to the selected clustering algorithms. The combinations are evaluated using performance metrics such as the Silhouette Coefficient, the Calinski-Harabasz Index and the Davies-Bouldin Index. Based on the experiments and analysis, the combination of t-SNE and K-means outperforms the other combinations on all performance metrics and is chosen to cluster the drivers based on their driving styles.
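
The abstract does not say how the Markov chain of driving styles is built, so the following is only a minimal sketch of one plausible reading: each driver's sequence of driving-style labels is turned into a row-normalised first-order transition matrix, which is flattened into a feature vector. The state names, the sample sequence and the function `transition_features` are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def transition_features(sequence, states):
    """Flatten the row-normalised first-order transition matrix of `sequence`.

    Assumption: a driver is described by an ordered list of driving-style
    labels; the thesis abstract does not specify the actual representation.
    """
    # Count every observed transition (label at step i -> label at step i + 1).
    counts = Counter(zip(sequence, sequence[1:]))
    features = []
    for src in states:
        row_total = sum(counts[(src, dst)] for dst in states)
        for dst in states:
            # A state that is never left yields an all-zero row.
            features.append(counts[(src, dst)] / row_total if row_total else 0.0)
    return features


# Hypothetical driving-style labels for a single driver.
STATES = ["calm", "normal", "aggressive"]
trip = ["calm", "calm", "normal", "aggressive", "normal", "calm"]

vector = transition_features(trip, STATES)
print(len(vector), vector)
```

With k states the vector has k² entries, which illustrates why a Markov chain representation can become very high-dimensional and why a reduction step such as PCA or t-SNE precedes clustering.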

Place, publisher, year, edition, pages
2019.
Keywords [en]
Clustering, Driving Patterns, Markov Chain, Cars, Machine Learning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-18466
OAI: oai:DiVA.org:bth-18466
DiVA, id: diva2:1337166
External cooperation
Volvo Cars
Subject / course
DV2572 Master's Thesis in Computer Science
Educational program
DVADA Master Qualification Plan in Computer Science
Presentation
2019-05-27, 16:00 (English)
Supervisors
Examiners
Available from: 2019-07-24 Created: 2019-07-11 Last updated: 2019-07-24 Bibliographically approved

Open Access in DiVA

BTH2019Kabra (3620 kB), 3345 downloads
File information
File name: FULLTEXT02.pdf
File size: 3620 kB
Checksum (SHA-512): 8d38d846e816299cc4213ded5f3824bd3c2504793ee16a845dd7a6605cf3b30212a217301e883a24a295aeb2e50ce5eecdb670ecdc19ad7fbca92b2af970ae5f
Type: fulltext
Mimetype: application/pdf
