Comparison of Supervised Learning Models for Predicting Prices of Used Cars
2021 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
Abstract [en]

Background: The used car industry has grown consistently over the past decade as car usage has increased. Used cars are attracting more attention because they are more affordable than new ones. This situation demands high-performance algorithms that can predict the prices of used cars. Many machine learning algorithms are used to predict car prices.

Objectives: This thesis aims to identify the features that affect the predicted price of used cars, and experiments are performed to find an optimal algorithm for used car price prediction. The algorithms selected for the experiments are Linear Regression (LR), Light Gradient Boosted Machine (LGBM), Random Forest Regression (RFR), and Decision Tree Regression (DTR). These algorithms are then compared using performance metrics for regression models.

Methods: The initial step in this study is to gather a suitable dataset and apply preprocessing techniques to it. Feature selection is performed using a correlation matrix visualized as a heat map. Label encoding is applied to the data to convert categorical values into numerical values. A new dataset is created based on the feature "region" from the original dataset. The train-test split technique is used to divide the original dataset into train and test data in a ratio of 80:20. The new dataset is manually divided into unique regions of train and test data. The selected machine learning algorithms were trained on both datasets. The accuracy score of each selected algorithm is derived using the performance metrics, and an optimal algorithm is identified by comparing these scores.
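The preprocessing and splitting steps described in the Methods can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the thesis's code: the toy rows, the column names "fuel" and "price", and the helper names label_encode, random_split and split_by_region are all hypothetical. In particular, split_by_region reflects one possible reading of "manually divided into unique regions", namely that train and test data share no region. In practice, scikit-learn's LabelEncoder and train_test_split are the usual tools for the first two steps.

```python
import random

def label_encode(values):
    """Map each distinct category to an integer code (alphabetical order)."""
    classes = sorted(set(values))
    mapping = {c: i for i, c in enumerate(classes)}
    return [mapping[v] for v in values], mapping

def random_split(rows, test_ratio=0.2, seed=0):
    """Shuffle the rows and split them into train/test parts (80:20 by default)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def split_by_region(rows, test_ratio=0.2, seed=0):
    """Hold out whole regions so that train and test share no region.

    Assumption: this is only one interpretation of the region-based split.
    """
    regions = sorted({r["region"] for r in rows})
    rng = random.Random(seed)
    rng.shuffle(regions)
    n_test = max(1, round(len(regions) * test_ratio))
    test_regions = set(regions[:n_test])
    train = [r for r in rows if r["region"] not in test_regions]
    test = [r for r in rows if r["region"] in test_regions]
    return train, test

# Hypothetical toy data: 10 listings spread over 5 regions.
region_names = ["north", "south", "east", "west", "central"]
fuel_names = ["gas", "diesel"]
rows = [
    {"region": region_names[i % 5], "fuel": fuel_names[i % 2], "price": 4000 + 500 * i}
    for i in range(10)
]

fuel_codes, fuel_map = label_encode([r["fuel"] for r in rows])
train, test = random_split(rows)             # original-dataset style split
reg_train, reg_test = split_by_region(rows)  # region-disjoint split

print(fuel_map)
print(len(train), len(test))
print(sorted({r["region"] for r in reg_test}))
```

The region-disjoint split is typically the harder evaluation setting, because the model is scored on regions it never saw during training.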

Results: Light Gradient Boosted Machine is considered the optimal algorithm based on the R² score. On the original dataset it obtained 91.12% on the test data, and on the new dataset it achieved 85.30% on the test data. The feature "region" has the highest feature importance of all the features. It has a feature importance of 55220 with respect to the number of instances, i.e., 568654.
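For readers unfamiliar with the metric, the R² score reported above is the coefficient of determination, 1 - SS_res / SS_tot, so a value of 91.12% corresponds to R² = 0.9112. The sketch below computes it from scratch on made-up numbers; the function name r2_score mirrors the scikit-learn function of the same name, but this is a standalone illustration and the sample values are not taken from the thesis.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical prices (in thousands) and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

score = r2_score(y_true, y_pred)
print(f"{score:.2%}")
```

A score of 1 means perfect predictions, 0 means the model does no better than predicting the mean price, and the function as written assumes the true values are not all identical (otherwise SS_tot is zero).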

Conclusions: Among the selected algorithms, Light Gradient Boosted Machine obtained a higher R² score than the other algorithms on both the original and new datasets. The feature "region" has a significant impact on predicting the price of a used car, which is supported by the feature importance computed with Light Gradient Boosted Machine.

Place, publisher, year, edition, pages
2021.
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-22321
OAI: oai:DiVA.org:bth-22321
DiVA, id: diva2:1609361
Subject / course
DV1478 Bachelor Thesis in Computer Science
Educational program
DVGDT Bachelor Qualification Plan in Computer Science 60.0 hp
Available from: 2021-11-09 Created: 2021-11-08 Last updated: 2021-11-09 Bibliographically approved

Open Access in DiVA

fulltext (1343 kB), 3019 downloads
File information
File name: FULLTEXT02.pdf
File size: 1343 kB
Checksum (SHA-512): d13166bddd7bcedc9efc95b3542e20a2474b41bc11409eb6c2ca44009ad05819c68c74ab05046d6a4bd5bfa2c056986336b243c96ee1768868fd740f0337e7b2
Type: fulltext
Mimetype: application/pdf
