A Comparison on Supervised and Semi-Supervised Machine Learning Classifiers for Diabetes Prediction
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
2021 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Background: Diabetes is a condition marked by high sugar levels in the blood. There is no permanent cure for diabetes. However, it can be prevented by early diagnosis. In recent years, interest in Machine Learning for disease prediction has grown, especially during the COVID-19 pandemic, when it has been difficult for patients to visit doctors. This thesis provides a possible Machine Learning framework that can detect diabetes at an early stage.

Objectives: This thesis aims to identify the critical features that impact gestational (Type-3) diabetes and, through experiments, to identify the most efficient algorithm for Type-3 diabetes prediction. The selected algorithms are Decision Trees, Random Forest, Support Vector Machine, Gaussian Naive Bayes, Bernoulli Naive Bayes, and Laplacian Support Vector Machine. The algorithms are compared on their performance.

Methods: The method consists of gathering the dataset and preprocessing the data. SelectKBest univariate feature selection was performed to select the important features that influence Type-3 diabetes prediction. A new dataset was created by binning some of the important features from the original dataset, giving two datasets: non-binned and binned. A train-test split was performed on both datasets. Because the original dataset was imbalanced, with an unequal distribution of class labels, oversampling was applied to both training sets to correct the imbalance. The selected Machine Learning algorithms were trained, and predictions were made on the test data. Hyperparameter tuning was then performed on all algorithms to improve performance. Predictions were made again on the test data, and accuracy, precision, recall, and f1-score were measured on both the binned and non-binned datasets.
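The order of the split and the oversampling step matters: oversampling only the training partition keeps duplicated records out of the test data. The following is a minimal sketch of that order in plain Python. It assumes simple random oversampling (the abstract does not name the oversampling technique used), and the helper names, the 25% test fraction, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle the record indices and split them into train and test parts."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [rows[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [rows[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, y_train, X_test, y_test


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (assumed technique)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 20 negative and 8 positive records, one feature each.
rows = [[float(i)] for i in range(28)]
labels = [0] * 20 + [1] * 8

# Split first, then oversample the training partition only.
X_train, y_train, X_test, y_test = train_test_split(rows, labels)
X_bal, y_bal = random_oversample(X_train, y_train)

print("train before:", Counter(y_train))
print("train after: ", Counter(y_bal))
print("test (untouched):", Counter(y_test))
```

The test partition keeps its original class distribution, so the reported metrics reflect performance on data the oversampling never touched.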

Results: Among the selected Machine Learning algorithms, Laplacian Support Vector Machine attained the highest performance, with 89.61% and 86.93% on the non-binned and binned datasets respectively. Hence, it is an efficient algorithm for Type-3 diabetes prediction. The second-best algorithm is Random Forest, with 74.5% and 72.72% on the non-binned and binned datasets respectively. The non-binned dataset gave better performance for the majority of the selected algorithms.
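The four metrics named in the Methods (accuracy, precision, recall, and f1-score) all follow from the counts of true and false positives and negatives. A minimal sketch of that calculation for a binary task is given here; the function name, the labels, and the predictions are hypothetical and are not taken from the thesis.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels and predictions for eight test records.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

On an imbalanced test set, accuracy alone can look high for a classifier that rarely predicts the minority class, which is one reason to report precision, recall, and f1-score alongside it.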

Conclusions: Laplacian Support Vector Machine achieved the highest performance among the selected algorithms on both the binned and non-binned datasets. The non-binned dataset gave the best performance for almost all Machine Learning algorithms, the exception being Bernoulli Naive Bayes. Therefore, the non-binned dataset is more suitable for Type-3 diabetes prediction.

Place, publisher, year, edition, pages
2021. p. 36
Keywords [en]
Machine Learning, Semi-supervised Learning, Supervised Learning, Diabetes Prediction
National Category
Engineering and Technology Computer Sciences
Identifiers
URN: urn:nbn:se:bth-21816
OAI: oai:DiVA.org:bth-21816
DiVA, id: diva2:1573365
Subject / course
DV1478 Bachelor Thesis in Computer Science
Educational program
DVGDT Bachelor Qualification Plan in Computer Science 60.0 hp
Presentation
2021-05-25, Zoom (Online), Karlskrona, 08:00 (English)
Supervisors
Examiners
Available from: 2021-06-28. Created: 2021-06-24. Last updated: 2021-06-28. Bibliographically approved.

Open Access in DiVA

A Comparison on Supervised and Semi-Supervised Machine Learning Classifiers for Diabetes Prediction (484 kB), 672 downloads
File information
File name: FULLTEXT02.pdf
File size: 484 kB
Checksum (SHA-512): d633b842f303083afaa0f2af5646d2d3fb1352677a82800a360e729fbda0c9e27320d68380c082ae7792a97335418bb9ff2f7496004e0a67fa9264874c089551
Type: fulltext
Mimetype: application/pdf
