The Impact of Data Quality on Federated Versus Centralized Learning
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
2024 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Background. Data quality is central to a data-centric view of modeling, but data quality and its impact on models can be hard to measure because the metrics involved are often subjective. One way to measure data quality quantitatively is through a utility-driven assessment. There is no research on the impact of data quality on deep-learning methods in centralized and federated learning.

Objectives. To observe the impact of datasets of varying data quality on centralized versus federated learning, and to determine whether the distribution of data quality across federated clients affects that impact.

Methods. Datasets of progressively lower data quality are created by perturbing the dataset along two data quality metrics: data accuracy and data completeness. Three experiments are then conducted to address the stated objectives.
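
The abstract does not say how the perturbations were implemented. As a minimal sketch, assuming numeric features stored as rows of floats, data accuracy could be lowered by adding Gaussian noise to a chosen fraction of values, and data completeness by replacing a chosen fraction with missing values. The function names, the noise model, and the `fraction` and `seed` parameters are illustrative assumptions, not the method used in the thesis.

```python
import math
import random


def degrade_accuracy(rows, fraction, noise_std=1.0, seed=0):
    """Return a copy of `rows` with Gaussian noise added to roughly
    `fraction` of the values (hypothetical accuracy perturbation)."""
    rng = random.Random(seed)
    return [
        [v + rng.gauss(0.0, noise_std) if rng.random() < fraction else v
         for v in row]
        for row in rows
    ]


def degrade_completeness(rows, fraction, seed=0):
    """Return a copy of `rows` with roughly `fraction` of the values
    replaced by NaN (hypothetical completeness perturbation)."""
    rng = random.Random(seed)
    return [
        [math.nan if rng.random() < fraction else v for v in row]
        for row in rows
    ]


# Example: increasingly degraded copies of a tiny dataset.
data = [[1.0, 2.0], [3.0, 4.0]]
levels = [0.0, 0.25, 0.5]
noisy_versions = [degrade_accuracy(data, f) for f in levels]
sparse_versions = [degrade_completeness(data, f) for f in levels]
```

Sweeping `fraction` upward gives a series of datasets of decreasing quality on each metric, which is one way to read "increasingly worse data quality".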

Results. The comparison of model test accuracy under different data quality conditions reveals that the centralized model achieves 60.3% accuracy with low data accuracy and 58.7% with low data completeness. The federated model performs better, achieving 69.3% accuracy with low data accuracy and 79.2% with low data completeness. The federated model is less affected by low data quality if the data quality is distributed evenly between clients.

Conclusions. The federated deep-learning method displays certain attributes that make it more robust to low-quality data. An uneven distribution of data quality between clients has a more negative impact on federated learning than an even distribution.
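
To make the difference between even and uneven distribution concrete, the sketch below assigns each client a corruption fraction under a fixed overall budget. The abstract does not describe the allocation scheme used in the thesis; the function `client_fractions`, the assumption of equally sized clients, and the uneven scheme (concentrating degraded data in as few clients as possible) are hypothetical and for illustration only.

```python
def client_fractions(num_clients, overall_fraction, even=True):
    """Per-client corruption fractions whose mean is `overall_fraction`.

    Assumes all clients hold the same amount of data (hypothetical setup).
    """
    if even:
        # Every client receives the same share of degraded data.
        return [overall_fraction] * num_clients
    # Uneven: degrade clients fully, one at a time, until the budget is spent.
    budget = overall_fraction * num_clients
    fractions = []
    for _ in range(num_clients):
        share = min(1.0, budget)
        fractions.append(share)
        budget -= share
    return fractions


print(client_fractions(4, 0.25))              # [0.25, 0.25, 0.25, 0.25]
print(client_fractions(4, 0.25, even=False))  # [1.0, 0.0, 0.0, 0.0]
```

Both allocations degrade the same total amount of data, so any difference in federated model accuracy between them can be attributed to how the low-quality data is spread across clients.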

Place, publisher, year, edition, pages
2024.
Keywords [en]
Centralized, Federated, Data quality
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:bth-26314
OAI: oai:DiVA.org:bth-26314
DiVA, id: diva2:1892481
External cooperation
Imagimob
Subject / course
Degree Project in Master of Science in Engineering 30,0 hp
Educational program
DVAMI Master of Science in Engineering: AI and Machine Learning 300 hp
Supervisors
Examiners
Available from: 2024-08-27. Created: 2024-08-27. Last updated: 2024-08-27. Bibliographically approved.

Open Access in DiVA

fulltext (1546 kB), 266 downloads
File information
File name: FULLTEXT02.pdf
File size: 1546 kB
Checksum (SHA-512): 6967f5a610c80dbebdef10efcc4d257cb08fe9e6d40bee1e3c629f1951a21fa86aed9339dacc975c27bbe95ca73115d39a97aaa51971971f9b7be674120caf98
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Nilsson, Gustav
Total: 266 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 179 hits