The Impact of Data Quality on Federated Versus Centralized Learning
2024 (English)
Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Background. When modeling is approached from a data-centric view, data quality is important. Data quality and its impact on models can be hard to measure because such metrics are subjective. One way to measure data quality quantitatively is through a utility-driven assessment. There is no research on the impact of data quality on deep-learning methods in centralized and federated learning.
Objectives. To observe the impact of datasets of varying data quality on centralized versus federated learning, and to determine whether the distribution of data quality across federated clients affects that impact.
Methods. Datasets of progressively lower data quality are created based on two data quality metrics: data accuracy and data completeness. This is done by perturbing the dataset to degrade each of the two metrics. Three experiments are then conducted to address the stated objectives.
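The perturbation step can be illustrated with a minimal sketch. The abstract does not state which perturbations were used, so everything here is an assumption: label corruption stands in for lower data accuracy, replacing feature values with NaN stands in for lower data completeness, and the function names, rates, and synthetic data are hypothetical.

```python
import numpy as np

def degrade_accuracy(y, rate, num_classes, rng):
    """Replace a fraction `rate` of labels with a different class (assumed perturbation)."""
    y_noisy = y.copy()
    n_flip = int(round(rate * len(y)))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    # An offset in [1, num_classes - 1] guarantees the new label differs from the old one.
    offsets = rng.integers(1, num_classes, size=n_flip)
    y_noisy[idx] = (y_noisy[idx] + offsets) % num_classes
    return y_noisy

def degrade_completeness(X, rate, rng):
    """Mark roughly a fraction `rate` of feature values as missing (assumed perturbation)."""
    X_missing = X.astype(float).copy()
    mask = rng.random(X.shape) < rate
    X_missing[mask] = np.nan
    return X_missing

rng = np.random.default_rng(0)
# Synthetic stand-in data; the thesis dataset is not named in the abstract.
X = rng.normal(size=(1000, 8))
y = rng.integers(0, 10, size=1000)

y_noisy = degrade_accuracy(y, rate=0.3, num_classes=10, rng=rng)
X_missing = degrade_completeness(X, rate=0.2, rng=rng)

print("fraction of labels changed:", np.mean(y_noisy != y))
print("fraction of values missing:", np.mean(np.isnan(X_missing)))
```

Raising `rate` step by step would give a series of datasets of increasingly lower quality along one metric at a time, which is one plausible reading of the procedure described above.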
Results. Comparing model test accuracy under different data quality conditions shows that the centralized model achieves 60.3% test accuracy under low data accuracy and 58.7% under low data completeness. The federated model performs better, achieving 69.3% test accuracy under low data accuracy and 79.2% under low data completeness. The federated model is less affected by low data quality when the data quality is distributed evenly across clients.
Conclusions. The federated deep-learning method displays certain attributes that make it more robust to low-quality data. An uneven distribution of data quality between clients has a more negative impact on federated learning than an even distribution.
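The even versus uneven comparison can be sketched as two ways of spreading the same overall amount of corruption across clients. This is an illustrative assumption, not the thesis setup: the abstract does not describe the client count, the split, or the perturbation, so the helper names, the four equally sized clients, and the label-corruption stand-in are all hypothetical.

```python
import numpy as np

def per_client_rates(mean_rate, num_clients, even):
    """Per-client corruption rates with the same mean in both settings."""
    if even:
        return np.full(num_clients, mean_rate)
    # Uneven (assumed scheme): half of the clients carry twice the rate,
    # the rest stay clean. Requires an even client count and mean_rate <= 0.5.
    rates = np.zeros(num_clients)
    rates[: num_clients // 2] = 2 * mean_rate
    return rates

def corrupt_labels(y, rate, num_classes, rng):
    """Replace a fraction `rate` of labels with a different class."""
    y_noisy = y.copy()
    n_flip = int(round(rate * len(y)))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    offsets = rng.integers(1, num_classes, size=n_flip)
    y_noisy[idx] = (y_noisy[idx] + offsets) % num_classes
    return y_noisy

rng = np.random.default_rng(0)
num_clients, num_classes = 4, 10
# Hypothetical equally sized client shards of synthetic labels.
shards = [rng.integers(0, num_classes, size=500) for _ in range(num_clients)]

results = {}
for even in (True, False):
    rates = per_client_rates(0.2, num_clients, even)
    noisy = [corrupt_labels(y, r, num_classes, rng) for y, r in zip(shards, rates)]
    per_client = [float(np.mean(a != b)) for a, b in zip(noisy, shards)]
    overall = float(np.mean(np.concatenate(noisy) != np.concatenate(shards)))
    results[even] = (per_client, overall)
    print("even" if even else "uneven", per_client, overall)
```

Holding the overall corruption level fixed while changing only its allocation is what lets any difference in federated test accuracy be attributed to the distribution of data quality rather than to its total amount.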
Place, publisher, year, edition, pages
2024.
Keywords [en]
Centralized, Federated, Data quality
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:bth-26314
OAI: oai:DiVA.org:bth-26314
DiVA, id: diva2:1892481
External cooperation
Imagimob
Subject / course
Degree Project in Master of Science in Engineering 30,0 hp
Educational program
DVAMI Master of Science in Engineering: AI and Machine Learning 300 hp
Supervisors
Examiners
2024-08-27 2024-08-27 2024-08-27 Bibliographically approved