Auto Grouping Network Audio Devices: Using Deep Learning
2024 (English) Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Background: In the realm of modern audio processing, grouping network audio devices based on room acoustics is a challenging problem. Deep learning techniques provide a potential approach for capturing complex audio patterns and effectively grouping devices. This study focuses on feature extraction and model evaluation to determine the optimal combinations for device grouping tasks.
Objectives: The primary objectives of this research are to identify the best sound features among Mel-Frequency Cepstral Coefficients (MFCCs), Mel-spectrograms, and spectrograms, and to compare the performance of three deep learning models (CNN, CNN-BiLSTM, and MobileNetV3) applied to audio recordings across various room configurations. This study aims to increase the accuracy of automatic device grouping using similarities in recorded audio data.
Methods: A custom dataset was developed by collecting audio samples from multiple devices in various room configurations. The sound features extracted from these samples were used as inputs to train and validate the deep learning models. The model architectures were adjusted, and hyperparameter tuning was applied to improve performance. Each model was then trained and evaluated on all three feature sets.
Results: With MFCC features, the CNN-BiLSTM model outperformed the other models in device grouping accuracy. The CNN model performed well with MFCC features but struggled with Mel-spectrograms and spectrograms. MobileNetV3, a pre-trained model, displayed comparable results but required fine-tuning to adapt to this task.
Conclusions: The findings of this research indicate that MFCCs were the most effective feature for grouping devices, with CNN-BiLSTM emerging as the best model for the task. These results highlight the significance of selecting appropriate sound features and model architectures for analyzing room acoustics and network audio device configurations.
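Illustration: the three feature types named in the abstract (spectrogram, Mel-spectrogram, MFCC) are related by a short chain of transforms, which the following minimal sketch makes concrete. It is not the thesis implementation. The sampling rate, frame size, hop length, number of mel bands, number of coefficients, and the synthetic 440 Hz test tone are all assumptions chosen for illustration, and the helper names are hypothetical. Only NumPy is used.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrogram(signal, n_fft=512, hop=256):
    """STFT power with a Hann window, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def dct_ii(x, n_out):
    """Orthonormal DCT-II along axis 0, keeping the first n_out coefficients."""
    n = x.shape[0]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return basis @ x

def extract_features(signal, sr, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    """Return (spectrogram, log-mel spectrogram, MFCC); all parameters are assumed."""
    spec = power_spectrogram(signal, n_fft, hop)
    mel = mel_filterbank(sr, n_fft, n_mels) @ spec
    log_mel = 10.0 * np.log10(np.maximum(mel, 1e-10))  # floor avoids log(0)
    mfcc = dct_ii(log_mel, n_mfcc)
    return spec, log_mel, mfcc

# Synthetic stand-in for a device recording: one second of a 440 Hz tone.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
spec, log_mel, mfcc = extract_features(tone, sr)
print(spec.shape, log_mel.shape, mfcc.shape)  # (257, 61) (40, 61) (13, 61)
```

Each step discards detail: the mel filterbank pools 257 frequency bins into 40 bands, and the DCT keeps 13 coefficients per frame. Under these assumed settings, the MFCC input is therefore far more compact than the other two.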
Place, publisher, year, edition, pages
2024, p. 67
Keywords [en]
Auto Grouping, Deep learning, Sound Features, Audio classification, Network devices
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-27146
OAI: oai:DiVA.org:bth-27146
DiVA, id: diva2:1915926
External cooperation
Axis Communications AB
Subject / course
DV2572 Master's Thesis in Computer Science
Educational program
DVADA Master Qualification Plan in Computer Science
Supervisors
Examiners
2024-12-03 2024-11-25 2025-09-30 Bibliographically approved