Auto Grouping Network Audio Devices: Using Deep Learning
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
2024 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Background: In the realm of modern audio processing, grouping network audio devices based on room acoustics is a challenging problem. Deep learning techniques provide a potential approach for capturing complex audio patterns and effectively grouping devices. This study focuses on feature extraction and model evaluation to determine the optimal combinations for device grouping tasks.

Objectives: The primary objectives of this research are to identify which sound feature works best among Mel-Frequency Cepstral Coefficients (MFCCs), Mel-spectrograms, and spectrograms, and to compare the performance of three deep learning models (CNN, CNN-BiLSTM, and MobileNetV3) applied to audio recordings across various room configurations. This study aims to increase the accuracy of automatic device grouping by using similarities in recorded audio data.

Methods: A custom dataset was developed by collecting audio samples from multiple devices in various room configurations. The three sound features were extracted from these recordings and used as inputs to train and validate the deep learning models. The model architectures were adjusted, and hyperparameter tuning was incorporated to improve performance. Subsequently, each model was trained and evaluated on each of the three feature sets.

Results: With MFCC features, the CNN-BiLSTM model outperformed the other models in device grouping accuracy. The CNN model performed well with MFCC features but struggled with Mel-spectrograms and spectrograms. MobileNetV3, despite being a pre-trained model, displayed comparable results but required fine-tuning to adapt to this task.

Conclusions: The findings of this research indicate that MFCCs were the most effective feature for grouping devices, with CNN-BiLSTM emerging as the best model for the task. These results highlight the significance of selecting appropriate sound features and model architectures for analyzing room acoustics and network audio device configurations.
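
Illustrative sketch (not taken from the thesis): the three feature types named in the abstract can be related to one another in a few lines of NumPy. The record does not state the thesis's extraction pipeline, so the function names, the sample rate, and the parameters `n_fft`, `hop`, `n_mels`, and `n_mfcc` below are all assumptions chosen for illustration. The spectrogram is the squared magnitude of a short-time Fourier transform, the Mel-spectrogram applies a triangular mel filterbank to it, and the MFCCs are a discrete cosine transform of the log Mel-spectrogram.

```python
import numpy as np

def stft_power(signal, n_fft=512, hop=256):
    """Power spectrogram, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

def mel_filterbank(sr, n_fft, n_mels=40):
    """Triangular filters evenly spaced on the mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        low, centre, high = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - low) / (centre - low)
        falling = (high - fft_freqs) / (high - centre)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def dct_ii(x, n_out):
    """Unnormalised DCT-II along axis 0, keeping the first n_out rows."""
    n = x.shape[0]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * i + 1) / (2 * n)) @ x

def extract_features(signal, sr, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    """Return (spectrogram, mel_spectrogram, mfcc) for a mono signal."""
    spec = stft_power(signal, n_fft, hop)
    mel_spec = mel_filterbank(sr, n_fft, n_mels) @ spec
    mfcc = dct_ii(np.log(mel_spec + 1e-10), n_mfcc)
    return spec, mel_spec, mfcc

# Synthetic one-second 440 Hz tone standing in for a device recording.
sr = 16000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
spec, mel_spec, mfcc = extract_features(tone, sr)
print(spec.shape, mel_spec.shape, mfcc.shape)
```

Each returned array has one column per time frame, so any of the three can be treated as a 2-D input to a convolutional or recurrent model. The MFCC matrix is by far the smallest of the three, which is one general reason it is a common compact representation; the sketch makes no claim about why it performed best in the thesis.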

Place, publisher, year, edition, pages
2024, p. 67
Keywords [en]
Auto Grouping, Deep learning, Sound Features, Audio classification, Network devices
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:bth-27146
OAI: oai:DiVA.org:bth-27146
DiVA, id: diva2:1915926
External cooperation
Axis Communications AB
Subject / course
DV2572 Master's Thesis in Computer Science
Educational program
DVADA Master Qualification Plan in Computer Science
Supervisors
Examiners
Available from: 2024-12-03. Created: 2024-11-25. Last updated: 2025-09-30. Bibliographically approved.

Open Access in DiVA

fulltext (2603 kB), 92 downloads
File information
File name: FULLTEXT01.pdf
File size: 2603 kB
Checksum (SHA-512): ceb2df18502a44b3c171a19e298918d8b3ba8fc17480b882ca190514dbef5d2c3e4feca6a08fcfab843d7de5c90dc2c960d2ed6f87d095c2e3774b839f8f91cf
Type: fulltext
Mimetype: application/pdf

Total: 92 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 182 hits