Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Classifying Environmental Sounds with Image Networks
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
2017 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

Context. Environmental Sound Recognition, unlike Speech Recognition, is an area that is still in the developing stages with respect to using Deep Learning methods. Sound can be converted into images by extracting spectrograms and the like. Object Recognition from images using deep Convolutional Neural Networks is a currently developing area holding high promise. The same technique has been studied and applied, but on image representations of sound.

Objectives. In this study, investigation is done to determine the best possible accuracy of performing a sound classification task using existing deep Convolutional Neural Networks by comparing the data pre-processing parameters. Also, a novel method of combining different features into a single image is proposed and its effect tested. Lastly, the performance of an existing network that fuses Convolutional and Recurrent Neural architectures is tested on the selected datasets.

Methods. In this, experiments were conducted to analyze the effects of data pre-processing parameters on the best possible accuracy with two CNNs. Also, experiment was also conducted to determine whether the proposed method of feature combination is beneficial or not. Finally, an experiment to test the performance of a combined network was conducted.

Results. GoogLeNet had the highest classification accuracy of 73% on 50-class dataset and 90-93% on 10-class datasets. The sampling rate and frame length values of the respective datasets which contributed to the high scores are 16kHz, 40ms and 8kHz, 50ms respectively. The proposed combination of features does not improve the classification accuracy. The fused CRNN network could not achieve high accuracy on the selected datasets.

Conclusions. It is concluded that deep networks designed for object recognition can be successfully used to classify environmental sounds and the pre-processing parameters’ values determined for achieving best accuracy. The novel method of feature combination does not significantly improve the accuracy when compared to spectrograms alone. The fused network which learns the special and temporal features from spectral images performs poorly in the classification task when compared to the convolutional network alone.

Place, publisher, year, edition, pages
2017.
Keyword [en]
Machine Learning, Environmental Sound Classification, Image Classification.
National Category
Computer Science
Identifiers
URN: urn:nbn:se:bth-14062OAI: oai:DiVA.org:bth-14062DiVA: diva2:1086382
External cooperation
Sony Mobile Communications, Lund
Subject / course
DV2566 Master's Thesis (120 credits) in Computer Science
Educational program
DVAXA Master of Science Programme in Computer Science
Presentation
2017-01-23, 09:00 (English)
Supervisors
Examiners
Available from: 2017-04-03 Created: 2017-04-02 Last updated: 2017-04-03Bibliographically approved

Open Access in DiVA

fulltext(764 kB)183 downloads
File information
File name FULLTEXT02.pdfFile size 764 kBChecksum SHA-512
feb0b2af534b2944e58cf11320a2506f76dda5de285aafc24117c316de43d03ee5b92ecdfbe69be764f8b0f2a175aedc6bd068fd6aca078b87ea959d49b7d4b6
Type fulltextMimetype application/pdf

Search in DiVA

By author/editor
Boddapati, Venkatesh
By organisation
Department of Computer Science and Engineering
Computer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 183 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 160 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf