Ändra sökning
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Classifying Environmental Sounds with Image Networks
Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, Institutionen för datalogi och datorsystemteknik.
2017 (Engelska)Självständigt arbete på avancerad nivå (masterexamen), 20 poäng / 30 hpStudentuppsats (Examensarbete)
Abstract [en]

Context. Environmental Sound Recognition, unlike Speech Recognition, is an area that is still in the developing stages with respect to using Deep Learning methods. Sound can be converted into images by extracting spectrograms and the like. Object Recognition from images using deep Convolutional Neural Networks is a currently developing area holding high promise. The same technique has been studied and applied, but on image representations of sound.

Objectives. In this study, investigation is done to determine the best possible accuracy of performing a sound classification task using existing deep Convolutional Neural Networks by comparing the data pre-processing parameters. Also, a novel method of combining different features into a single image is proposed and its effect tested. Lastly, the performance of an existing network that fuses Convolutional and Recurrent Neural architectures is tested on the selected datasets.

Methods. In this, experiments were conducted to analyze the effects of data pre-processing parameters on the best possible accuracy with two CNNs. Also, experiment was also conducted to determine whether the proposed method of feature combination is beneficial or not. Finally, an experiment to test the performance of a combined network was conducted.

Results. GoogLeNet had the highest classification accuracy of 73% on 50-class dataset and 90-93% on 10-class datasets. The sampling rate and frame length values of the respective datasets which contributed to the high scores are 16kHz, 40ms and 8kHz, 50ms respectively. The proposed combination of features does not improve the classification accuracy. The fused CRNN network could not achieve high accuracy on the selected datasets.

Conclusions. It is concluded that deep networks designed for object recognition can be successfully used to classify environmental sounds and the pre-processing parameters’ values determined for achieving best accuracy. The novel method of feature combination does not significantly improve the accuracy when compared to spectrograms alone. The fused network which learns the special and temporal features from spectral images performs poorly in the classification task when compared to the convolutional network alone.

Ort, förlag, år, upplaga, sidor
2017.
Nyckelord [en]
Machine Learning, Environmental Sound Classification, Image Classification.
Nationell ämneskategori
Datavetenskap (datalogi)
Identifikatorer
URN: urn:nbn:se:bth-14062OAI: oai:DiVA.org:bth-14062DiVA, id: diva2:1086382
Externt samarbete
Sony Mobile Communications, Lund
Ämne / kurs
DV2566 Masterarbete i datavetenskap
Utbildningsprogram
DVAXA Masterprogram i Datavetenskap
Presentation
2017-01-23, 09:00 (Engelska)
Handledare
Examinatorer
Tillgänglig från: 2017-04-03 Skapad: 2017-04-02 Senast uppdaterad: 2018-01-13Bibliografiskt granskad

Open Access i DiVA

fulltext(764 kB)578 nedladdningar
Filinformation
Filnamn FULLTEXT02.pdfFilstorlek 764 kBChecksumma SHA-512
feb0b2af534b2944e58cf11320a2506f76dda5de285aafc24117c316de43d03ee5b92ecdfbe69be764f8b0f2a175aedc6bd068fd6aca078b87ea959d49b7d4b6
Typ fulltextMimetyp application/pdf

Sök vidare i DiVA

Av författaren/redaktören
Boddapati, Venkatesh
Av organisationen
Institutionen för datalogi och datorsystemteknik
Datavetenskap (datalogi)

Sök vidare utanför DiVA

GoogleGoogle Scholar
Totalt: 578 nedladdningar
Antalet nedladdningar är summan av nedladdningar för alla fulltexter. Det kan inkludera t.ex tidigare versioner som nu inte längre är tillgängliga.

urn-nbn

Altmetricpoäng

urn-nbn
Totalt: 843 träffar
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf