Classifying Environmental Sounds with Image Networks
Blekinge Tekniska Högskola (Blekinge Institute of Technology), Faculty of Computing, Department of Computer Science and Engineering.
2017 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Context. Unlike Speech Recognition, Environmental Sound Recognition is still at an early stage with respect to Deep Learning methods. Sound can be converted into images by extracting spectrograms and similar representations. Object recognition from images using deep Convolutional Neural Networks (CNNs) is a developing area that holds great promise. This study examines the same technique applied to image representations of sound.
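
As a rough illustration of how a sound clip becomes an image, the sketch below computes a log-magnitude spectrogram with a short-time Fourier transform in NumPy. The helper name `log_spectrogram`, the Hann window, the 50% frame overlap and the decibel scaling are assumptions made for illustration; the abstract does not state the exact extraction settings used in the thesis, only that sampling rate and frame length were varied.

```python
import numpy as np

def log_spectrogram(signal, sample_rate, frame_ms=40.0, hop_ratio=0.5, eps=1e-10):
    """Return a (frequency_bins, frames) log-magnitude spectrogram in dB.

    Hypothetical helper: Hann window, hop = hop_ratio * frame length.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))  # samples per frame
    hop = max(1, int(frame_len * hop_ratio))
    if len(signal) < frame_len:
        # Zero-pad very short clips so that at least one frame exists
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # One windowed frame per row: shape (n_frames, frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # A real FFT of each frame yields frame_len // 2 + 1 frequency bins
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Convert to decibels; eps avoids log(0) on silent frames
    return 20.0 * np.log10(magnitude + eps).T

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr                  # one second of audio
    tone = np.sin(2 * np.pi * 1000 * t)     # 1 kHz test tone
    spec = log_spectrogram(tone, sr, frame_ms=40.0)
    print(spec.shape)                       # (321, 49)
```

The resulting 2-D array can be scaled to pixel values and fed to an image network. In practice a library routine such as `scipy.signal.stft` would usually replace the hand-written framing loop; the explicit version simply makes the frame-length parameter visible.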

Objectives. This study investigates the best achievable accuracy on a sound classification task using existing deep Convolutional Neural Networks by comparing data pre-processing parameters. A novel method of combining different features into a single image is also proposed and its effect tested. Lastly, the performance of an existing network that fuses convolutional and recurrent neural architectures is tested on the selected datasets.
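
The abstract does not say which features are combined or how they are merged into one image. One plausible reading, offered purely as an assumption, is to resize each 2-D feature map to a common size, scale it to pixel values, and place it in its own colour channel so that an image network expecting three-channel input can consume all features at once. The helper names `to_uint8`, `resize_nearest` and `combine_features`, the nearest-neighbour resizing, and the 224 x 224 default size (the standard GoogLeNet input resolution) are all hypothetical choices for this sketch.

```python
import numpy as np

def to_uint8(feature):
    """Min-max scale a 2-D feature map to 0..255 (hypothetical helper)."""
    feature = np.asarray(feature, dtype=np.float64)
    lo, hi = feature.min(), feature.max()
    if hi == lo:
        # A constant map carries no information; return a black channel
        return np.zeros(feature.shape, dtype=np.uint8)
    return np.round(255.0 * (feature - lo) / (hi - lo)).astype(np.uint8)

def resize_nearest(feature, height, width):
    """Nearest-neighbour resize so feature maps of different shapes line up."""
    rows = np.arange(height) * feature.shape[0] // height
    cols = np.arange(width) * feature.shape[1] // width
    return feature[np.ix_(rows, cols)]

def combine_features(features, height=224, width=224):
    """Stack one to three 2-D feature maps as the R, G, B channels of one image."""
    if not 1 <= len(features) <= 3:
        raise ValueError("expected one to three feature maps")
    channels = [to_uint8(resize_nearest(np.asarray(f), height, width))
                for f in features]
    # Fill unused channels with zeros so the result is always H x W x 3
    while len(channels) < 3:
        channels.append(np.zeros((height, width), dtype=np.uint8))
    return np.stack(channels, axis=-1)
```

Scaling each channel independently keeps one feature from dominating the others simply because its raw values are larger; whether the thesis normalizes this way is not stated in the abstract.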

Methods. Experiments were conducted with two CNNs to analyze the effect of data pre-processing parameters on the best achievable accuracy. A further experiment determined whether the proposed feature combination method is beneficial. Finally, an experiment tested the performance of a combined network.

Results. GoogLeNet achieved the highest classification accuracy: 73% on the 50-class dataset and 90–93% on the 10-class datasets. The sampling rate and frame length values that produced these scores were 16 kHz with 40 ms frames and 8 kHz with 50 ms frames, respectively. The proposed combination of features does not improve classification accuracy. The fused CRNN did not achieve high accuracy on the selected datasets.
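
The reported frame lengths are durations; the number of samples per frame is the sampling rate multiplied by the frame duration. The sketch below works this out for the two reported settings, assuming one FFT per frame with no zero padding (an assumption, not something the abstract states). The function names are hypothetical.

```python
def frame_samples(sample_rate_hz, frame_ms):
    """Samples in one analysis frame: rate (Hz) x duration (s)."""
    return round(sample_rate_hz * frame_ms / 1000)

def frame_summary(sample_rate_hz, frame_ms):
    """Frame size and derived spectral properties, assuming one FFT per
    frame with no zero padding (hypothetical helper)."""
    n = frame_samples(sample_rate_hz, frame_ms)
    return {
        "samples_per_frame": n,
        "frequency_bins": n // 2 + 1,           # bins from a real FFT
        "bin_spacing_hz": sample_rate_hz / n,   # frequency resolution
        "nyquist_hz": sample_rate_hz / 2,       # highest representable frequency
    }

for rate, ms in [(16000, 40), (8000, 50)]:
    print(rate, ms, frame_summary(rate, ms))
```

Under these assumptions, 16 kHz with 40 ms frames gives 640 samples per frame and 25 Hz bins up to 8 kHz, while 8 kHz with 50 ms frames gives 400 samples per frame and finer 20 Hz bins, at the cost of a lower 4 kHz upper frequency limit.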

Conclusions. Deep networks designed for object recognition can be used successfully to classify environmental sounds, and the pre-processing parameter values that yield the best accuracy were determined. The novel feature combination method does not significantly improve accuracy compared with spectrograms alone. The fused network, which learns spatial and temporal features from spectral images, performs poorly on the classification task compared with the convolutional network alone.

Place, publisher, year, edition, pages
2017.
Keywords [en]
Machine Learning, Environmental Sound Classification, Image Classification
Identifiers
URN: urn:nbn:se:bth-14062
OAI: oai:DiVA.org:bth-14062
DiVA, id: diva2:1086382
External cooperation
Sony Mobile Communications, Lund
Subject / course
DV2566 Master's Thesis (120 credits) in Computer Science
Educational program
DVAXA Master of Science Programme in Computer Science
Presentation
2017-01-23, 09:00 (English)
Available from: 2017-04-03. Created: 2017-04-02. Last updated: 2018-01-13. Bibliographically approved

Open Access in DiVA

Full text (764 kB), 578 downloads
File information
File: FULLTEXT02.pdf, File size: 764 kB, Checksum: SHA-512
feb0b2af534b2944e58cf11320a2506f76dda5de285aafc24117c316de43d03ee5b92ecdfbe69be764f8b0f2a175aedc6bd068fd6aca078b87ea959d49b7d4b6
Type: fulltext, MIME type: application/pdf

Author
Boddapati, Venkatesh