Classifying environmental sounds using image recognition networks
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
Sony Mobile Communications AB, SWE.
Sony Mobile Communications AB, SWE.
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
2017 (English). In: Procedia Computer Science / [ed] Toro C., Hicks Y., Howlett R.J., Zanni-Merk C., Toro C., Frydman C., Jain L.C., Jain L.C., Elsevier B.V., 2017, Vol. 112, p. 2048-2056. Conference paper, Published paper (Refereed).
Abstract [en]

Automatic classification of environmental sounds, such as dog barking and glass breaking, is becoming increasingly interesting, especially for mobile devices. Most mobile devices contain both cameras and microphones, and companies that develop mobile devices would like to provide functionality for classifying both videos/images and sounds. In order to reduce the development costs one would like to use the same technology for both of these classification tasks. One way of achieving this is to represent environmental sounds as images, and use an image classification neural network when classifying images as well as sounds. In this paper we consider the classification accuracy for different image representations (Spectrogram, MFCC, and CRP) of environmental sounds. We evaluate the accuracy for environmental sounds in three publicly available datasets, using two well-known convolutional deep neural networks for image recognition (AlexNet and GoogLeNet). Our experiments show that we obtain good classification accuracy for the three datasets. © 2017 The Author(s).
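The abstract's core idea is to render an audio clip as a 2-D time-frequency image (e.g. a spectrogram) and feed it to an off-the-shelf image CNN such as AlexNet or GoogLeNet. A minimal sketch of the first step is shown below, using a synthetic signal and SciPy; the paper's actual preprocessing (window sizes, the MFCC and CRP variants, image resizing for the networks) is not specified in this record, so the parameters here are illustrative assumptions only.

```python
import numpy as np
from scipy import signal

sr = 16000                                 # sample rate in Hz (assumed value)
t = np.arange(sr) / sr                     # one second of audio
audio = np.sin(2 * np.pi * 440.0 * t)      # synthetic stand-in for an environmental sound

# Short-time Fourier transform -> power spectrogram (a 2-D time-frequency array)
freqs, times, Sxx = signal.spectrogram(audio, fs=sr, nperseg=512, noverlap=256)

# Log scaling, as is typical before treating the array as an image for a CNN
img = 10 * np.log10(Sxx + 1e-10)

print(img.shape)   # (frequency bins, time frames): the "image" a CNN would consume
```

In a real pipeline this 2-D array would be mapped to a color scale and resized to the CNN's expected input resolution (e.g. 224x224 for GoogLeNet) before classification.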

Place, publisher, year, edition, pages
Elsevier B.V., 2017. Vol. 112, p. 2048-2056.
Keywords [en]
Convolutional Neural Networks, Deep Learning, Environmental Sound Classification, GPU Processing, Image Classification, Classification (of information), Convolution, Deep neural networks, Image recognition, Knowledge based systems, Neural networks, Automatic classification, Classification accuracy, Classification tasks, Convolutional neural network, Environmental sound classifications, Environmental sounds, Image representations
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:bth-15478
DOI: 10.1016/j.procs.2017.08.250
ISI: 000418466000216
Scopus ID: 2-s2.0-85032359938
OAI: oai:DiVA.org:bth-15478
DiVA, id: diva2:1156090
Conference
21st International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (KES), Marseille
Part of project
Bigdata@BTH - Scalable resource-efficient systems for big data analytics, Knowledge Foundation
Available from: 2017-11-10. Created: 2017-11-10. Last updated: 2021-07-25. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text | Scopus

Authority records

Lundberg, Lars

Search in DiVA

By author/editor
Boddapati, Venkatesh; Lundberg, Lars
By organisation
Department of Computer Science and Engineering
Computer and Information Sciences
