1 - 2 of 2
  • 1.
    Boddapati, Venkatesh
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Classifying Environmental Sounds with Image Networks. 2017. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
    Abstract [en]

    Context. Environmental Sound Recognition, unlike Speech Recognition, is still at an early stage in its use of Deep Learning methods. Sound can be converted into images by extracting spectrograms and similar representations. Object Recognition from images using deep Convolutional Neural Networks is an actively developing and promising area. This study applies the same technique to image representations of sound.

    Objectives. This study investigates the best accuracy achievable on a sound classification task with existing deep Convolutional Neural Networks by comparing data pre-processing parameters. A novel method of combining different features into a single image is also proposed and its effect tested. Lastly, the performance of an existing network that fuses Convolutional and Recurrent Neural architectures is tested on the selected datasets.

    Methods. Experiments were conducted to analyze the effects of data pre-processing parameters on the best achievable accuracy with two CNNs. A further experiment was conducted to determine whether the proposed method of feature combination is beneficial. Finally, an experiment was conducted to test the performance of a combined network.

    Results. GoogLeNet achieved the highest classification accuracy: 73% on the 50-class dataset and 90-93% on the 10-class datasets. The sampling rate and frame length values that contributed to these scores are 16 kHz with 40 ms and 8 kHz with 50 ms, respectively. The proposed combination of features does not improve the classification accuracy. The fused CRNN network could not achieve high accuracy on the selected datasets.

    Conclusions. Deep networks designed for object recognition can be successfully used to classify environmental sounds, and the pre-processing parameter values that give the best accuracy were determined. The novel method of feature combination does not significantly improve the accuracy compared to spectrograms alone. The fused network, which learns the spatial and temporal features from spectral images, performs poorly in the classification task compared to the convolutional network alone.
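
    The pre-processing parameters reported above (sampling rate and frame length) can be made concrete with a small sketch. This is an illustration only, not the thesis's pipeline: the abstract does not state the window function, overlap, or scaling, so the Hann window, the hop size, and the function names `frame_length_samples` and `magnitude_spectrogram` are assumptions. The DFT is written out naively with the standard library to keep the example self-contained; a real pipeline would use an FFT.

```python
import cmath
import math


def frame_length_samples(sample_rate_hz, frame_ms):
    """Number of samples in one analysis frame of the given duration."""
    return int(round(sample_rate_hz * frame_ms / 1000))


def magnitude_spectrogram(signal, frame_len, hop):
    """Naive magnitude spectrogram: one row per frame, one column per bin.

    Assumption: a symmetric Hann window; the abstract does not name one.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        rows.append(row)
    return rows


# Frame lengths implied by the two parameter pairs in the abstract.
samples_16k_40ms = frame_length_samples(16000, 40)  # 640 samples
samples_8k_50ms = frame_length_samples(8000, 50)    # 400 samples

# Toy signal (kept short because the naive DFT is slow): a sine that
# completes exactly 4 cycles per 32-sample frame.
toy = [math.sin(2 * math.pi * 4 * n / 32) for n in range(64)]
spec = magnitude_spectrogram(toy, frame_len=32, hop=16)
```

    Each row of `spec` is one time frame and each column one frequency bin, so the matrix can be treated as a single-channel image once its values are scaled to a pixel range.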

  • 2.
    Boddapati, Venkatesh
    et al.
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Petef, Andrej
    Sony Mobile Communications AB, SWE.
    Rasmusson, Jim
    Sony Mobile Communications AB, SWE.
    Lundberg, Lars
    Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science and Engineering.
    Classifying environmental sounds using image recognition networks. 2017. In: Procedia Computer Science / [ed] Toro C., Hicks Y., Howlett R.J., Zanni-Merk C., Frydman C., Jain L.C., Elsevier B.V., 2017, Vol. 112, p. 2048-2056. Conference paper (Refereed).
    Abstract [en]

    Automatic classification of environmental sounds, such as dog barking and glass breaking, is becoming increasingly interesting, especially for mobile devices. Most mobile devices contain both cameras and microphones, and companies that develop mobile devices would like to provide functionality for classifying both videos/images and sounds. In order to reduce the development costs one would like to use the same technology for both of these classification tasks. One way of achieving this is to represent environmental sounds as images, and use an image classification neural network when classifying images as well as sounds. In this paper we consider the classification accuracy for different image representations (Spectrogram, MFCC, and CRP) of environmental sounds. We evaluate the accuracy for environmental sounds in three publicly available datasets, using two well-known convolutional deep neural networks for image recognition (AlexNet and GoogLeNet). Our experiments show that we obtain good classification accuracy for the three datasets. © 2017 The Author(s).
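
    Representing a sound as an image, as described above, requires mapping a matrix of feature values to pixel intensities. The abstract does not say how the paper does this, so the following is only a minimal sketch under stated assumptions: a decibel scale relative to the matrix peak, an assumed dynamic-range floor of -80 dB, and 8-bit grayscale output. The function name `to_grayscale` and the parameter `floor_db` are hypothetical.

```python
import math


def to_grayscale(matrix, floor_db=-80.0):
    """Map non-negative magnitudes to 8-bit pixel values (0-255).

    Values are converted to dB relative to the largest entry and
    clamped at floor_db (an assumed dynamic range, not from the paper).
    """
    peak = max(max(row) for row in matrix)
    if peak <= 0:
        # An all-zero input becomes an all-black image.
        return [[0 for _ in row] for row in matrix]
    pixels = []
    for row in matrix:
        out = []
        for value in row:
            db = 20 * math.log10(value / peak) if value > 0 else floor_db
            db = max(db, floor_db)
            out.append(int(round(255 * (db - floor_db) / -floor_db)))
        pixels.append(out)
    return pixels


# Tiny example: the peak maps to white, silence to black.
image = to_grayscale([[1.0, 0.0], [0.1, 1e-9]])
```

    The resulting nested list has the same shape as the input and can be written out as a grayscale image or replicated across three channels for a network that expects colour input.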
