Source Localization and Speech Enhancement for Speech Recognition for Real time Environment
Blekinge Institute of Technology, School of Engineering.
Blekinge Institute of Technology, School of Engineering.
2012 (English). Independent thesis, Advanced level (degree of Master (Two Years)). Student thesis
Abstract [en]

The popularity of speech communication is rapidly increasing in contexts such as conferencing systems, mobile and fixed electronic devices, and laptops, leading to heightened demand for new services and improved speech quality. Dictaphones used for dictation usually have a single microphone, which does not provide enough degrees of freedom to estimate the location of the source. A microphone array uses multiple microphones for spatial filtering, suppressing background noise. This report aims at speech enhancement that exploits the benefits of microphone arrays to find the direction of the desired speaker and focus the listening beam in that direction. Generalized Cross Correlation (GCC) methods are compared for locating the source in a real office environment. Beamforming is implemented to make the microphone array listen in the desired direction, thereby reducing interference from other sources. The Minimum Variance Distortionless Response (MVDR) approach is shown to give better results than simpler techniques. A perceptually based eigenfilter, which incorporates human hearing models in a subspace-based suppressor, eliminates the residual noise. Objective system performance is evaluated by estimating Signal-to-Noise Ratio Improvement (SNRI), segmental SNR, signal degradation, and noise suppression. Perceptual Evaluation of Speech Quality (PESQ) provides a Mean Opinion Score as an estimate of subjective quality.
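
As an illustration of the GCC family of localization methods named in the abstract, the following is a minimal sketch of time-delay estimation between two microphone signals using the PHAT weighting. It is a generic textbook formulation written with NumPy, not the thesis's implementation; the function name `gcc_phat`, its parameters, and the sampling rate used in the example are assumptions made for illustration.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref`, in seconds, using GCC-PHAT.

    Hypothetical helper for illustration; names and parameters are assumptions.
    """
    n = len(sig) + len(ref)              # zero-pad to avoid circular wrap-around
    sig_spec = np.fft.rfft(sig, n=n)
    ref_spec = np.fft.rfft(ref, n=n)

    cross = sig_spec * np.conj(ref_spec)
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: discard magnitude, keep phase

    cc = np.fft.irfft(cross, n=n)        # generalized cross-correlation over circular lags

    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so that index 0 corresponds to lag -max_shift and the middle to lag 0.
    cc = np.concatenate((cc[n - max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

# Example: a white-noise reference and a copy delayed by 5 samples (assumed fs = 16 kHz).
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
sig = np.concatenate((np.zeros(5), ref[:-5]))
tau = gcc_phat(sig, ref, fs)
```

With a known microphone spacing and speed of sound, an estimated delay of this kind can be converted to a direction of arrival, which is the quantity a beamformer would then be steered toward.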

Place, publisher, year, edition, pages
2012, p. 58
Keywords [en]
Beamforming, Localization, Lapped Transform, SRP-PHAT, MVDR, Subspace Suppression, PESQ
National Category
Computer Sciences; Signal Processing
Identifiers
URN: urn:nbn:se:bth-4130
Local ID: oai:bth.se:arkivex132DFBA86A235B7EC1257AEE0054D98F
OAI: oai:DiVA.org:bth-4130
DiVA, id: diva2:831453
Uppsok
Technology
Supervisors
Note
asim_zolo@yahoo.com, akbarali45@gmail.com
Available from: 2015-04-22. Created: 2013-01-09. Last updated: 2018-01-11. Bibliographically approved.

Open Access in DiVA

fulltext (1218 kB), 1046 downloads
File information
File name: FULLTEXT01.pdf
File size: 1218 kB
Checksum (SHA-512): 9916415e9d2d83369d585421320cfc7721d492f6663bc0b50b35d378c86266df6452c2389d89b24f7509ab6778e71732d2d4112a01745ecc80a65cfd5fd928a6
Type: fulltext
Mimetype: application/pdf
