Swartling, Mikael
Publications (9 of 9)
Swartling, M. (2012). Direction of Arrival Estimation and Localization of Multiple Speech Sources in Enclosed Environments. (Doctoral dissertation). Karlskrona: Blekinge Institute of Technology
Direction of Arrival Estimation and Localization of Multiple Speech Sources in Enclosed Environments
2012 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Speech communication is gaining in popularity in many different contexts as technology evolves. With the introduction of mobile electronic devices such as cell phones and laptops, and fixed electronic devices such as video and teleconferencing systems, more people are communicating, which leads to an increasing demand for new services and better speech quality. Methods to enhance speech recorded by microphones often operate blindly, without prior knowledge of the signals. When multiple microphones are added to allow for spatial filtering, many blind speech enhancement methods must also operate blindly in the spatial domain. When attempting to improve the quality of spoken communication, it is often necessary to reliably determine the location of the speakers. A dedicated source localization method on top of the speech enhancement methods can assist them by providing spatial information about the sources. This thesis addresses the problem of speech-source localization, with a focus on localization in the presence of multiple concurrent speech sources. The primary work consists of methods to estimate the direction of arrival of multiple concurrent speech sources from an array of sensors, and a method to resolve the ambiguities that arise when estimating the spatial locations of multiple speech sources from multiple arrays of sensors. The thesis also improves the well-known SRP-based methods with higher-order statistics, and presents an analysis of how SRP-PHAT performs when the sensor array geometry is not fully calibrated. The thesis concludes with two envelope-domain-based methods, for tonal pattern detection and for tonal disturbance detection and cancelation, which can further increase the usability of the proposed localization methods. The main contribution of the thesis is a complete methodology to spatially locate multiple speech sources in enclosed environments.
New methods and improvements to the combined solution are presented for direction-of-arrival estimation, location estimation and location ambiguity correction, together with a sensitivity analysis of sensor array calibration.
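The SRP-PHAT methods referred to above steer over candidate source directions and sum phase-transformed cross-correlations over microphone pairs; for a single pair this reduces to GCC-PHAT, where the lag with the largest response estimates the time difference of arrival. The sketch below illustrates only that two-microphone case, under assumptions not taken from the thesis: a naive DFT keeps it dependency-free, the test signal is white noise, and the delay is an integer circular shift so that the peak is exact. The function names are hypothetical.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (inverse scaled by 1/N)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(x):
            acc += v * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(x1, x2, eps=1e-12):
    """Lag in samples (positive: x2 lags x1) that maximizes the GCC-PHAT response."""
    X1, X2 = dft(x1), dft(x2)
    cross = [b * a.conjugate() for a, b in zip(X1, X2)]
    # Phase transform: discard the magnitude, keep only the phase of each bin.
    whitened = [c / (abs(c) + eps) for c in cross]
    response = [v.real for v in dft(whitened, inverse=True)]
    n = len(response)
    best = max(range(n), key=lambda k: response[k])
    # Circular lags above n/2 correspond to negative delays.
    return best if best <= n // 2 else best - n


rng = random.Random(0)
N, true_delay = 128, 5
x1 = [rng.gauss(0.0, 1.0) for _ in range(N)]
x2 = [x1[(t - true_delay) % N] for t in range(N)]   # x2 is x1 circularly delayed
print(gcc_phat_delay(x1, x2))  # 5
```

With real recordings the steered delays are generally non-integer and frames are windowed rather than circularly shifted, so the response peak is broader than in this idealized case.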

Place, publisher, year, edition, pages
Karlskrona: Blekinge Institute of Technology, 2012
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090; 3
Keywords
Beamforming, Detection and classification, Speech enhancement, Source localization
National Category
Signal Processing
Identifiers
urn:nbn:se:bth-00520 (URN); oai:bth.se:forskinfoACD1267A1007C477C125796400452ABB (Local ID); 978-91-7295-226-3 (ISBN); oai:bth.se:forskinfoACD1267A1007C477C125796400452ABB (Archive number); oai:bth.se:forskinfoACD1267A1007C477C125796400452ABB (OAI)
Available from: 2012-09-18 Created: 2011-12-12 Last updated: 2025-09-30. Bibliographically approved
Swartling, M., Ström Bartunek, J., Nilsson, K., Gustavsson, I. & Fiedler, M. (2012). Simulations of the VISIR Open Lab Platform. Paper presented at Remote Engineering and Virtual Instrumentation. Bilbao, Spain: IEEE
Simulations of the VISIR Open Lab Platform
2012 (English) Conference paper, Published paper (Refereed) Published
Abstract [en]

This paper presents a queue simulation of the VISIR Open Lab Platform. A model of the platform and statistical distributions of how users interact with the system, based on real log files, are presented. The system is then simulated in order to determine how many concurrent students can be allowed to use the platform while keeping the response time low enough to ensure the quality of the service. The results show that, in a worst-case setup with approximately 300 ms response time per experiment, roughly 100 concurrent users is the upper limit for keeping the average response time below 2 s. The results also show that raising the limit on the desired experiment response time does not necessarily increase the number of allowed concurrent users significantly once the system is saturated. However, improving the experiment response time can significantly increase the number of users that can be connected simultaneously.
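The kind of closed-loop queue simulation described above can be sketched as follows. Every modeling choice here is an assumption for illustration, not the paper's model: a single FIFO server with a fixed 300 ms service time per experiment (the worst case mentioned in the abstract), users who think for an exponentially distributed time with a hypothetical mean of 30 s between requests, and a one-hour simulated horizon. The paper derives its user-interaction distributions from real log files, so the numbers produced here are not comparable with its 100-user result.

```python
import heapq
import random


def mean_response_time(users, service=0.3, think_mean=30.0,
                       horizon=3600.0, seed=1):
    """Simulate a closed single-server FIFO queue; return the mean response time in s.

    Each user alternates between thinking (exponential, hypothetical mean)
    and submitting one experiment that occupies the server for `service` s.
    """
    rng = random.Random(seed)
    # Heap of (request arrival time, user id); requests are served in arrival order.
    pending = [(rng.expovariate(1.0 / think_mean), u) for u in range(users)]
    heapq.heapify(pending)
    server_free_at = 0.0
    responses = []
    while pending:
        arrival, user = heapq.heappop(pending)
        if arrival > horizon:
            break
        start = max(arrival, server_free_at)   # wait while the server is busy
        done = start + service
        server_free_at = done
        responses.append(done - arrival)        # waiting time plus service time
        # The same user submits the next request after a new think time.
        heapq.heappush(pending, (done + rng.expovariate(1.0 / think_mean), user))
    return sum(responses) / len(responses) if responses else 0.0


for n in (1, 50, 100, 200, 400):
    print(n, round(mean_response_time(n), 2))
```

Because this server completes at most 1/0.3, or about 3.3, experiments per second, the mean response time grows roughly linearly with the number of users once that throughput is reached, which mirrors the saturation effect described in the abstract; lowering `service` raises that ceiling.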

Place, publisher, year, edition, pages
Bilbao, Spain: IEEE, 2012
Keywords
VISIR, Simulation, Quality of Experience
National Category
Signal Processing, Computer Sciences
Identifiers
urn:nbn:se:bth-7238 (URN); 10.1109/REV.2012.6293108 (DOI); oai:bth.se:forskinfo7F78CA4AE2BF1C3CC1257A8500018974 (Local ID); 978-1-4673-2541-7 (ISBN); oai:bth.se:forskinfo7F78CA4AE2BF1C3CC1257A8500018974 (Archive number); oai:bth.se:forskinfo7F78CA4AE2BF1C3CC1257A8500018974 (OAI)
Conference
Remote Engineering and Virtual Instrumentation
Available from: 2012-09-27 Created: 2012-09-26 Last updated: 2025-09-30. Bibliographically approved
Swartling, M. & Grbic, N. (2011). Calibration errors of uniform linear sensor arrays for DOA estimation: an analysis with SRP-PHAT. Signal Processing, 91(4), 1071-1075
Calibration errors of uniform linear sensor arrays for DOA estimation: an analysis with SRP-PHAT
2011 (English) In: Signal Processing, ISSN 0165-1684, E-ISSN 1872-7557, Vol. 91, no 4, p. 1071-1075. Article in journal (Refereed) Published
Abstract [en]

This article analyzes the sensitivity of acoustic source localization with the well-established SRP-PHAT method to geometrical sensor errors. The array in this analysis is a uniform linear array, and the intended source is human speech in the far field. Two major results are presented: geometrical errors in the inner sensors of the linear array produce smaller localization errors than corresponding geometrical errors in the two end-point sensors, and the localization error rises sharply when the total geometrical error exceeds the acoustic propagation distance corresponding to 2/3 of a sampling period (approximately 3 cm at 8 kHz). The article also provides a mathematical and graphical explanation of the results.
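The 3 cm figure follows from converting a fraction of a sampling period into an acoustic propagation distance, d = (2/3) c / f_s. A minimal check, assuming a speed of sound of 343 m/s (a typical room-temperature value that the abstract does not specify), is sketched below; the helper names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def distance_for_fraction_of_sample(fs_hz, fraction=2.0 / 3.0, c=SPEED_OF_SOUND):
    """Acoustic propagation distance covered in `fraction` of one sampling period."""
    return fraction * c / fs_hz


def delay_in_samples(distance_m, fs_hz, c=SPEED_OF_SOUND):
    """Largest delay error (in samples) a sensor position error can cause,
    reached when the error points along the propagation direction."""
    return distance_m * fs_hz / c


tol = distance_for_fraction_of_sample(8000)
print(f"{tol * 100:.2f} cm")        # 2.86 cm, i.e. approximately 3 cm
print(delay_in_samples(tol, 8000))  # about 0.667 samples
```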

Place, publisher, year, edition, pages
Elsevier, 2011
Keywords
Direction of arrival estimation, Calibration
National Category
Signal Processing
Identifiers
urn:nbn:se:bth-7557 (URN); 10.1016/j.sigpro.2010.09.018 (DOI); 000286864700040 (); oai:bth.se:forskinfo60271DA38B47B83AC125789A0030CCD4 (Local ID); oai:bth.se:forskinfo60271DA38B47B83AC125789A0030CCD4 (Archive number); oai:bth.se:forskinfo60271DA38B47B83AC125789A0030CCD4 (OAI)
Available from: 2012-09-18 Created: 2011-05-24 Last updated: 2025-09-30. Bibliographically approved
Swartling, M., Sällberg, B. & Grbic, N. (2011). Source localization for multiple speech sources using low complexity non-parametric source separation and clustering. Signal Processing, 91(8), 1781-1788
Source localization for multiple speech sources using low complexity non-parametric source separation and clustering
2011 (English) In: Signal Processing, ISSN 0165-1684, E-ISSN 1872-7557, Vol. 91, no 8, p. 1781-1788. Article in journal (Refereed) Published
Abstract [en]

This article presents a new method for localization of multiple concurrent speech sources that relies on simultaneous blind signal separation and direction of arrival (DOA) estimation, as well as a method to solve the intersection point selection problem that arises when locating multiple speech sources using multiple sensor arrays. The proposed method is based on a low-complexity non-parametric blind signal separation method, making it suitable for real-time applications on embedded platforms. In addition to having lower complexity than a previously presented method, it also improves DOA estimation accuracy. Performance is evaluated with both real recordings and simulations, and a real-time prototype of the proposed method has been implemented on a DSP platform to evaluate the computational and memory complexities in a real application.

Place, publisher, year, edition, pages
Elsevier, 2011
Keywords
Source localization, Direction of arrival estimation, Acoustic arrays, Speech processing
National Category
Signal Processing
Identifiers
urn:nbn:se:bth-7556 (URN); 10.1016/j.sigpro.2011.02.002 (DOI); 000291291700009 (); oai:bth.se:forskinfo52B7292F57A3699EC125789A00312160 (Local ID); oai:bth.se:forskinfo52B7292F57A3699EC125789A00312160 (Archive number); oai:bth.se:forskinfo52B7292F57A3699EC125789A00312160 (OAI)
Available from: 2012-09-18 Created: 2011-05-24 Last updated: 2025-09-30. Bibliographically approved
Swartling, M., Sällberg, B. & Grbic, N. (2008). Direction of Arrival Estimation for Speech Sources using Fourth Order Cross Cumulants. Paper presented at International Symposium on Circuits and Systems. Seattle: IEEE
Direction of Arrival Estimation for Speech Sources using Fourth Order Cross Cumulants
2008 (English) Conference paper, Published paper (Refereed) Published
Abstract [en]

In many applications where speech separation and enhancement are of interest, e.g. conferencing systems, mobile phones and hearing aids, accurate speaker localization is important. This paper presents an alternative criterion for the well-known Steered Response Power with Phase Transform (SRP-PHAT) algorithm, in which the steered response relates to peaks in the fourth-order cross cumulant rather than peaks in the second-order cross cumulant, i.e. the cross power spectrum. Since speech sources have a Probability Density Function (PDF) close to the Laplacian distribution, while noise is generally closer to the Gaussian distribution, the fourth-order cumulant is a good alternative for the steered response search for speech sources. The proposed method is evaluated and compared with the original SRP-PHAT algorithm, and it shows significant improvements in localization performance for speech sources.
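The statistical argument for a fourth-order criterion can be checked numerically. For a zero-mean variable the fourth-order cumulant is kappa4 = E[x^4] - 3 (E[x^2])^2, which is zero for a Gaussian variable, and the cumulants of independent variables add, so Gaussian noise ideally contributes nothing to the fourth-order cumulant of a noisy speech-like signal. The sketch uses unit-variance Laplacian samples as a stand-in for speech and Gaussian samples for noise; it illustrates the principle only and is not the paper's cross-cumulant steered-response criterion.

```python
import math
import random


def fourth_cumulant(xs):
    """Sample fourth-order cumulant E[x^4] - 3 E[x^2]^2 of (assumed zero-mean) data."""
    n = len(xs)
    m2 = sum(v * v for v in xs) / n
    m4 = sum(v ** 4 for v in xs) / n
    return m4 - 3.0 * m2 * m2


rng = random.Random(42)
N = 200_000
# Unit-variance Laplacian: exponential magnitude with rate sqrt(2) and a random sign.
speech_like = [rng.choice((-1.0, 1.0)) * rng.expovariate(math.sqrt(2.0)) for _ in range(N)]
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]
mixture = [s + v for s, v in zip(speech_like, noise)]

print(round(fourth_cumulant(speech_like), 2))  # close to 3 (Laplacian, unit variance)
print(round(fourth_cumulant(noise), 2))        # close to 0 (Gaussian)
print(round(fourth_cumulant(mixture), 2))      # close to 3: the Gaussian part drops out
```

A second-order statistic such as the variance, by contrast, adds the noise power (here, 1) to the speech power, which is why the noise affects the cross power spectrum but ideally not the fourth-order cumulant.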

Place, publisher, year, edition, pages
Seattle: IEEE, 2008
Keywords
Localization, Delay estimation, Higher order statistics
National Category
Signal Processing
Identifiers
urn:nbn:se:bth-8499 (URN); 000258532101155 (); oai:bth.se:forskinfo4F23DB790184F372C12574A40059C144 (Local ID); oai:bth.se:forskinfo4F23DB790184F372C12574A40059C144 (Archive number); oai:bth.se:forskinfo4F23DB790184F372C12574A40059C144 (OAI)
Conference
International Symposium on Circuits and Systems
Available from: 2012-09-18 Created: 2008-08-13 Last updated: 2025-09-30. Bibliographically approved
Swartling, M., Nilsson, M. & Grbic, N. (2008). Distinguishing True and False Source Locations when Localizing Multiple Concurrent Speech Sources. Paper presented at IEEE Sensor Array and Multichannel Signal Processing Workshop. Darmstadt, GERMANY: IEEE
Distinguishing True and False Source Locations when Localizing Multiple Concurrent Speech Sources
2008 (English) Conference paper, Published paper (Refereed) Published
Abstract [en]

A permutation problem arises when locating multiple far-field speech sources using several sensor arrays. Intersecting the direction of arrival (DOA) estimates from different sensor arrays yields a set of real source locations as well as a set of false intersections. This paper presents a novel method for pairing DOA estimates from different sensor arrays, yielding the corresponding real intersection points. The algorithm is numerically efficient and suitable for real-time implementations. Real room recordings are used to evaluate the method.
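The permutation problem can be reproduced with two arrays and two sources: pairing every DOA from array A with every DOA from array B gives four intersection points, two of which are false. The sketch below shows the problem and resolves it with a consistency check against a third array, a generic approach used here only for illustration, not the novel pairing method proposed in the paper. It assumes ideal, noise-free bearings in a 2-D plane measured from the x-axis over the full circle (a real linear array has a front-back ambiguity); the positions and helper names are hypothetical.

```python
import math
from itertools import product


def bearing(origin, point):
    """Angle (rad) from an array at `origin` towards `point`, measured from the x-axis."""
    return math.atan2(point[1] - origin[1], point[0] - origin[0])


def intersect(p1, a1, p2, a2):
    """Intersection of the lines through p1 and p2 with directions a1 and a2 (rad)."""
    u1 = (math.cos(a1), math.sin(a1))
    u2 = (math.cos(a2), math.sin(a2))
    d = (p2[0] - p1[0], p2[1] - p1[1])

    def cross(a, b):
        return a[0] * b[1] - a[1] * b[0]

    t = cross(d, u2) / cross(u1, u2)   # undefined for parallel bearings
    return (p1[0] + t * u1[0], p1[1] + t * u1[1])


def angle_diff(a, b):
    """Absolute angular difference wrapped to [0, pi]."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def locate(arr_a, doas_a, arr_b, doas_b, arr_c, doas_c, n_sources):
    """Keep the A/B intersections that best agree with some DOA seen by array C."""
    candidates = [intersect(arr_a, a, arr_b, b) for a, b in product(doas_a, doas_b)]
    scored = sorted(candidates,
                    key=lambda p: min(angle_diff(bearing(arr_c, p), c) for c in doas_c))
    return scored[:n_sources]


sources = [(1.0, 3.0), (3.0, 2.0)]
A, B, C = (0.0, 0.0), (4.0, 0.0), (2.0, -1.0)
doas = {arr: [bearing(arr, s) for s in sources] for arr in (A, B, C)}
found = locate(A, doas[A], B, doas[B], C, doas[C], n_sources=2)
print(sorted((round(x, 3), round(y, 3)) for x, y in found))  # [(1.0, 3.0), (3.0, 2.0)]
```

In this geometry the two false intersections, (1.6, 4.8) and (2.4, 1.6), are each seen about 10 degrees away from the nearest bearing measured by array C; a third array can still be fooled when a false point happens to line up with one of its true bearings.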

Place, publisher, year, edition, pages
Darmstadt, GERMANY: IEEE, 2008
Keywords
Array signal processing, Position measurement
National Category
Signal Processing
Identifiers
urn:nbn:se:bth-8498 (URN); 000260566500080 (); oai:bth.se:forskinfoEA817F5196A3600CC12574A4005A056C (Local ID); 978-1-4244-2240-1 (ISBN); oai:bth.se:forskinfoEA817F5196A3600CC12574A4005A056C (Archive number); oai:bth.se:forskinfoEA817F5196A3600CC12574A4005A056C (OAI)
Conference
IEEE Sensor Array and Multichannel Signal Processing Workshop
Available from: 2012-09-18 Created: 2008-08-13 Last updated: 2025-09-30. Bibliographically approved
Swartling, M., Nilsson, M. & Grbic, N. (2007). Detection of Vehicle Mounted Auditory Reverse Alarm using Hidden Markov Model. Paper presented at ELMAR. Zadar: IEEE
Detection of Vehicle Mounted Auditory Reverse Alarm using Hidden Markov Model
2007 (English) Conference paper, Published paper (Refereed) Published
Abstract [en]

This paper presents a method for automatically detecting vehicle-mounted auditory reverse alarms, or other similar warning signals, based on hidden Markov models and pattern matching techniques. The method is designed for embedded real-time platforms and is intended to be embedded in active hearing protection devices, helping the user detect warning signals in low-SNR environments. Real recordings are used to evaluate the performance, and the results are presented.
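A hidden Markov model detector of this kind can be sketched as a likelihood-ratio test: the forward algorithm scores a sequence of per-frame observations under an alarm model and under a background model, and a detection is declared when the log-likelihood ratio exceeds a threshold. All modeling choices in the sketch are simplifying assumptions rather than the paper's design: observations are reduced to binary per-frame "tonal energy present" flags, the alarm model has two states (tone off, tone on) with hypothetical transition and emission probabilities, and the background is an independent-frame model.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | HMM) via the scaled forward algorithm (discrete emissions)."""
    n_states = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                     for j in range(n_states)]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]   # rescale to avoid numerical underflow
    return log_lik


# Hypothetical alarm model: state 0 = tone off, state 1 = tone on.
ALARM = dict(
    start=[0.5, 0.5],
    trans=[[0.8, 0.2], [0.2, 0.8]],   # beeps and pauses last several frames
    emit=[[0.9, 0.1], [0.1, 0.9]],    # P(flag | state), flag in {0, 1}
)
# Hypothetical background model: independent frames, tonal flag seen 20 % of the time.
BACKGROUND = dict(start=[1.0], trans=[[1.0]], emit=[[0.8, 0.2]])


def alarm_score(obs):
    """Log-likelihood ratio; positive values favour the alarm model."""
    return forward_log_likelihood(obs, **ALARM) - forward_log_likelihood(obs, **BACKGROUND)


beeping = ([1] * 5 + [0] * 5) * 4   # regular on/off pattern
quiet = [0] * 40
print(alarm_score(beeping) > 0, alarm_score(quiet) > 0)  # True False
```

In practice the threshold would be tuned on recordings to trade missed detections against false alarms, and the observation model would be richer than a single binary flag per frame.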

Place, publisher, year, edition, pages
Zadar: IEEE, 2007
National Category
Signal Processing
Identifiers
urn:nbn:se:bth-9139 (URN); 000256667800035 (); oai:bth.se:forskinfoEAD2A0E2F7E5AC25C125731E003E0B8E (Local ID); oai:bth.se:forskinfoEAD2A0E2F7E5AC25C125731E003E0B8E (Archive number); oai:bth.se:forskinfoEAD2A0E2F7E5AC25C125731E003E0B8E (OAI)
Conference
ELMAR
Available from: 2012-09-18 Created: 2007-07-20 Last updated: 2025-09-30. Bibliographically approved
Swartling, M., Grbic, N. & Claesson, I. (2006). Direction of Arrival Estimation for Multiple Speakers using Time-Frequency Orthogonal Signal Separation. Paper presented at ICASSP 2006. Toulouse: IEEE
Direction of Arrival Estimation for Multiple Speakers using Time-Frequency Orthogonal Signal Separation
2006 (English) Conference paper, Published paper (Refereed) Published
Abstract [en]

This paper presents a new approach for multiple-speaker DOA estimation using an array of microphones. The method relies on the fact that multiple independent speakers have a small overlap in the time-frequency domain, i.e. the individual signals are almost W-disjoint orthogonal. By introducing a time-frequency mask and continuously tracking the set of time-frequency points corresponding to each individual speech signal, a single-source DOA estimation algorithm can be used to find the DOA of each separated signal. This approach does not restrict the solution to cases where the number of sensors exceeds the number of sources. Real room recordings, including moving sources, are used to evaluate the performance of the method.
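The W-disjoint orthogonality argument can be illustrated with two microphones and three sources, i.e. more sources than sensors. The sketch synthesizes time-frequency points directly instead of computing an STFT, lets each point be dominated by one randomly chosen far-field source with a small leakage from the others, estimates a DOA per point from the inter-microphone phase difference, and groups the per-point estimates with a one-dimensional k-means. The microphone spacing, frequency range, leakage level and the use of k-means are assumptions of this sketch; the paper instead tracks the time-frequency sets over time and applies a single-source DOA estimator to each separated signal.

```python
import cmath
import math
import random

C = 343.0   # speed of sound in m/s (assumed)
D = 0.05    # microphone spacing in m (hypothetical; avoids spatial aliasing up to 3.4 kHz)


def synth_tf_points(doas_deg, freqs, leak=0.01, seed=0):
    """Two-microphone TF points in which one far-field source dominates each point."""
    rng = random.Random(seed)
    taus = [D * math.sin(math.radians(a)) / C for a in doas_deg]
    points = []
    for f in freqs:
        dominant = rng.randrange(len(doas_deg))
        x1 = x2 = 0j
        for i, tau in enumerate(taus):
            amp = 1.0 if i == dominant else leak
            s = amp * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
            x1 += s
            x2 += s * cmath.exp(-2j * math.pi * f * tau)   # delayed by tau at mic 2
        points.append((f, x1, x2))
    return points


def point_doa_deg(f, x1, x2):
    """DOA from the phase of x2 * conj(x1), which is about -2*pi*f*tau."""
    tau = -cmath.phase(x2 * x1.conjugate()) / (2.0 * math.pi * f)
    return math.degrees(math.asin(max(-1.0, min(1.0, C * tau / D))))


def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means with centres initialised evenly between min and max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[min(range(k), key=lambda j: abs(v - centers[j]))].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)


true_doas = [-40.0, 0.0, 35.0]
freqs = [1000.0 + 10.0 * i for i in range(241)]            # 1.0 to 3.4 kHz
estimates = [point_doa_deg(*p) for p in synth_tf_points(true_doas, freqs)]
print([round(c, 1) for c in kmeans_1d(estimates, k=3)])    # close to [-40, 0, 35]
```

Low-frequency points are excluded because the same phase error maps to a much larger delay error there; with real speech and reverberation, the per-point estimates are far noisier than in this synthetic case.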

Place, publisher, year, edition, pages
Toulouse: IEEE, 2006
Keywords
blind source separation, direction-of-arrival estimation, matrix algebra, microphone arrays, speech processing, time-frequency analysis
National Category
Signal Processing
Identifiers
urn:nbn:se:bth-10020 (URN); 000245559905065 (); oai:bth.se:forskinfo56631AAFBB5B21B9C12571A800363C1D (Local ID); 1-4244-0469-X (ISBN); oai:bth.se:forskinfo56631AAFBB5B21B9C12571A800363C1D (Archive number); oai:bth.se:forskinfo56631AAFBB5B21B9C12571A800363C1D (OAI)
Conference
ICASSP 2006
Available from: 2012-09-18 Created: 2006-07-11 Last updated: 2025-09-30. Bibliographically approved
Sällberg, B., Swartling, M., Grbic, N. & Claesson, I. (2006). Real Time Implementation of a Blind Beamformer for Subband Speech Enhancement Using Kurtosis Maximization. Paper presented at International Workshop on Acoustic Echo and Noise Control. Paris
Real Time Implementation of a Blind Beamformer for Subband Speech Enhancement Using Kurtosis Maximization
2006 (English) Conference paper, Published paper (Refereed) Published
Abstract [en]

This paper presents a real-time implementation of a blind beamformer for subband speech enhancement. The beamformer adaptively maximizes the kurtosis of its output signal. Speech has high kurtosis, whereas noise often exhibits lower kurtosis; hence, maximizing the kurtosis of the output signal generally enhances speech. The implementation is built on a novel framework for real-time audio processing in MATLAB and uses low-latency ASIO sound cards. It is evaluated using recorded signals, and the proposed approach enhances the speech by approximately 10 dB with perceptually low speech distortion.
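The principle that maximizing output kurtosis favors speech can be illustrated with a two-channel instantaneous mixture of a Laplacian "speech" source and Gaussian noise. This sketch deliberately simplifies the paper's system: it works on real-valued full-band samples rather than subbands, and it replaces adaptive maximization with a grid search over unit-norm weight vectors w = (cos theta, sin theta). The mixing coefficients and sample count are hypothetical. Because kurtosis is scale-invariant, the joint moments needed for every theta are computed once from the data.

```python
import math
import random

rng = random.Random(7)
N = 50_000
# Unit-variance Laplacian "speech" and unit-variance Gaussian noise.
speech = [rng.choice((-1.0, 1.0)) * rng.expovariate(math.sqrt(2.0)) for _ in range(N)]
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]
# Hypothetical mixing: each microphone picks up both signals.
x1 = [s + 0.8 * n for s, n in zip(speech, noise)]
x2 = [0.6 * s + n for s, n in zip(speech, noise)]

# Joint sample moments E[x1^a x2^(p-a)] for p = 2 and p = 4 (data assumed zero-mean).
m2 = [sum(u ** a * v ** (2 - a) for u, v in zip(x1, x2)) / N for a in range(3)]
m4 = [sum(u ** a * v ** (4 - a) for u, v in zip(x1, x2)) / N for a in range(5)]


def output_kurtosis(theta):
    """Excess kurtosis of y = cos(theta)*x1 + sin(theta)*x2 from the precomputed moments."""
    c, s = math.cos(theta), math.sin(theta)
    var = sum(math.comb(2, a) * c ** a * s ** (2 - a) * m2[a] for a in range(3))
    fourth = sum(math.comb(4, a) * c ** a * s ** (4 - a) * m4[a] for a in range(5))
    return fourth / var ** 2 - 3.0


# w and -w give outputs with the same kurtosis, so half a circle suffices.
thetas = [math.pi * i / 3600 for i in range(3600)]
best = max(thetas, key=output_kurtosis)
w = (math.cos(best), math.sin(best))
y = [w[0] * u + w[1] * v for u, v in zip(x1, x2)]


def corr(a, b):
    """Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))


print(round(abs(corr(x1, speech)), 3))  # raw microphone: about 0.78
print(round(abs(corr(y, speech)), 3))   # kurtosis-maximizing output: close to 1
```

Since Gaussian noise has zero excess kurtosis, the output kurtosis is highest when the weights null the noise direction, here near tan(theta) = -0.8, leaving a scaled copy of the speech source.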

Place, publisher, year, edition, pages
Paris, 2006
National Category
Signal Processing
Identifiers
urn:nbn:se:bth-10122 (URN); oai:bth.se:forskinfo3C8FA5BD2FBA266DC12571DC0068D8DA (Local ID); oai:bth.se:forskinfo3C8FA5BD2FBA266DC12571DC0068D8DA (Archive number); oai:bth.se:forskinfo3C8FA5BD2FBA266DC12571DC0068D8DA (OAI)
Conference
International Workshop on Acoustic Echo and Noise Control
Available from: 2012-09-18 Created: 2006-09-01 Last updated: 2025-09-30. Bibliographically approved