Multi-Microphone Systems

Self‐Configuring Multi‐Microphone Signal Enhancement

Today's mobile speech communication takes place in a wide variety of situations. Starting from the classical telephone call in handset mode, the use of mobile communication devices has shifted toward the more natural hands-free mode. Hands-free operation is also the typical use case for controlling devices in a Smart Home environment via automatic speech recognition.

Hands-free communication is characterized by a large distance between the speaker and the microphone used for sound acquisition. This imposes new challenges on speech enhancement, since the microphone signal is more strongly affected by noise and reverberation. Conventional single-channel speech enhancement cannot mitigate these disturbances satisfactorily. An emerging strategy for improving speech quality and word recognition rates in the hands-free use case is the use of multiple microphones for sound acquisition. The multi-channel audio stream makes it possible to exploit the spatial diversity of the sound field. In the simplest case, one microphone picks up speech plus noise while a second microphone picks up noise only; subtracting the noise-only signal from the speech-plus-noise signal then yields a noise-reduced, enhanced output.
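As an illustration of this simplest case, the following Python sketch implements the classical two-microphone adaptive noise canceller. Since, in practice, the noise reaching the speech microphone is a filtered version of the reference noise, the plain subtraction is replaced by a short adaptive (NLMS) filter; all signal names and parameter values below are illustrative assumptions, not part of a specific system.

```python
import numpy as np
from scipy.signal import lfilter

def nlms_noise_canceller(primary, reference, filt_len=64, mu=0.5, eps=1e-8):
    """Two-microphone noise cancellation: 'primary' carries speech + noise,
    'reference' ideally carries noise only. An NLMS filter estimates the
    noise path; its output is subtracted from the primary signal."""
    w = np.zeros(filt_len)                  # adaptive filter coefficients
    out = np.zeros(len(primary))
    for n in range(filt_len - 1, len(primary)):
        x = reference[n - filt_len + 1:n + 1][::-1]   # newest sample first
        noise_est = np.dot(w, x)                      # estimated noise at primary mic
        e = primary[n] - noise_est                    # enhanced output sample
        w = w + mu * e * x / (np.dot(x, x) + eps)     # normalized LMS update
        out[n] = e
    return out

# Illustrative test: a sinusoidal stand-in for speech plus filtered white noise.
fs = 8000
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(fs)                       # noise-only reference mic
primary = speech + lfilter([0.6, 0.3, 0.1], [1.0], noise)
enhanced = nlms_noise_canceller(primary, noise)
```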

More sophisticated algorithms for multi-channel speech enhancement use beamforming techniques. They amplify signals coming from one direction while attenuating signals coming from other directions. This is usually done by filtering the input signals and subsequently summing them to generate the enhanced output. One problem with this kind of processing is that the direction of the desired source has to be known; if it is not known, it has to be estimated. In either case, the beamforming algorithm implicitly uses information about the microphone geometry, for example the distance between the microphones, to compensate for the propagation delays between the microphones and to sum the desired signal coherently. As a consequence, a beamforming algorithm always has to be adapted to the specific device it is used in, which adds development and optimization time for individualized solutions. Besides the microphone geometry, which varies from one device to another but stays fixed for a given device, the use case in which the device is operated will typically change over time, and beamforming may then no longer be the best choice for signal enhancement. Up to now, the user has had to tell the device which use case currently applies, for example by pressing the hands-free button on a smartphone.
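To make the role of the geometry explicit, here is a minimal delay-and-sum sketch in Python. It assumes a far-field source and a uniform linear array; the spacing d, the look direction theta_deg, and the speed of sound are illustrative assumptions, not properties of a specific device.

```python
import numpy as np

def delay_and_sum(mics, fs, d, theta_deg, c=343.0):
    """Delay-and-sum beamformer for a uniform linear array.
    mics: shape (num_mics, num_samples)
    d: spacing between adjacent microphones in metres (geometry knowledge!)
    theta_deg: desired look direction, 0 degrees = broadside."""
    num_mics, num_samples = mics.shape
    theta = np.deg2rad(theta_deg)
    # Propagation delay of mic m relative to mic 0 for a far-field source
    delays = np.arange(num_mics) * d * np.sin(theta) / c      # seconds
    freqs = np.fft.rfftfreq(num_samples, 1.0 / fs)
    out = np.zeros(num_samples)
    for m in range(num_mics):
        spec = np.fft.rfft(mics[m])
        # Time-advance each channel by its delay so the desired signal aligns
        spec *= np.exp(2j * np.pi * freqs * delays[m])
        out += np.fft.irfft(spec, n=num_samples)
    return out / num_mics
```

Note how both the spacing d and the look direction theta_deg enter the delay computation: this is exactly the device-specific and use-case-specific knowledge discussed above.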

To remove both the additional engineering time for adapting existing solutions to a specific device and the inconvenient manual use-case selection by the user, we are doing research on self-configuring systems that are able to perform these tasks automatically. Furthermore, the desired enhancement system should rely only on information provided by the multi-channel audio streams themselves and not on additional information from other sensors of the device. This simplifies the integration of the enhancement system into a digital signal processor (DSP), because the DSP chip does not need to provide extra signaling interfaces for communication with other controllers.

A high-level view of a future enhancement system is shown in Figure 1. The 'use-case analysis' block receives the microphone signals as input, classifies the current use case, and controls the signal enhancement block. Additionally, a calibration signal could be emitted via the loudspeakers of the mobile device to simplify the estimation of the microphone array geometry.
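As a toy illustration of the calibration idea, the sketch below estimates the time difference of arrival (TDOA) of a known chirp between two microphones from the cross-correlation peak; such TDOAs, i.e. path-length differences from the loudspeaker to the microphones, constrain the relative microphone positions. The probe signal, sampling rate, and delay values are made-up for demonstration.

```python
import numpy as np
from scipy.signal import chirp, fftconvolve

def estimate_tdoa(mic_a, mic_b, fs):
    """Delay of mic_b relative to mic_a in seconds (positive: b arrives later),
    estimated from the cross-correlation peak."""
    corr = fftconvolve(mic_b, mic_a[::-1], mode="full")
    lag = np.argmax(np.abs(corr)) - (len(mic_a) - 1)
    return lag / fs

# Illustrative use: a chirp emitted by the device loudspeaker arrives at
# microphone B five samples later than at microphone A.
fs = 16000
t = np.arange(int(0.05 * fs)) / fs
probe = chirp(t, f0=200, t1=t[-1], f1=6000)
mic_a = np.concatenate([probe, np.zeros(100)])
mic_b = np.concatenate([np.zeros(5), probe, np.zeros(95)])
tdoa = estimate_tdoa(mic_a, mic_b, fs)       # approx. 5 / fs seconds
path_diff = tdoa * 343.0                     # path-length difference in metres
```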

Currently, we are investigating features that allow a robust determination of the use case and an estimation of the microphone positions on the mobile device. We are also developing so-called Blind Beamforming algorithms, which perform beamforming independently of the actual array geometry and the source location; a generic illustration of this idea is sketched below. Students who want to write their Bachelor's or Master's thesis in these fields are always welcome.
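Our blind beamforming algorithms are not detailed here; as a generic, textbook-style illustration of geometry-independent beamforming, the following sketch uses, in each frequency bin, the principal eigenvector of the estimated spatial covariance matrix as the filter weights, which requires neither microphone positions nor a source direction. This is a stand-in 'max-power' beamformer, not our specific method.

```python
import numpy as np
from scipy.signal import stft, istft

def eigen_beamformer(mics, fs, nperseg=512):
    """Geometry-free beamformer: per frequency bin, the weights are the
    principal eigenvector of the spatial covariance matrix, steering toward
    the dominant source without knowing the array geometry."""
    # STFT of all channels: Z has shape (num_mics, num_freqs, num_frames)
    _, _, Z = stft(mics, fs=fs, nperseg=nperseg)
    num_mics, num_freqs, num_frames = Z.shape
    Y = np.zeros((num_freqs, num_frames), dtype=complex)
    for f in range(num_freqs):
        X = Z[:, f, :]                       # (num_mics, num_frames)
        R = X @ X.conj().T / num_frames      # spatial covariance estimate
        _, vecs = np.linalg.eigh(R)          # eigenvalues in ascending order
        w = vecs[:, -1]                      # principal eigenvector
        w = w * np.exp(-1j * np.angle(w[0])) # fix arbitrary phase w.r.t. mic 0
        Y[f, :] = w.conj() @ X               # apply beamformer per bin
    _, y = istft(Y, fs=fs, nperseg=nperseg)
    return y
```

The per-bin phase normalization toward a reference microphone is essential here; without it, each frequency bin would carry an arbitrary phase and the output would be distorted.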
