Intelligibility Enhancement

Problem Formulation

In mobile telephony, one of the two conversation partners is often located in a noisy environment. Here, this is assumed to be the person at the near end. This leads to two major problems:

  1. The noise is picked up by the near-end microphone along with the speech and transmitted to the far-end listener, which degrades intelligibility for that listener. Noise reduction algorithms have been proposed to suppress the noise while preserving the speech signal.
  2. The near-end listener also experiences increased listening effort and reduced speech intelligibility, because they perceive a mixture of the clean speech from the far end and the acoustic background noise, as illustrated in the figure. This problem is addressed in the following.

Solution: Near-End Listening Enhancement

Typically, it is not possible to influence the background noise at the near end. The only way to improve speech perception is to adaptively preprocess the far-end speech signal before it is played back at the near end. This approach is called near-end listening enhancement (NELE). NELE algorithms take the background noise into account and adapt the speech to the noise characteristics so that it becomes more intelligible.

The simplest possible NELE algorithm would be an adaptive gain control that raises the speech level when the noise is very loud. However, very high speech levels are annoying and might damage either the loudspeaker or the human auditory system. Therefore, more sophisticated algorithms have been developed. In [Niermann EUSIPCO 2016], for example, speech power is shifted from frequency ranges that are highly disturbed by the near-end noise to frequency ranges that are less disturbed. In this way, the available speech power contributes to intelligibility more effectively. In [Sauert ITG 2010, Sauert Dissertation 2014], spectral weights are calculated by maximizing an intelligibility measure, the Speech Intelligibility Index (SII). Further approaches can be found in the literature.
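The adaptive gain control mentioned above can be sketched in a few lines. This is a minimal illustration and is not taken from the cited work: the function names `adaptive_gain_db` and `apply_gain`, the target signal-to-noise ratio of 15 dB, and the 12 dB gain cap are assumptions chosen for the example. The cap reflects the concern that very high playback levels are annoying and potentially harmful.

```python
def adaptive_gain_db(noise_level_db, speech_level_db,
                     target_snr_db=15.0, max_gain_db=12.0):
    """Return a playback gain in dB for the far-end speech.

    The gain lifts the speech towards target_snr_db above the noise.
    It is never negative and never exceeds max_gain_db.
    All default values are illustrative assumptions.
    """
    current_snr_db = speech_level_db - noise_level_db
    required_db = target_snr_db - current_snr_db
    return min(max(required_db, 0.0), max_gain_db)


def apply_gain(samples, gain_db):
    """Scale time-domain samples by a gain given in dB."""
    factor = 10.0 ** (gain_db / 20.0)  # dB to linear amplitude factor
    return [s * factor for s in samples]
```

With these assumed defaults, quiet surroundings leave the speech untouched, while in loud noise the gain saturates at the cap instead of growing without bound. That saturation is what motivates the frequency-dependent methods described above.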

The methods mentioned above can improve speech intelligibility and decrease listening effort without increasing the total speech power, as shown in [Niermann DAGA 2015]. This comes at the expense of speech naturalness, since the preprocessor modifies the characteristics of the speech. Nevertheless, in severe noise conditions a moderately modified voice is often preferred over a natural voice that cannot be understood.
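To make the idea of redistributing speech power at constant total power concrete, the following sketch computes per-band power weights that favor bands with less noise and then rescales them so that the total speech power is unchanged. It is a hypothetical illustration only and does not reproduce the algorithms of the cited papers: the function name `redistribute_power`, the inverse-noise weighting rule, and the exponent `alpha` are assumptions.

```python
def redistribute_power(speech_power, noise_power, alpha=0.5, floor=1e-12):
    """Return per-band power weights that keep the total speech power constant.

    speech_power, noise_power: per-band powers (linear scale, same length).
    alpha: assumed shaping exponent; 0 gives no shaping, larger values
           move more power into the less disturbed bands.
    floor: small constant that avoids division by zero in noise-free bands.
    """
    if len(speech_power) != len(noise_power):
        raise ValueError("speech_power and noise_power must have equal length")

    # Assumed rule: raw weight grows as the noise power in the band shrinks.
    raw = [(1.0 / max(n, floor)) ** alpha for n in noise_power]

    total_in = sum(speech_power)
    total_raw = sum(w * s for w, s in zip(raw, speech_power))
    if total_raw == 0.0:
        return [1.0] * len(speech_power)  # silent input, nothing to shape

    # Rescale so that the weighted speech power equals the input power.
    scale = total_in / total_raw
    return [w * scale for w in raw]


speech = [1.0, 1.0, 1.0]
noise = [4.0, 1.0, 0.25]  # first band most disturbed, last band least
weights = redistribute_power(speech, noise)
shaped = [w * s for w, s in zip(weights, speech)]
```

In this example the weights increase from the noisiest band to the quietest one, while the sum of `shaped` equals the sum of `speech` up to rounding. A practical system would additionally limit how far each band may be changed in order to preserve some naturalness.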

Besides mobile telephony, further possible applications are

References

Bastian Sauert and Peter Vary
Near End Listening Enhancement: Speech Intelligibility Improvement in Noisy Environments
Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2006

Bastian Sauert and Peter Vary
Recursive Closed-Form Optimization of Spectral Audio Power Allocation for Near End Listening Enhancement
ITG-Fachtagung Sprachkommunikation, October 2010

Bastian Sauert
Near-End Listening Enhancement: Theory and Application
Dissertation, May 2014

Markus Niermann, Florian Heese, and Peter Vary
Intelligibility Enhancement For Hands-Free Mobile Communication
Proceedings of German Annual Conference on Acoustics (DAGA), 2015

Markus Niermann, Peter Jax, and Peter Vary
Noise Estimation For Speech Reinforcement in the Presence of Strong Echoes
Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016

Markus Niermann, Peter Jax, and Peter Vary
Near-End Listening Enhancement by Noise-Inverse Speech Shaping
Proceedings of European Signal Processing Conference (EUSIPCO), August 2016

Markus Niermann, Christian Thierfeld, Peter Jax, and Peter Vary
Time Domain Approach for Listening Enhancement in Noisy Environments
ITG-Fachtagung Sprachkommunikation, October 2016