In mobile telephony, one of the conversation partners is often located in a noisy environment. In the example considered here, this is the person at the near end. This leads to two major problems:
- The noise is picked up by the near-end microphone along with the speech and transmitted to the far-end listener, which reduces intelligibility for the far-end listener. Noise reduction algorithms have been proposed to suppress the noise while preserving the speech signal.
- The near-end listener also experiences increased listening effort and reduced speech intelligibility, since they perceive a mixture of the clean far-end speech and the acoustic background noise, as illustrated in the figure. This problem is addressed in the following.
Solution: Near-End Listening Enhancement
Typically, it is not possible to influence the background noise at the near end. The only way to enhance speech perception is therefore to adaptively preprocess the far-end speech signal before it is played back at the near end. This approach is called near-end listening enhancement (NELE). NELE algorithms take the background noise into account and adapt the speech to the noise characteristics so that it becomes more intelligible.
The simplest NELE algorithm is an adaptive gain control that increases the speech volume when the noise is loud. However, very high speech levels are annoying and may damage the loudspeaker or the human auditory system. Therefore, more sophisticated algorithms have been developed. In [niermann20], for example, the speech power is spectrally redistributed using two strategies in order to reduce auditory masking. In this way, the available speech power contributes to intelligibility more effectively. In [sauert10a, sauert14], spectral weights are calculated by maximizing an intelligibility measure, the Speech Intelligibility Index (SII). Further approaches can be found in the literature.
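A minimal sketch of such an adaptive gain control is shown below. It is an illustration only, not one of the cited algorithms: the noise reference level `ref_noise_db`, the `slope` of the gain rule, and the gain cap `max_gain_db` are hypothetical parameters, and levels are measured in dB relative to a full-scale amplitude of 1.0. The cap reflects the concern above that very high speech levels are annoying or harmful.

```python
import math
from typing import List, Sequence


def level_db(frame: Sequence[float], eps: float = 1e-12) -> float:
    """Mean-square level of a frame in dB, relative to a full-scale amplitude of 1.0."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + eps)


def agc_gain_db(noise_db: float, ref_noise_db: float = -40.0,
                slope: float = 0.5, max_gain_db: float = 12.0) -> float:
    """Hypothetical gain rule: `slope` dB of speech gain per dB of noise above
    `ref_noise_db`, limited to `max_gain_db` to protect loudspeaker and listener."""
    excess_db = max(0.0, noise_db - ref_noise_db)
    return min(slope * excess_db, max_gain_db)


def process_frame(speech: Sequence[float], noise: Sequence[float],
                  **params: float) -> List[float]:
    """Scale one far-end speech frame according to the current near-end noise frame."""
    gain = 10.0 ** (agc_gain_db(level_db(noise), **params) / 20.0)
    # Hard-clip to [-1, 1] so the amplified signal stays within the digital range.
    return [max(-1.0, min(1.0, gain * x)) for x in speech]


if __name__ == "__main__":
    noise = [0.1 if n % 2 else -0.1 for n in range(160)]  # noise frame at about -20 dB
    speech = [0.05, -0.2, 0.4]
    print(round(agc_gain_db(level_db(noise)), 2))  # 10.0
    print(process_frame(speech, noise))
```

The sketch only reacts to the overall noise level and boosts all frequencies equally, which is exactly why it runs into the level limits described above once the noise becomes loud.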
The methods mentioned above can improve speech intelligibility [niermann19, niermann20] and decrease listening effort [niermann15] without increasing the total speech power. This comes at the expense of speech naturalness, since the preprocessing modifies the characteristics of the speech. Nevertheless, in severe noise conditions, a voice that has been modified to a certain degree is often preferred over a natural voice that cannot be understood.
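The constraint of not increasing the total speech power can be illustrated with a deliberately simple band-wise rule, sketched below. It is not the algorithm of [niermann20] or [sauert10a, sauert14]; the rule, its exponent `alpha`, and the function name `band_power_gains` are hypothetical. Given speech and noise powers per frequency band (assumed to come from some filterbank or STFT analysis), each band receives a power gain proportional to (noise/speech)^alpha, and all gains are then rescaled by a common factor so that the total speech power is unchanged. With alpha = 0 the speech is left untouched; with alpha = 1 every band ends up with the same SNR.

```python
from typing import List, Sequence


def band_power_gains(speech_pow: Sequence[float], noise_pow: Sequence[float],
                     alpha: float = 0.5, eps: float = 1e-12) -> List[float]:
    """Hypothetical power-preserving redistribution rule.

    Returns per-band power gains W_k**2 such that
    sum(W_k**2 * S_k) == sum(S_k), i.e. the total speech power is unchanged.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Raw gains favour bands in which the speech is weak relative to the noise.
    raw = [((n + eps) / (s + eps)) ** alpha for s, n in zip(speech_pow, noise_pow)]
    total_in = sum(speech_pow)
    total_out = sum(w * s for w, s in zip(raw, speech_pow))
    # Common scale factor that enforces the total-power constraint.
    scale = total_in / total_out if total_out > 0.0 else 1.0
    return [scale * w for w in raw]


if __name__ == "__main__":
    speech = [4.0, 1.0, 0.25]  # speech power per band (arbitrary units)
    noise = [1.0, 1.0, 1.0]    # near-end noise power per band
    gains = band_power_gains(speech, noise, alpha=1.0)
    modified = [g * s for g, s in zip(gains, speech)]
    print([round(p, 3) for p in modified])  # equal power in every band
    print(round(sum(modified), 6) == round(sum(speech), 6))
```

Pushing power into the most heavily masked bands is not necessarily optimal, since power spent in bands with a very low SNR may contribute little to intelligibility. The cited methods therefore derive their weights from auditory masking considerations or from maximizing the SII rather than from a fixed rule like this one.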
Besides mobile telephony, further possible applications include
- public address systems, e.g., in railway stations [niermann16] and airplanes,
- hearing aids,
- headsets, and
- car multimedia systems.
Depending on the application, further challenges arise, some of which are covered in [niermann19]. These include:
- the interaction of far-end noise reduction and near-end listening enhancement when both the far end and the near end are disturbed by noise, and
- the estimation of noise in the presence of strong echoes (as in public address systems).
References
Bastian Sauert and Peter Vary
Near End Listening Enhancement: Speech Intelligibility Improvement in Noisy Environments
Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2006
Bastian Sauert and Peter Vary
Recursive Closed-Form Optimization of Spectral Audio Power Allocation for Near End Listening Enhancement
ITG-Fachtagung Sprachkommunikation, October 2010
Bastian Sauert
Near-End Listening Enhancement: Theory and Application
Ph.D. thesis, May 2014
Markus Niermann, Florian Heese, and Peter Vary
Intelligibility Enhancement For Hands-Free Mobile Communication
Proceedings of German Annual Conference on Acoustics (DAGA), 2015
Markus Niermann, Peter Jax, and Peter Vary
Noise Estimation For Speech Reinforcement in the Presence of Strong Echoes
Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016
Markus Niermann
Digital Enhancement of Speech Perception in Noisy Environments
Ph.D. thesis, 2019
Markus Niermann and Peter Vary
Listening Enhancement in Noisy Environments: Solutions in Time and Frequency Domain
IEEE/ACM Transactions on Audio, Speech, and Language Processing, December 2020