In recent years, the development of new communication systems has been driven increasingly by customers' desire for more comfort. An important factor for commercial success is a high-quality hands-free interface.
Due to the lack of an acoustic barrier between loudspeaker and microphone, the signal of the communicating party at the far end is picked up by the microphone and transmitted back to its origin. The far-end speaker perceives this as an echo. The echo becomes subjectively more disturbing as the transmission delay increases, e.g., in mobile radio systems. Cancellation of the acoustic echo is therefore necessary.
An adaptive filter is used to identify and reconstruct the acoustic echo path and thus the room impulse response. The result of the filtering is an estimated echo signal, which can be subtracted from the microphone signal to reduce the disturbing echo. The adaptation is performed mainly in far-end single talk (FEST) phases, in which the near-end speaker does not talk. In phases of near-end single talk (NEST) or double talk (DT), the adaptation is frozen and the current estimate is used until the next FEST phase.
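The structure described above can be illustrated with a minimal sketch. It assumes a time-domain NLMS update as a stand-in for the adaptive filter; the function name `nlms_echo_canceller` and the parameters `num_taps`, `step_size`, and `adapt_mask` are hypothetical and chosen for illustration only. The mask marks the samples where adaptation is allowed (FEST) and is assumed to come from an external talk-state detector, which is not shown.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, num_taps=64, step_size=0.5,
                        adapt_mask=None, eps=1e-8):
    """Illustrative NLMS echo canceller (assumed stand-in, not a specific system).

    Returns the echo-reduced signal and the final echo path estimate.
    """
    far_end = np.asarray(far_end, dtype=float)
    mic = np.asarray(mic, dtype=float)
    if adapt_mask is None:
        # Assumption: without a mask, every sample is treated as FEST.
        adapt_mask = np.ones(len(mic), dtype=bool)

    w = np.zeros(num_taps)       # estimate of the room impulse response
    x_buf = np.zeros(num_taps)   # most recent far-end samples, newest first
    e = np.zeros(len(mic))       # microphone signal minus estimated echo

    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        echo_hat = w @ x_buf             # estimated echo signal
        e[n] = mic[n] - echo_hat         # subtract it from the microphone signal
        if adapt_mask[n]:                # adapt only where the mask allows it
            w += step_size * e[n] * x_buf / (x_buf @ x_buf + eps)
        # otherwise the estimate is frozen and simply reused
    return e, w
```

With `adapt_mask` set to `False` during NEST and DT phases, the filter keeps cancelling with the last estimate while the near-end speech cannot disturb the adaptation.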
A statistical postfilter is used to reduce non-deterministic influences of the acoustic echo path. These influences are introduced by the time-variant room impulse response as well as by the limited length of the adaptive filter, which in real systems is shorter than the reverberation time.
Traditional algorithms for the adaptation of these components, e.g., the LMS or RLS algorithm, usually depend on a combination of sophisticated control mechanisms to ensure adaptation and robustness under real, time-variant, and disturbed conditions. These mechanisms generally involve a large number of control parameters, which take considerable experience to tune to the respective situation. An alternative, model-based approach leads to a Kalman-filter-based adaptation with an inherent step-size control.
By using a model-based approach to describe the acoustic environment [Enzner06, Enzner14], it is possible to employ the Kalman algorithm in the context of echo cancellation. In contrast to the traditional approach, the acoustic echo path is considered a time-variant, vector-valued state variable. A stationary Markov model is assumed for this variable. The near-end speech input represents the observation noise for the echo path.
The task of an acoustic echo controller is thus to extract the observation noise, i.e., the near-end speech, from the microphone signal in an optimal sense. The Kalman algorithm makes it possible to estimate the parameters of a generalized Wiener filter. The two-stage filter structure consists of an acoustic echo canceller between transmit and receive paths and a postfilter for residual echo suppression. In contrast to previous solutions, this filter structure can be derived directly from the MMSE criterion. The Kalman filter basically requires only three statistical parameters of the acoustic environment:
- the length of the echo path vector
- the degree of time-variability of the acoustic echo path, i.e., the time-constant of the vector Markov model of the echo path
- the Power Spectral Density (PSD) of the observation noise at the hands-free microphone
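How these three parameters enter a Kalman recursion can be sketched as follows. This is a simplified time-domain illustration under stated assumptions, not the algorithm of [Enzner06, Enzner14]: the echo path is modelled as a first-order Markov process with a scalar transition factor, and the observation noise is treated as white with a fixed variance instead of a PSD. All names (`kalman_echo_canceller`, `transition`, `process_var`, `obs_noise_var`) are hypothetical.

```python
import numpy as np

def kalman_echo_canceller(far_end, mic, num_taps=8, transition=0.9999,
                          process_var=1e-6, obs_noise_var=1e-4):
    """Simplified time-domain Kalman echo canceller (illustrative sketch).

    Assumed model: h(n+1) = transition * h(n) + process noise,
                   mic(n) = x(n)^T h(n) + observation noise (near-end signal).
    """
    far_end = np.asarray(far_end, dtype=float)
    mic = np.asarray(mic, dtype=float)

    h = np.zeros(num_taps)     # state estimate: the echo path vector
    P = np.eye(num_taps)       # state error covariance
    x_buf = np.zeros(num_taps)
    e = np.zeros(len(mic))

    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]

        # Prediction step according to the Markov model of the echo path.
        h = transition * h
        P = transition ** 2 * P + process_var * np.eye(num_taps)

        # Update step: the gain plays the role of an inherent step size,
        # shrinking when the observation noise is strong relative to P.
        e[n] = mic[n] - x_buf @ h
        Px = P @ x_buf
        gain = Px / (x_buf @ Px + obs_noise_var)
        h = h + gain * e[n]
        P = P - np.outer(gain, Px)
    return e, h
```

In this sketch, `num_taps` corresponds to the length of the echo path vector, `transition` together with `process_var` to its degree of time-variability, and `obs_noise_var` to the power of the observation noise. No separate step-size control is needed, since the gain adapts to the ratio of state uncertainty to observation noise.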
A generalization of this theory to achieve combined acoustic echo and noise reduction is straightforward. The Wiener postfilter needs to be extended to take the PSDs of the existing additive noise into account.
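The extension can be illustrated with the generic Wiener gain, in which the noise PSD is simply added to the disturbance term. The sketch below assumes that per-frequency PSD estimates of the near-end speech, the residual echo, and the noise are already available; how they are estimated is not shown, and the function and parameter names are hypothetical.

```python
import numpy as np

def wiener_postfilter_gain(psd_near_speech, psd_residual_echo,
                           psd_noise=0.0, gain_floor=0.0):
    """Generic Wiener gain per frequency bin (illustrative sketch).

    gain = S_speech / (S_speech + S_residual_echo + S_noise)
    """
    s = np.asarray(psd_near_speech, dtype=float)
    disturbance = (np.asarray(psd_residual_echo, dtype=float)
                   + np.asarray(psd_noise, dtype=float))
    # Guard against division by zero in bins without any signal power.
    gain = s / np.maximum(s + disturbance, 1e-12)
    # Optional lower bound on the gain (assumption, not from the text).
    return np.maximum(gain, gain_floor)
```

With `psd_noise` left at zero, the gain covers residual echo suppression only; passing a noise PSD yields the combined echo and noise reduction case.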
Auto-Decorrelation with Linear Prediction
One problem in the identification process is the autocorrelation of the far-end signal, which often crucially affects the adaptation. Even sophisticated solutions such as the state-space approach with the Kalman algorithm suffer to a certain extent from correlated input signals. Therefore, similar to an idea for the time-domain NLMS algorithm [Antweiler95], linear prediction filters in the adaptation paths can be used to decorrelate the inputs of the adaptation algorithm without altering the physically transmitted acoustic signals [Kühl17a]. Because the involved filters are time-variant, this process introduces an error, which can be compensated for by an additional refiltering stage.
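The decorrelation idea can be sketched as follows, under simplifying assumptions: the predictor is computed once from the far-end signal with the autocorrelation method and then kept fixed, so the time-variant case and the refiltering stage are not covered. The function names and the `order` parameter are hypothetical.

```python
import numpy as np

def lpc_coefficients(x, order):
    """Linear prediction coefficients via the autocorrelation method."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation values for lags 0..order.
    r = np.array([x[:len(x) - k] @ x[k:] for k in range(order + 1)])
    # Toeplitz autocorrelation matrix, lightly regularized for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    return np.linalg.solve(R, r[1:])

def prediction_error_filter(x, coeffs):
    """Apply the prediction error filter A(z) = 1 - sum_k c_k z^-k."""
    a = np.concatenate(([1.0], -np.asarray(coeffs, dtype=float)))
    return np.convolve(np.asarray(x, dtype=float), a)[:len(x)]

def decorrelate_adaptation_inputs(far_end, mic, order=8):
    """Whiten far-end and microphone signals with the same fixed predictor.

    Only these copies are passed to the adaptation algorithm; the signals
    that are actually played back and transmitted remain unchanged.
    """
    coeffs = lpc_coefficients(far_end, order)
    return (prediction_error_filter(far_end, coeffs),
            prediction_error_filter(mic, coeffs),
            coeffs)
```

Because the same time-invariant filter is applied to both signals in this sketch, the whitened signals are still related by the same echo path, so an estimate obtained from them can be used to cancel the echo in the original microphone signal.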
As a result of increasing computational resources and the possibility of using more than one loudspeaker and microphone even in small devices such as smartphones, multichannel acoustic echo cancellers are attracting increasing interest. A separate echo canceller is needed for each microphone, since every echo signal at every microphone has to be cancelled. Considering a single microphone, the total echo now consists of the individual echo signals from each loudspeaker. To cancel these echoes, the impulse responses from each loudspeaker to the microphone have to be estimated from the same microphone signal. If the loudspeaker signals are cross-correlated, the estimation problem becomes ill-conditioned or even ambiguous, leading to the so-called non-uniqueness problem.
To circumvent this problem, the far-end input signals are often decorrelated, e.g., with a half-wave rectifier, which adds non-linearities to the signals and thereby makes the input signals less correlated. The autocorrelation of each input also has a negative impact on the identification performance. One possibility that addresses both auto- and cross-correlation is the combination of a half-wave rectifier with linear prediction filters in the adaptation process [Kühl17b].
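A common form of the half-wave rectifier non-linearity for two channels can be sketched as follows. The sketch assumes the widely used variant in which a scaled positive half-wave is added to one channel and a scaled negative half-wave to the other; the function name and the strength parameter `alpha` are hypothetical, and the value used in practice is a trade-off between decorrelation and audible distortion.

```python
import numpy as np

def half_wave_decorrelate(left, right, alpha=0.5):
    """Add scaled half-wave rectified versions to two channels (sketch).

    left'  = left  + alpha * (left  + |left|)  / 2   (positive half-wave)
    right' = right + alpha * (right - |right|) / 2   (negative half-wave)
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    left_out = left + alpha * 0.5 * (left + np.abs(left))
    right_out = right + alpha * 0.5 * (right - np.abs(right))
    return left_out, right_out
```

Using opposite half-waves in the two channels means that even identical input signals are no longer linearly related after processing, which eases the non-uniqueness problem.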