Front-End Signal Processing for Far-Field Speech Communication

Authors:
Schrammen, M.
Ph. D. Dissertation
 
School:
IKS, RWTH Aachen University
Address:
Templergraben 55, 52056 Aachen
Series:
Aachen Series on Communication Systems
Number:
2
Date:
2022
ISBN:
978-3-84408-809-0
DOI:
10.18154/RWTH-2022-09960
Language:
English

Abstract

Speech communication devices operated in hands-free mode offer a very natural form of human communication because the user can move freely relative to the device. However, the signal-to-noise ratio (SNR) at the device's microphones is typically low due to propagation loss, reverberation, and interfering sounds such as echo or environmental noise. Appropriate front-end signal processing (FESP) is therefore required to enhance the desired speech signal. Nowadays, more than one communication device is typically present in a smart home environment or a conference room. A beamformer (BF) can use the microphones of multiple devices to compensate for the low initial SNR, provided that all microphone positions are known. To estimate these positions, the novel orthogonal geometric projection (OGP) approach is proposed. OGP needs only two acoustic events, such as speech or hand claps, for estimation and thus demands very little effort from the user. To allow full-duplex speech communication, one acoustic echo canceller (AEC) per microphone channel is usually employed before the BF, which results in high complexity. Therefore, change prediction (ChaP) is proposed, which enables the use of a single AEC after the BF. By collecting information on the acoustic system over time, ChaP facilitates the adaptation of the AEC such that this low-complexity single-AEC configuration can approach the performance of the high-complexity multi-AEC variant. Conventional linear AEC is insufficient for mobile consumer devices because their low-cost loudspeakers and amplifiers, when turned up to a high volume, exhibit significantly nonlinear behavior. The novel dual-stage multi-channel Kalman (DualStage-MCK) algorithm also compensates for these nonlinear effects and does not suffer from limited modelling capabilities, slow tracking, or high computational complexity, which are typical drawbacks of state-of-the-art solutions.
The performance of the proposed solutions is evaluated in typical use cases and on realistic test data that includes device-specific acoustic shadowing and nonlinear effects acquired from specifically manufactured tablet, smart speaker and smartphone mockups.

Copyright © by IKS
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.