Colloquium - Details

To be notified of upcoming presentations in good time, subscribe to the newsletter of the Colloquium Communications Technology.

All interested students are cordially invited; registration is not required.

Doctoral defense presentation: Front-End Signal Processing for Far-Field Speech Communication

Matthias Schrammen
Friday, March 18, 2022
10:00 a.m.
Virtual conference room

Devices for speech communication operated in hands-free mode offer a very natural way of human communication. The capturing device, e.g., a smartphone, smart speaker, or tablet, is often located up to several meters away from the human speaker. Furthermore, detrimental effects like noise and reverberation are present in everyday acoustic environments. As a result, the signal-to-noise ratio at the microphones mounted on the device is typically too low to offer sufficient speech quality for the listener at the other end of the communication link. In addition, the loudspeaker of the device is located much closer to the microphones than the human speaker. A strong echo signal from the loudspeaker therefore couples into the microphones, degrading the conversation quality for the remote listener even further.

State-of-the-art approaches that tackle the above-mentioned problems usually rely on multiple microphones to improve the signal-to-noise ratio with methods like beamforming. Beamforming combines the digitally filtered signals of several microphones to obtain an enhanced speech signal at the output. In addition, acoustic echo cancellation is employed to attenuate the echo signal specifically. This is achieved by adaptively estimating a digital model of the acoustic echo path and then subtracting the synthesized echo signal from the microphone signal.
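The estimate-and-subtract principle of acoustic echo cancellation described above can be illustrated with a minimal sketch. It assumes a normalized least-mean-squares (NLMS) adaptive filter, which is a common textbook choice and not necessarily the method used in the dissertation. The function names, the four-tap echo path, the filter length, and the step size are all hypothetical values chosen for illustration, and the simulation contains only echo (no near-end speech or noise).

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=16, step=0.5, eps=1e-8):
    """Adaptively model the echo path and subtract the synthesized echo.

    Returns the echo-reduced signal and the final filter coefficients.
    NLMS is an assumption made for this sketch.
    """
    w = [0.0] * num_taps      # adaptive digital model of the echo path
    x_buf = [0.0] * num_taps  # recent loudspeaker samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        x_buf = [x] + x_buf[:-1]
        # Synthesize the echo from the current model.
        y = sum(wi * xi for wi, xi in zip(w, x_buf))
        # Subtract it from the microphone signal.
        e = d - y
        # Normalized update: step size scaled by the input energy.
        norm = sum(xi * xi for xi in x_buf) + eps
        w = [wi + step * e * xi / norm for wi, xi in zip(w, x_buf)]
        out.append(e)
    return out, w


def simulate(num_samples=4000, seed=0):
    """Create a white-noise loudspeaker signal and its echo at the microphone."""
    rng = random.Random(seed)
    echo_path = [0.6, -0.3, 0.15, 0.05]  # hypothetical linear echo path
    far_end = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    mic = []
    for n in range(num_samples):
        echo = sum(h * far_end[n - k]
                   for k, h in enumerate(echo_path) if n - k >= 0)
        mic.append(echo)
    return far_end, mic, echo_path


if __name__ == "__main__":
    far_end, mic, echo_path = simulate()
    residual, w = nlms_echo_canceller(far_end, mic)
    print("true echo path:     ", echo_path)
    print("estimated first taps:", [round(c, 3) for c in w[:4]])
```

In this idealized setting the first filter coefficients converge towards the simulated echo path and the residual echo becomes negligible. Real devices additionally have to cope with near-end speech, noise, and time-varying echo paths, which this sketch does not model.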

However, these solutions are usually optimized for one specific device and are only applicable when the positions of the microphones are fixed and known to the algorithm. Furthermore, combining multi-microphone enhancement with echo cancellation is not trivial, and low-complexity solutions perform poorly when tracking dynamic acoustic scenarios. Finally, low cost, small form factors, and high desired sound pressure levels result in loudspeakers that operate at their physical limits. This adds significant nonlinear components to the sound emitted by the loudspeaker. Conventional linear acoustic echo cancellation cannot compensate for these nonlinear parts of the echo, so the conversational quality is not satisfactory.

The task of the dissertation is to alleviate these shortcomings. The developed signal processing algorithms should be more flexible with respect to features desired in real devices. Among these are microphone positions that are unknown or change during operation, and the simultaneous use of beamforming and acoustic echo cancellation. Furthermore, the developed solutions should be able to handle nonlinear echo paths and should have low computational complexity so that they are also attractive for battery-powered devices.