Colloquium - Details

Subscribe to the newsletter of the Colloquium Communications Technology to receive timely information about upcoming presentations.

All interested students are cordially invited; registration is not required.

Master's Talk: Investigations on Perceptual Losses for Deep Learning-Based Speech Enhancement

Lars Nippert
Wednesday, April 19, 2023

14:00

Speech enhancement aims to improve the intelligibility and perceptual quality of a degraded speech signal. It has important applications, including telecommunications, hearing aids, hearables, and speech recognition. The task is particularly challenging in the presence of strong distortions. While the rise of deep neural networks has led to new state-of-the-art models in the field, these models often still suffer from poor perceptual quality because they are trained with standard regression loss functions, such as the mean squared error, which do not necessarily align with human perception. One approach in the search for perceptual loss functions is to base them on perceptual metrics. By analyzing the relationship between perceptual metrics and loss functions, we gain a better understanding of test results. We develop a loss function that approximates the popular Perceptual Evaluation of Speech Quality (PESQ) metric for the denoising task. It turns out that training with this loss function alone is problematic. By successively removing components from the function, we analyze their effect. A common dilemma is that models remove either too little noise or too much speech. Using a measure that quantifies the amount of suppression, we evaluate different loss functions throughout our experiments. We propose a loss that adapts itself to balance suppressing and retaining information. Since the perception of speech quality can be subjective, we propose a method for training a model on a parameterized loss that allows the influence of a loss term to be controlled at inference time.
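
The idea of a parameterized loss that trades off removing noise against retaining speech can be illustrated with a minimal Python sketch. It is not the method presented in the talk: the asymmetric split into an over-suppression and an under-suppression term, the weight alpha, and the function names parameterized_loss and sample_training_alpha are all assumptions chosen for illustration.

```python
import random


def parameterized_loss(estimate, clean, alpha):
    """Hypothetical weighted loss over magnitude values.

    alpha = 1 penalizes only over-suppression (estimate below the clean target),
    alpha = 0 penalizes only under-suppression (estimate above the clean target).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(estimate) != len(clean) or not clean:
        raise ValueError("inputs must be non-empty and of equal length")

    over = 0.0   # squared error where too much signal was removed
    under = 0.0  # squared error where residual noise remains
    for e, c in zip(estimate, clean):
        diff = e - c
        if diff < 0:
            over += diff * diff
        else:
            under += diff * diff
    return (alpha * over + (1.0 - alpha) * under) / len(clean)


def sample_training_alpha(rng):
    """Draw a weight uniformly from [0, 1) for one training step (assumed scheme)."""
    return rng.random()


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = sample_training_alpha(rng)
    # Toy magnitudes: the first bin is over-suppressed, the second is noisy.
    print(parameterized_loss([0.0, 1.5], [1.0, 1.0], alpha))
```

In such a scheme, one would typically sample alpha anew for each training step and also pass it to the model as an additional input, so that a listener can choose alpha at inference time. Whether the talk uses this kind of conditioning is not stated in the abstract.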