
Using Perceptual Evaluation of Speech Quality (PESQ) Loss for DNN-Based Speech Enhancement

Authors:
Thieling, L., Nippert, L., Jax, P.
Book Title:
ITG-Fachtagung Sprachkommunikation
Organization:
VDE
Publisher:
VDE
Pages:
pp. 61-65
Date:
Sep. 2023
ISBN:
978-3-80076-164-7
DOI:
10.30420/456164011
Language:
English

Abstract

In deep neural network (DNN)-based speech enhancement approaches, standard regression losses such as the mean squared error (MSE) are often utilized for training. However, these losses typically do not consider human perception and therefore may not lead to good perceptual quality. In this work, we implement a PESQLoss function that approximates the popular perceptual evaluation of speech quality (PESQ) metric. We propose modifications to our existing phase-aware deep speech enhancement approach that enable joint optimization of magnitude and phase estimates using this PESQLoss. By varying the weight of the PESQLoss as an additional term in our total loss, we investigate its influence on the achieved evaluation metrics. Moreover, we present a suppression measure allowing better interpretation of its influence on the estimation results. Our experiments show that the proposed changes for joint optimization lead to an average improvement of about 0.28 MOS w.r.t. PESQ, while achieving similar results for the other metrics (STOI, segmental SNR, DNSMOS).
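
The idea of adding a weighted perceptual term to a regression loss can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`mse`, `total_loss`, `placeholder_perceptual_loss`), the additive form of the total loss, and the `weight` parameter are assumptions made for illustration, and the placeholder term is a simple mean absolute error standing in for a real PESQ approximation.

```python
from typing import Callable, Sequence

Signal = Sequence[float]


def mse(estimate: Signal, target: Signal) -> float:
    """Mean squared error between two equal-length signals."""
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")
    return sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)


def placeholder_perceptual_loss(estimate: Signal, target: Signal) -> float:
    """Hypothetical stand-in for a PESQ-approximating loss.

    Computes a mean absolute error only; it does NOT model PESQ.
    """
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(target)


def total_loss(
    estimate: Signal,
    target: Signal,
    perceptual_loss: Callable[[Signal, Signal], float],
    weight: float,
) -> float:
    """Assumed form: regression loss plus a weighted perceptual term."""
    return mse(estimate, target) + weight * perceptual_loss(estimate, target)


if __name__ == "__main__":
    est, ref = [1.0, 2.0], [0.0, 0.0]
    # Sweep the weight of the perceptual term, as one might in an ablation.
    for w in (0.0, 0.5, 1.0):
        print(w, total_loss(est, ref, placeholder_perceptual_loss, w))
```

With `weight = 0.0` the sketch reduces to plain MSE, so sweeping the weight shows how strongly the additional term shapes the training objective.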
