Generally Applicable Deep Speech Inpainting Using the Example of Bandwidth Extension

Authors:
Thieling, L., Jax, P.
Book Title:
Proceedings of European Signal Processing Conference (EUSIPCO)
Organization:
EURASIP
Pages:
pp. 451-455
Date:
Aug. 2021
ISBN:
978-9-08279-706-0
ISSN:
2076-1465
DOI:
10.23919/EUSIPCO54536.2021.9616099
Language:
English

Abstract

Most of today's speech enhancement algorithms try to improve the quality or intelligibility of speech by modifying its time-frequency (TF) representation. It is often the case that individual parts of this TF plane become unusable due to severe disturbances or are even missing due to data loss. Here, we present a generally applicable speech inpainting algorithm to reconstruct the unusable or missing parts of the speech's TF representation in these cases. To achieve generalizability, we propose a statistically based error model that we use to train deep neural networks (DNNs). In order to minimize the complexity of the overall algorithm and still achieve good results, we have trained the DNNs on the basis of mel frequency cepstral coefficients (MFCCs), which are designed based on the human auditory system. Our experimental results show that the proposed algorithm is well suited to reconstructing even very large unusable or missing TF parts. Using the example of artificial bandwidth extension (BWE), we demonstrate that our proposed way of training DNNs on random rectangular gaps or holes in the TF plane leads to a generally applicable solution for various specific problems in the speech processing domain.
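
The idea of training on random rectangular gaps in the TF plane can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`random_rect_mask`, `bwe_mask`, `apply_mask`), the uniform sampling of the gap size and position, the zero fill value, and the feature layout of shape (bins, frames) are all assumptions made for illustration. The statistically based error model of the paper is not specified in the abstract and may differ.

```python
import numpy as np


def random_rect_mask(n_bins, n_frames, rng, max_height=None, max_width=None):
    """Boolean mask that is True inside one random rectangular gap.

    Assumption: height, width and position are drawn uniformly. The
    paper's error model is not given in the abstract.
    """
    max_height = n_bins if max_height is None else min(max_height, n_bins)
    max_width = n_frames if max_width is None else min(max_width, n_frames)
    height = int(rng.integers(1, max_height + 1))   # upper bound is exclusive
    width = int(rng.integers(1, max_width + 1))
    top = int(rng.integers(0, n_bins - height + 1))
    left = int(rng.integers(0, n_frames - width + 1))
    mask = np.zeros((n_bins, n_frames), dtype=bool)
    mask[top:top + height, left:left + width] = True
    return mask


def bwe_mask(n_bins, n_frames, cutoff_bin):
    """Bandwidth extension viewed as one gap: every bin at or above
    cutoff_bin is missing in all frames."""
    mask = np.zeros((n_bins, n_frames), dtype=bool)
    mask[cutoff_bin:, :] = True
    return mask


def apply_mask(features, mask, fill_value=0.0):
    """Return a copy of the features with masked cells set to fill_value."""
    return np.where(mask, fill_value, features)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal((40, 100))      # placeholder TF features
    gap = random_rect_mask(40, 100, rng)
    corrupted = apply_mask(clean, gap)          # hypothetical network input
    print("masked cells:", int(gap.sum()))
```

In such a setup, a network would receive `corrupted` (and possibly the mask) as input and `clean` as target. Under these assumptions, the BWE case is simply one particular rectangle that spans the whole time axis, which is one way to read the claim that training on random rectangles carries over to specific problems.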
