Colloquium - Talk Details

If you subscribe to the newsletter of the communications engineering colloquium, you will be notified of upcoming talks by e-mail in good time.

Everyone interested is cordially invited; registration is not required.

Master's thesis talk: Noise Reduction Combining Conventional Approaches and Artificial Neural Networks

Lorenz Schmidt
Monday, 16 November 2020
11:00 a.m.
Virtual conference room

The suppression of noise for single-channel speech enhancement is one of the most prominent challenges in signal processing and has been addressed for decades. In recent years, the popularization of machine learning algorithms and advances in deep neural network (DNN) architectures have opened new perspectives and approaches to this field, yielding impressive results. Many of these algorithms, however, require a computational effort that exceeds the resources available to a real-time application. One approach, called RNNoise, combines the methods of classical signal processing with DNNs. Within a common noise reduction architecture, the parts that are difficult to tune are replaced by DNNs. By using knowledge of psychoacoustics, a small weighting mask is sufficient to achieve good results. The mask is estimated by a very small neural network with low computational complexity.
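As a rough illustration of how a small weighting mask can act on a full spectrum, the sketch below interpolates a handful of per-band gains to one gain per frequency bin and multiplies them onto the spectrum. This is a minimal sketch under stated assumptions, not the RNNoise implementation: the function names, the linear interpolation between band centers and the band layout are all hypothetical choices made for this example.

```python
def interpolate_band_gains(band_centers, band_gains, num_bins):
    """Expand a few per-band gains to one gain per frequency bin.

    band_centers: strictly increasing bin indices, one per band (assumed layout).
    band_gains:   one gain per band, e.g. as estimated by a small network.
    """
    if len(band_centers) != len(band_gains):
        raise ValueError("need exactly one gain per band center")
    gains = []
    for k in range(num_bins):
        if k <= band_centers[0]:
            # Below the first band center: hold the first gain.
            gains.append(band_gains[0])
        elif k >= band_centers[-1]:
            # Above the last band center: hold the last gain.
            gains.append(band_gains[-1])
        else:
            # Find the band i with band_centers[i] <= k < band_centers[i + 1].
            i = 0
            while band_centers[i + 1] <= k:
                i += 1
            lo, hi = band_centers[i], band_centers[i + 1]
            t = (k - lo) / (hi - lo)
            gains.append((1 - t) * band_gains[i] + t * band_gains[i + 1])
    return gains


def apply_mask(spectrum, gains):
    """Scale each (possibly complex) spectral bin by its interpolated gain."""
    return [g * x for g, x in zip(gains, spectrum)]


# Three bands covering nine bins (toy numbers, chosen only for illustration).
bin_gains = interpolate_band_gains([0, 4, 8], [1.0, 0.5, 0.0], 9)
masked = apply_mask([1 + 1j] * 9, bin_gains)
```

The point of such a design is that the network only has to predict a few values per frame, while the interpolation spreads them smoothly over all bins.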

In this thesis, RNNoise is subjected to several modifications that are intended to improve its denoising performance while maintaining its affordable complexity. In a first step, the gated recurrent units (GRUs) of the RNNoise architecture are replaced by simple recurrent units (SRUs), which improve its performance while speeding up the training process. The DNN is expanded to estimate the pitch frequency, which is used in the reconstruction of the harmonics with a comb filter. A new binary IIR comb filter is developed and added to the signal processing of RNNoise. Besides the modifications of RNNoise itself, a pitch estimator, based on ordinary regression, and a mutual information metric are developed. The evaluation shows good performance for pitch estimation and voice activity detection (VAD). A preliminary study analyzes the upper limits that can be achieved by the reduced spectral weighting mask. With Bark scaling, 22 gains are a reasonable tradeoff between performance and complexity. Then, a theoretical evaluation shows that the new network architecture improves the estimation considerably, especially in non-stationary noise situations. A final evaluation compares a classical noise suppression method, an end-to-end neural network approach, classical RNNoise and the improved model by means of their cepstral distances, speech-to-noise enhancement and perceptual measures. The results show that the modifications give the new architecture an edge over classical RNNoise. The developed binary IIR comb filter, on the other hand, falls short of expectations and does not improve noise suppression performance.
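For readers unfamiliar with comb filtering along the pitch period, the following sketch shows a generic textbook feedback (IIR) comb filter, y[n] = x[n] + alpha * y[n - T]. It is not the binary IIR comb filter developed in the thesis, whose details are not given in this abstract; the function name and the parameters `period` and `alpha` are assumptions made for this example.

```python
def feedback_comb(x, period, alpha):
    """Generic feedback comb filter: y[n] = x[n] + alpha * y[n - period].

    period: delay in samples, e.g. an estimated pitch period (assumed input).
    alpha:  feedback gain; 0 <= alpha < 1 keeps the filter stable.
    """
    if period < 1:
        raise ValueError("period must be at least one sample")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1) for a stable filter")
    y = []
    for n, sample in enumerate(x):
        # Feed back the output from one period ago; zero before that exists.
        feedback = y[n - period] if n >= period else 0.0
        y.append(sample + alpha * feedback)
    return y


# The impulse response repeats every `period` samples with decaying amplitude.
impulse_response = feedback_comb([1.0, 0, 0, 0, 0, 0, 0], period=3, alpha=0.5)
# → [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25]
```

Because the response repeats at multiples of the period, components at the pitch frequency and its harmonics are reinforced relative to the content in between, which is the general idea behind using a pitch estimate to reconstruct harmonics.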
