Colloquium - Details
Master's thesis talk: Noise Reduction as Preprocessing for Automatic Speech Recognizers
Tobias Menne
June 1, 2015
9:00 a.m.
Lecture hall 4G, IKS
Automatic speech recognition is a well-established research topic. The field has advanced in the recent past, for example through the introduction of deep neural networks for acoustic modeling. It has also become a viable consumer product, as can be seen, among others, in Siri or the Google speech search function. Nevertheless, the performance of such automatic speech recognition tools is still extremely vulnerable to background noise. This is why current research efforts concentrate on creating robust speech recognition systems. Different approaches have been investigated for this purpose, each focusing on a different stage of the recognition process. One example is the development of more robust acoustic models. This work concentrates on the effect of speech enhancement algorithms on the recognition performance of a state-of-the-art speech recognition system, trained on clean signals, in different noise environments. A particular focus is put on a recently proposed speech enhancement algorithm based on a regression approach using deep neural networks.
Elementary investigations on the effect of speech enhancement algorithms are conducted. It is shown that the effect of speech enhancement algorithms on the recognition performance has to be investigated separately from their effect on human listening performance. This is done by comparing the change in recognition performance to the change in the objective quality measures PESQ and STOI. It is further shown that the algorithm based on deep neural networks outperforms well-established noise reduction algorithms in simple noise scenarios. Based on this observation, the regression network’s effect in more complex noise scenarios containing multiple different noise types is investigated. The experiments also show that the effect on the recognition performance can be further improved by adding information to the input of the neural network, for example a speech estimate from a conventional speech enhancement algorithm.
Finally, an approach that uses neural networks as a statistical model for noise reduction as preprocessing for automatic speech recognition is investigated.
