Own Voice Estimation for Hearables using Sensor Fusion and Speech Enhancement Techniques

Supervisors: Christoph Weyer, Johannes Fabry

Area: Sensor Fusion, Speech Enhancement

Tools: Matlab, Python, Estimation Theory, Signal Processing, Adaptive Filters

Categories: Bachelor Thesis, Master Thesis

Status: Open


In recent years, wireless Bluetooth earbuds have finally reached the consumer electronics market. These so-called hearables have the potential to improve the hearing and listening experience of the wearer. Possible applications include selective listening, augmented reality, active noise cancellation, and many more. However, these features require a multitude of different algorithms to run in real time in a computation- and power-constrained environment, posing challenging signal processing problems.

For multiple hearable applications, a reliable estimate of the user's own voice is of great importance. These include improved sound quality during phone calls and better speech recognition, but also, via active occlusion cancellation, a more natural perception of one's own voice while wearing the hearable.

To achieve good own voice estimation, multiple signals from the hearable can be used. Combining these signals, which may stem from different modalities, is a sensor fusion problem, lying at the intersection of signal processing and estimation theory.
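As a minimal illustration of this idea, the following toy sketch fuses two noisy observations of the same own-voice sample by inverse-variance weighting, the minimum-variance linear combination for independent, unbiased estimates. The sensor names and variances are purely hypothetical, not part of the thesis topic itself:

```python
import numpy as np


def fuse(x1, var1, x2, var2):
    """Inverse-variance weighted fusion of two independent, unbiased
    estimates of the same quantity, e.g. one sample of the own-voice
    signal seen by an outer microphone and an in-ear sensor."""
    w1 = 1.0 / var1
    w2 = 1.0 / var2
    fused = (w1 * x1 + w2 * x2) / (w1 + w2)
    fused_var = 1.0 / (w1 + w2)  # always below either input variance
    return fused, fused_var


# Hypothetical numbers: a less noisy outer-mic sample (variance 0.5)
# fused with a noisier in-ear sample (variance 2.0).
x, v = fuse(1.0, 0.5, 1.4, 2.0)
```

The fused estimate leans toward the more reliable sensor, and its variance is smaller than that of either input, which is precisely the benefit sensor fusion aims for.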

In this thesis, the problem shall be tackled using classical signal processing methods. Possible approaches include model-based sensor fusion using a Kalman or particle filter, other adaptive filtering methods known mainly from the field of speech enhancement, or beamforming techniques.
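To give a flavor of the adaptive filtering route, the sketch below implements a standard normalized LMS (NLMS) filter on synthetic data. The scenario is invented for illustration only: a noise reference x leaks, delayed and scaled, into a mixture d that also contains a stand-in "own voice" tone s; after adaptation, the error signal approximates s.

```python
import numpy as np


def nlms(x, d, L=32, mu=0.1, eps=1e-8):
    """Normalized LMS: adapt an L-tap FIR filter so that filtered x
    approximates d; the error e = d - y retains the part of d that is
    not predictable from x (here: the own-voice component)."""
    w = np.zeros(L)
    e = np.zeros(len(d))
    for n in range(L, len(d)):
        u = x[n - L:n][::-1]                  # most recent L input samples
        y = w @ u                             # filter output
        e[n] = d[n] - y                       # estimation error
        w += mu * e[n] * u / (u @ u + eps)    # normalized gradient step
    return e, w


# Synthetic demo data (purely illustrative).
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)                       # noise reference
s = np.sin(2 * np.pi * 0.01 * np.arange(4000))      # stand-in own voice
d = 0.8 * np.roll(x, 3) + s                         # mixture signal
e, w = nlms(x, d)
# After convergence, e should be close to s, the noise having been
# cancelled by the adapted filter.
```

This is only a sketch under idealized assumptions (white noise reference, linear time-invariant leakage path); real hearable signals involve nonstationary speech, acoustic feedback, and tight compute budgets.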

As part of this thesis, different approaches shall be researched and analyzed for their suitability for the own voice estimation problem. One or more promising approaches shall be implemented, and their performance shall then be systematically evaluated on real-world example signals. The analysis should take the computational constraints of the hearable platform into account.