Colloquium - Details

Subscribe to the newsletter of the Colloquium Communications Technology to receive timely information about upcoming presentations.

All interested students are cordially invited; registration is not required.

Bachelor-Presentation: TinySilentSpeech: Advancing Silent Speech Interfaces for Edge Devices

Leon Hausmann
Tuesday, August 12, 2025
03:00 PM
IKS 4G | hybrid

Silent speech recognition using electromyographic (EMG) signals enables speech decoding without acoustic input, offering promise for discreet and assistive communication systems. However, the current state-of-the-art models, while accurate, are computationally intensive and unsuitable for real-time or edge-based deployment. This thesis investigates whether such models can be significantly compressed and optimized without sacrificing performance.

As a baseline, the state-of-the-art Transformer-based EMG-to-speech model by Gaddy and Klein was first reproduced and then systematically compressed through architectural changes and quantization-aware methods. Topological changes reduced the model size by 70% with only a minor trade-off in accuracy. Applying sub-byte 4-bit Quantization-Aware Training (QAT) and orthogonality regularization further reduced the model size by 90%, with a relative degradation of just 2% in word error rate (WER). These steps also reduced the number of floating-point operations by over 70%, yielding substantial gains in inference efficiency.
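To make the 4-bit quantization step more concrete, the following is a minimal sketch of the "fake quantization" that QAT typically applies to weights in the forward pass. It is not taken from the thesis: the function name `fake_quantize_4bit`, the symmetric per-tensor scale, and the example weights are all assumptions chosen for illustration. Gradient handling (for example a straight-through estimator) and the orthogonality regularization are not shown.

```python
def fake_quantize_4bit(weights):
    """Round weights to a signed 4-bit grid and map them back to floats.

    Hypothetical helper: assumes a symmetric, per-tensor scale.
    """
    qmin, qmax = -8, 7  # value range of a signed 4-bit integer
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights), 1.0  # nothing to quantize
    scale = max_abs / qmax  # assumed scaling rule, not from the source
    # Quantize: scale, round to the nearest integer code, clamp to 4 bits.
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    # Dequantize: training continues on these rounded float values.
    return [c * scale for c in codes], scale


# Illustrative weights only, not real model parameters.
weights = [0.70, -0.33, 0.12, 0.02]
quantized, scale = fake_quantize_4bit(weights)

# Storage per weight relative to 32-bit floats.
bits_ratio = 4 / 32  # 0.125, i.e. 87.5% fewer bits per quantized weight
```

Under these assumptions each weight moves by at most half a quantization step (`scale / 2`). The bit-width ratio covers only the quantized weights themselves; the size reductions reported in the abstract also reflect the architectural changes.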

Subsequently, the output targets were redefined: replacing Mel spectrograms with textual representations eliminated vocoder-induced errors and resulted in direct EMG-to-text models achieving a WER of 34%, clearly outperforming the full-sized baseline. Finally, replacing the Transformer with a Conformer architecture resulted in an additional 3% WER improvement while reducing the model size by a further 4%. The best-performing configuration, a quantized Conformer trained with text targets, maintained a 70.2% FLOPs reduction, achieved a WER of 33%, and required only 14 MB of storage (a 94% size reduction).
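Since the results above are reported as word error rate, a short sketch of how WER is commonly computed may help: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; the evaluation pipeline used in the thesis, including any text normalization, is not described in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
wer = word_error_rate("the quick brown fox", "the quick brown dog")  # 0.25
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.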

These results demonstrate that efficient and accurate EMG-based silent speech recognition is achievable through targeted compression, architectural refinement, and output optimization. The models developed in this work could be deployed on standard mobile and edge devices, creating opportunities for real-world silent speech applications in assistive technology and wearable interfaces.
