Event – Detail View

Master's Talk: Investigations on Generative Models for Head-Related Transfer Functions

Serhat Kurt
Thursday, May 21, 2026
11:15 a.m.
IKS 4G | Zoom

Head-Related Transfer Functions (HRTFs) are fundamental for binaural hearing and sound localization. They depend heavily on both the source position and the unique physical characteristics of the subject. To realize immersive spatial audio applications, accurate models of a subject’s HRTFs are required. However, measuring HRTFs in a laboratory remains challenging because it requires complex, specialized setups. Deep generative models therefore offer a promising alternative for estimating a subject’s HRTFs.

In this thesis, we develop a generative model for HRTFs based on a Conditional Variational Autoencoder (C-VAE). Since HRTFs depend on both subject characteristics and the Direction of Arrival (DoA), the main idea is that explicitly providing the DoA to the generative model should leave only the subject characteristics to be encoded in the latent space. To enhance this model, we propose an adversarial training pipeline that yields latent representations independent of the DoA. The pipeline consists of two models: a C-VAE and a discriminator network. The adversarial setup is inspired by Generative Adversarial Networks (GANs).
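
As a rough illustration of the conditioning idea, the standard conditional VAE objective can be written as follows. The notation is an assumption made here and is not taken from the thesis: x denotes an HRTF, c the DoA, z the latent variable, q_phi the encoder, p_theta the decoder, and p(z) a prior that does not depend on c. Whether the thesis conditions the encoder, the decoder, or both on the DoA is not stated in the abstract; this sketch conditions both.

```latex
\mathcal{L}_{\text{C-VAE}}(\theta,\phi)
  = \mathbb{E}_{z \sim q_\phi(z \mid x, c)}\!\left[-\log p_\theta(x \mid z, c)\right]
  + \mathrm{KL}\!\left(q_\phi(z \mid x, c)\,\|\,p(z)\right)
```

Because the decoder already receives c, the latent variable z does not need to carry DoA information in order to reconstruct x, which is the motivation given in the abstract.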

The discriminator network is trained to reconstruct the DoA from the latent space representations. Conversely, the C-VAE is designed to accurately reconstruct HRTFs from the latent space while maximizing the discriminator’s error. Joint training of both networks enables the correct reconstruction of the HRTF while ensuring that the latent representations are independent of the DoA.
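
One common way to formalize such an alternating scheme is sketched below. All symbols are assumptions introduced for illustration: D_psi is the discriminator, l is some error measure between predicted and true DoA (for example a squared or angular error), L_C-VAE is the usual C-VAE loss (reconstruction plus KL term), and lambda is a hypothetical weight on the adversarial term. The actual losses and weighting used in the thesis are not given in the abstract.

```latex
\begin{align}
\min_{\psi}\;& \mathcal{L}_{D}(\psi)
  = \mathbb{E}_{(x,c)}\,\mathbb{E}_{z \sim q_\phi(z \mid x, c)}
    \left[\ell\!\left(D_\psi(z),\, c\right)\right], \\
\min_{\theta,\phi}\;& \mathcal{L}_{\text{C-VAE}}(\theta,\phi)
  - \lambda\,\mathcal{L}_{D}(\psi).
\end{align}
```

The first line trains the discriminator to predict the DoA from z. The second line trains the C-VAE to reconstruct well while increasing the discriminator's error, which pushes DoA information out of the latent space.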

The work covers the optimization and evaluation of the C-VAE and the discriminator network. Experiments were conducted on the SONICOM dataset. The results demonstrate the efficacy of the proposed methodology: the Mutual Information (MI) between the latent representations and the DoA is significantly reduced through adversarial training compared to models trained without adversarial guidance.
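
To make the evaluation metric concrete, the following is a minimal sketch of a plug-in MI estimate on discretized values. The abstract does not say which MI estimator the thesis uses, so this is only an illustration; the function name `mutual_information` and the toy data (DoA bin indices and a quantized latent coordinate) are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), n_xy in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), expressed with raw counts
        mi += (n_xy / n) * math.log(n_xy * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data, not results from the thesis.
doa_bins = [0, 0, 1, 1, 2, 2, 3, 3]
z_dependent = [0, 0, 1, 1, 2, 2, 3, 3]    # latent bin fully determined by DoA
z_independent = [0, 1, 0, 1, 0, 1, 0, 1]  # latent bin unrelated to DoA

mi_dependent = mutual_information(z_dependent, doa_bins)
mi_independent = mutual_information(z_independent, doa_bins)
```

In this toy setting the dependent latent yields an MI equal to the entropy of the DoA bins, while the independent latent yields zero, which is the direction of change the adversarial training is reported to achieve.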
