Event – Detail View
Master's Thesis Talk: Investigations on Generative Models for Head-Related Transfer Functions
Serhat Kurt
Thursday, 21 May 2026
11:15
IKS 4G | Zoom
Head-Related Transfer Functions (HRTFs) are fundamental to binaural hearing and sound localization. They depend heavily on both the source position and the unique physical characteristics of the subject. Immersive spatial audio applications therefore require accurate models of a subject’s HRTFs. However, measuring HRTFs in a laboratory remains challenging because it requires complex, specialized setups. Deep generative models offer a promising alternative for estimating a subject’s HRTFs.
In this thesis, we develop a generative model for HRTFs based on a Conditional Variational Autoencoder (C-VAE). The main idea rests on the fact that HRTFs depend on both subject characteristics and the Direction of Arrival (DoA): if the DoA is provided explicitly to the generative model, the latent space should only need to encode the subject characteristics. To enhance this model, we propose an adversarial training pipeline, inspired by Generative Adversarial Networks (GANs), that yields latent representations independent of the DoA. The pipeline consists of two models: a C-VAE and a discriminator network.
The discriminator network is trained to predict the DoA from the latent space representations. Conversely, the C-VAE is trained to reconstruct HRTFs accurately from the latent space while maximizing the discriminator’s error. Training both networks jointly yields correct HRTF reconstructions while ensuring that the latent representations are independent of the DoA.
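The two competing objectives described above can be written down as a sketch under assumed notation, not as the exact losses used in the thesis. Here x is an HRTF, d its DoA, z the latent code, q_φ(z | x, d) the encoder, p_θ(x | z, d) the decoder, and D_ψ the discriminator. The DoA prediction loss ℓ (for example a squared error) and the weights β and λ are hypothetical choices introduced only for illustration; none of these symbols appear in the abstract.

```latex
% Discriminator: predict the DoA d from the latent code z
\min_{\psi}\; \mathcal{L}_{D}(\psi)
  = \mathbb{E}_{(x,d)}\,\mathbb{E}_{z \sim q_{\phi}(z \mid x, d)}
    \big[\, \ell\big(D_{\psi}(z),\, d\big) \big]

% C-VAE: reconstruct x given (z, d) while maximizing the discriminator's error
\min_{\theta,\phi}\;
  \underbrace{\mathbb{E}_{(x,d)}\Big[
     -\,\mathbb{E}_{z \sim q_{\phi}(z \mid x, d)}\big[\log p_{\theta}(x \mid z, d)\big]
     + \beta\, \mathrm{KL}\big(q_{\phi}(z \mid x, d)\,\|\,p(z)\big)\Big]}_{\mathcal{L}_{\mathrm{VAE}}(\theta,\phi)}
  \;-\; \lambda\, \mathcal{L}_{D}(\psi)
```

In a setup of this kind the two minimizations are typically alternated: ψ is updated with θ and φ held fixed, then θ and φ are updated with ψ held fixed.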
The work comprises the optimization and evaluation of the C-VAE and the discriminator network. Experiments were conducted on the SONICOM dataset. The results demonstrate the efficacy of the proposed method: the Mutual Information (MI) between the latent representations and the DoA is significantly lower with adversarial training than in models trained without it.
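To illustrate what an MI comparison of this kind measures, the following minimal sketch computes a histogram-based (plug-in) MI estimate between a one-dimensional latent value and a discretized DoA label. The abstract does not say which MI estimator was used, so the estimator, the helper names `mutual_information` and `quantize`, and the synthetic data are all assumptions made for illustration. No SONICOM data is involved.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in MI estimate (in nats) between two equally long sequences of discrete labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), expressed with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def quantize(values, n_bins):
    """Map continuous values to integer bin indices 0..n_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Synthetic illustration (assumed data, not results from the thesis):
rng = random.Random(0)
doa = [rng.randrange(8) for _ in range(2000)]        # 8 hypothetical DoA classes
latent_leaky = [d + rng.gauss(0.0, 0.5) for d in doa]  # latent that still carries the DoA
latent_clean = [rng.gauss(0.0, 1.0) for _ in doa]      # latent independent of the DoA

mi_leaky = mutual_information(quantize(latent_leaky, 8), doa)
mi_clean = mutual_information(quantize(latent_clean, 8), doa)
print(f"MI (latent depends on DoA):     {mi_leaky:.3f} nats")
print(f"MI (latent independent of DoA): {mi_clean:.3f} nats")
```

The plug-in estimator is biased upward for finite samples, so the independent case yields a small positive value rather than exactly zero. For multi-dimensional latent spaces a different estimator would be needed.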
