Colloquium - Details
Subscribe to the newsletter of the Colloquium Communications Technology to receive timely information about upcoming presentations.
Master-Presentation: Head Movement Compensation for Binaural Microphone Recordings
Kenan Linden
Tuesday, November 25, 2025
11:00 AM
IKS 4G | zoom
Human spatial perception relies strongly on the auditory system, which enables listeners to localize sound sources and perceive them as stable in space, even during head movements. Achieving a realistic auditory impression, including this perceptual stability, is essential for acoustic augmented and virtual reality. A common approach to recreate realistic spatial impressions is binaural reproduction, in which signals are presented to both ears so that they contain the same directional and room-related cues as in natural hearing. Existing Binaural Cue Adaptation (BCA) algorithms compensate for listener head rotations during playback but assume that the original recording was made under static conditions. This thesis introduces a separate, independent Binaural Cue Compensation (BCC) algorithm that stabilizes the auditory scene of binaural recordings captured with a moving recording head.
The BCC algorithm is derived from the same principles as the BCA and is formulated as a head-related transfer function (HRTF)-exchange filter in the time-frequency domain. To overcome the limitations of the original direction-of-arrival estimation used in the BCA, movement-invariant beamforming is employed. The resulting system achieves perfect compensation under ideal conditions with single, stationary, and coherent sources, confirming the fundamental feasibility of the approach. However, evaluations under realistic, reverberant conditions reveal that incoherent sound components lead to implausible ambient reproduction. To address this, a signal model that accounts for both coherent and incoherent sound components is used, resulting in the multichannel coherence-adaptive BCC filter. The underlying primary-ambient decomposition estimation, used previously in the BCA, is adapted in this thesis to maintain reliable decomposition under dynamic conditions. Two modified variants, the ambient-weighted primary-ambient decomposition (AWPAD) and the RTF-table primary-ambient decomposition, are proposed to improve the estimation's robustness and performance under dynamic input conditions.
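The general idea behind an HRTF-exchange filter can be illustrated with a minimal sketch. This is not the algorithm from the thesis. It assumes a single ear, a single coherent source, one time frame, and exactly known transfer functions, and it uses a regularized spectral ratio as one plausible way to swap the recorded transfer function for the target one. The names `hrtf_exchange_filter`, `h_recorded`, `h_target`, and `eps` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def hrtf_exchange_filter(h_recorded, h_target, eps=1e-8):
    """Per-bin gain that replaces h_recorded with h_target (illustrative assumption).

    Uses the regularized ratio h_target * conj(h_recorded) / (|h_recorded|^2 + eps),
    which approaches h_target / h_recorded when |h_recorded|^2 is much larger than eps.
    """
    return h_target * np.conj(h_recorded) / (np.abs(h_recorded) ** 2 + eps)


# Toy data for one time frame with a handful of frequency bins (all values synthetic).
rng = np.random.default_rng(0)
n_bins = 8

# Source spectrum in each frequency bin.
source = rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins)

# Synthetic transfer functions with magnitudes in [1.0, 1.5) and random phases.
# h_recorded stands for the path to the moved recording head,
# h_target for the path to the desired, stable head orientation.
h_recorded = (1.0 + 0.5 * rng.random(n_bins)) * np.exp(2j * np.pi * rng.random(n_bins))
h_target = (1.0 + 0.5 * rng.random(n_bins)) * np.exp(2j * np.pi * rng.random(n_bins))

# Signal captured at the ear of the moved head.
recorded = h_recorded * source

# Apply the exchange filter bin by bin.
compensated = hrtf_exchange_filter(h_recorded, h_target) * recorded

# Signal that a non-moving head would have captured.
reference = h_target * source

max_error = np.max(np.abs(compensated - reference))
```

In this idealized setting the compensated spectrum matches the reference up to the small regularization term, which mirrors the statement above that compensation is perfect for single, stationary, coherent sources. The sketch does not model direction-of-arrival estimation, incoherent ambient components, or the primary-ambient decomposition.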
A listening experiment confirms that the multichannel coherence-adaptive BCC algorithm using the AWPAD and movement-invariant beamforming estimations can plausibly stabilize binaural recordings for moderate head rotations, while larger or faster movements still reduce perceptual plausibility. The results demonstrate the potential of the proposed method for stabilizing dynamic binaural recordings and highlight the remaining challenges toward achieving fully stable reproduction in complex and reverberant environments.
