Subscribe to the newsletter of the Colloquium Communications Technology to receive timely information about upcoming presentations.
Master's Thesis Presentation: Spatial Upscaling of Higher-Order Ambisonics Signals Using Machine Learning
Christian Johannes Bruno Schulz
Monday, June 23, 2025
02:00 PM
IKS 4G | Zoom
Ambisonics is a widely adopted spatial audio format for capturing, processing, and reproducing three-dimensional sound fields, with applications ranging from immersive virtual reality (VR) to advanced teleconferencing. Recent advances in multichannel audio systems have made the step from conventional Ambisonics to higher-order Ambisonics (HOA) possible, offering improved spatial resolution and a more immersive listening experience. However, the achievable HOA order, which determines spatial accuracy, is often limited by hardware constraints such as the number of microphones available for recording or the number of loudspeakers available for reproduction. As a result, much existing Ambisonics sound field data is available only at low orders, which has motivated a range of methods for enhancing the spatial detail of such signals.
This thesis investigates machine learning techniques for artificially raising the order of Ambisonics signals to enhance their spatial resolution. The primary objective is to develop data-driven models that infer higher-order spatial information from lower-order Ambisonics signals, thereby improving spatial fidelity without additional recording equipment. To this end, several neural network architectures are explored and trained. In the time domain, both fully connected (FC) networks and gated recurrent units (GRUs) are tested. In the time-frequency domain, the concept of sparse subband networks, which process one subband at a time, is introduced. The proposed architectures are evaluated with two quantitative performance metrics. Spatial similarity, a well-established metric, is used to assess the spatial fidelity between different HOA signals. In addition, this thesis introduces a novel approach for estimating the effective HOA order based on the normalized reconstruction error (NRE).
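The abstract does not state how the NRE is computed or how it is mapped to an effective order. The sketch below assumes the common definition of the NRE as error energy divided by reference energy, expressed in dB, and uses a hypothetical thresholding rule (per-order NRE at or below an assumed -10 dB) to pick the effective order; it illustrates the idea and is not the method proposed in the thesis. It also makes the order/channel relation concrete: an order-N signal in ACN channel ordering has (N + 1)^2 channels, with the channels of order n at indices n^2 to (n + 1)^2 - 1.

```python
import math

import numpy as np


def order_channels(order: int) -> slice:
    """ACN channel indices that belong to exactly one Ambisonics order."""
    return slice(order ** 2, (order + 1) ** 2)


def nre_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Normalized reconstruction error ||ref - est||^2 / ||ref||^2 in dB."""
    err = np.sum(np.abs(reference - estimate) ** 2)
    ref = np.sum(np.abs(reference) ** 2)
    if err == 0:
        return -np.inf  # perfect reconstruction
    return float(10.0 * np.log10(err / ref))


def effective_order(reference: np.ndarray, estimate: np.ndarray,
                    threshold_db: float = -10.0) -> int:
    """Hypothetical heuristic: highest order n such that every order 0..n
    is reconstructed with a per-order NRE at or below `threshold_db`.

    Both arrays have shape (channels, samples) in ACN order.
    Returns -1 if even order 0 misses the threshold.
    """
    n_ch = reference.shape[0]
    max_order = math.isqrt(n_ch) - 1
    if (max_order + 1) ** 2 != n_ch:
        raise ValueError("channel count must be (N + 1)**2")
    result = -1
    for n in range(max_order + 1):
        idx = order_channels(n)
        if nre_db(reference[idx], estimate[idx]) > threshold_db:
            break
        result = n
    return result


# Usage: a random order-4 "reference" (25 channels) and an "estimate" that
# reproduces orders 0-2 exactly but leaves orders 3-4 at zero.
rng = np.random.default_rng(0)
ref = rng.standard_normal((25, 1000))
est = ref.copy()
est[9:] = 0.0  # channels of orders 3 and 4 (ACN 9..24) are missing
print(nre_db(ref, est))           # roughly -2 dB: 16 of 25 channels missing
print(effective_order(ref, est))  # 2
```

Measuring the NRE per order rather than over all channels up to order n keeps the well-reconstructed, high-energy low orders from masking errors in the higher orders.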
Simulation results show that adopting a sparse network structure improves model performance: the resulting networks have lower complexity and require less training data, while outperforming their dense counterparts. Among the evaluated models, the time-frequency domain sparse subband networks achieve the best overall performance, including improved generalization. Assessed with appropriate evaluation criteria, these findings provide insight into both the potential and the current limitations of data-driven upscaling techniques. Overall, the proposed models demonstrate significant utility for estimating HOA signals in scenarios where obtaining actual HOA recordings is impractical.
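One way to see why processing one subband at a time reduces complexity is to count the weights of a single linear layer. The sketch below assumes an STFT representation, a first-order input (4 channels), a fourth-order target (25 channels), a 512-point FFT (257 bins), and one independent linear map per frequency bin; all of these are illustrative assumptions rather than the architecture from the thesis. A per-subband map is equivalent to a dense layer whose weight matrix is block-diagonal, i.e. all cross-frequency weights are fixed to zero.

```python
import numpy as np


def dense_params(n_bins: int, c_in: int, c_out: int) -> int:
    """Weights of one fully connected layer mapping all bins jointly."""
    return (n_bins * c_in) * (n_bins * c_out)


def subband_params(n_bins: int, c_in: int, c_out: int) -> int:
    """Weights when each bin has its own c_in -> c_out map."""
    return n_bins * c_in * c_out


def subband_upscale(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Apply one linear map per frequency bin.

    X: (n_bins, c_in, n_frames) low-order STFT coefficients
    W: (n_bins, c_out, c_in) hypothetical per-bin weights
    returns (n_bins, c_out, n_frames)
    """
    return np.einsum("foi,fit->fot", W, X)


def as_dense(W: np.ndarray) -> np.ndarray:
    """Equivalent dense matrix: the per-bin maps on the block diagonal."""
    n_bins, c_out, c_in = W.shape
    D = np.zeros((n_bins * c_out, n_bins * c_in), dtype=W.dtype)
    for f in range(n_bins):
        D[f * c_out:(f + 1) * c_out, f * c_in:(f + 1) * c_in] = W[f]
    return D


# Weight counts for 257 bins, order 1 (4 ch) -> order 4 (25 ch).
n_bins, c_in, c_out = 257, 4, 25
print(dense_params(n_bins, c_in, c_out))    # 6604900
print(subband_params(n_bins, c_in, c_out))  # 25700

# Check the block-diagonal equivalence on a small random example (8 bins).
rng = np.random.default_rng(0)
X = rng.standard_normal((8, c_in, 10)) + 1j * rng.standard_normal((8, c_in, 10))
W = rng.standard_normal((8, c_out, c_in)) + 1j * rng.standard_normal((8, c_out, c_in))
Y_sub = subband_upscale(X, W)
Y_dense = (as_dense(W) @ X.reshape(8 * c_in, 10)).reshape(8, c_out, 10)
print(np.allclose(Y_sub, Y_dense))  # True
```

Under these assumptions the dense layer has exactly n_bins times as many weights as the subband version (257x here). Sharing one map across all bins would shrink the subband layer further, at the cost of ignoring frequency-dependent behavior.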
