Spatial Audio Acquisition & Processing
Spatial audio processing refers to the processing of multiple signals that represent observations of a sound field. It comprises the sampling of the field, the mathematical representation of the signals, the analysis, modification, and synthesis of signals, and the perceptual aspects of auditory localization. At IKS, our vision is to apply our expertise in adaptive signal processing to the emerging field of spatial audio.
Spatial Audio Acquisition & Ambisonics
The challenge in sampling a sound field is to capture as much information as needed while using as few sensors as possible. This is only possible with appropriate room models that incorporate the acoustic wave equation as well as statistical models. Spatial audio acquisition therefore goes hand in hand with spatial audio representation.
Simple geometric sensor arrangements allow for a systematic and analytic description of the spatial characteristics of sound, which is why linear, circular, or spherical arrays are commonly used. This is closely related to beamforming. The IKS technical equipment includes an Eigenmike® with 32 microphone capsules uniformly distributed on a rigid sphere (see the picture below) [Meyer & Elko 2002].
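To make the link to beamforming concrete, here is a minimal sketch of a delay-and-sum beamformer for a uniform circular array, one of the simple geometries mentioned above. It assumes free-field plane-wave propagation in the horizontal plane and ignores scattering, so it does not model the rigid-sphere Eigenmike. The number of microphones (8), the radius (5 cm), the speed of sound (343 m/s), and the function names are hypothetical choices for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def uca_steering(azimuth, freq, n_mics=8, radius=0.05):
    """Free-field plane-wave steering vector(s) of a uniform circular array.

    azimuth: direction of arrival in radians (scalar, or array of shape (T, 1)).
    Returns phase factors relative to the array centre, shape (..., n_mics).
    """
    k = 2 * np.pi * freq / SPEED_OF_SOUND              # wavenumber in rad/m
    mic_az = 2 * np.pi * np.arange(n_mics) / n_mics     # capsule angles on the circle
    # Path-length difference R*cos(azimuth - mic_az) expressed as a phase term
    # (exp(+j*omega*t) time convention; the magnitude response does not depend on it).
    return np.exp(1j * k * radius * np.cos(azimuth - mic_az))


def das_response(look_az, test_az, freq, n_mics=8, radius=0.05):
    """Response of a delay-and-sum beamformer steered to look_az for plane waves from test_az."""
    w = uca_steering(look_az, freq, n_mics, radius) / n_mics   # phase-align, then average
    a = uca_steering(np.atleast_1d(test_az)[:, None], freq, n_mics, radius)
    return a @ np.conj(w)


# Beam pattern at 2 kHz, steered to the front (0 rad)
angles = np.linspace(-np.pi, np.pi, 361)
pattern_db = 20 * np.log10(np.abs(das_response(0.0, angles, 2000.0)) + 1e-12)
```

Steering only rephases the microphone signals before averaging them; the width of the main lobe is then governed by the array aperture relative to the wavelength, which is exactly the kind of relation that simple geometries let one write down analytically.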
The great advantage of spherical setups is their rotational invariance. The sound recorded with a spherical microphone array can easily be transformed into a domain in which the sound field representation is independent of the actual microphone setup. This format is known as Ambisonics. The basic idea of Ambisonics goes back to Michael Gerzon, but it did not gain substantial momentum until first-order B-Format was extended to Higher Order Ambisonics (HOA) in the mid-1990s by researchers such as Jérôme Daniel, among others [Daniel, Nicol & Moreau 2003].
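A minimal sketch of first-order Ambisonics encoding, in which a mono source is placed as a plane wave arriving from a given direction. It assumes the ACN channel order with SN3D normalization (the AmbiX convention), azimuth measured counter-clockwise from the front, elevation upwards, and all angles in radians; the function name `encode_foa` is illustrative and not taken from any particular library.

```python
import numpy as np


def encode_foa(signal, azimuth, elevation):
    """Encode a mono signal as a plane wave into first-order Ambisonics.

    Assumed convention: ACN channel order (W, Y, Z, X), SN3D normalization.
    Returns an array of shape (4, len(signal)).
    """
    s = np.atleast_1d(np.asarray(signal, dtype=float))
    gains = np.array([
        1.0,                                   # W: omnidirectional, order 0
        np.sin(azimuth) * np.cos(elevation),   # Y: left-right figure-of-eight
        np.sin(elevation),                     # Z: up-down figure-of-eight
        np.cos(azimuth) * np.cos(elevation),   # X: front-back figure-of-eight
    ])
    return gains[:, None] * s[None, :]


# A short 440 Hz tone placed 90 degrees to the left on the horizontal plane
t = np.arange(480) / 48000.0
bformat = encode_foa(np.sin(2 * np.pi * 440 * t), np.pi / 2, 0.0)
```

Because the four channels describe the sound field rather than any microphone or loudspeaker layout, the same signals can later be decoded to different reproduction setups. Note that the traditional (FuMa) B-Format uses a different channel order (W, X, Y, Z) and scales W by 1/√2.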
The idea behind Ambisonics is to describe the sound pressure distribution on the surface of an imaginary sphere around a reference point in the sound field. Since sound propagation is governed by the wave equation, knowledge of the pressure on the surface is sufficient to determine the pressure everywhere in the source-free interior of the sphere (except at isolated frequencies corresponding to interior resonances of the sphere). Mathematically, this is expressed by the Kirchhoff-Helmholtz integral [Daniel, Nicol & Moreau 2003]. The simple spherical geometry allows for an analytic solution of this integral.
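As a worked illustration, the interior field can be written in the standard form below, assuming a source-free interior, complex orthonormal Spherical Harmonics Y_n^m, and spherical Bessel functions of the first kind j_n. For a unit-amplitude plane wave with wavevector direction (θ_k, φ_k), the Jacobi-Anger expansion gives the coefficients a_nm explicitly. The last line assumes an open (acoustically transparent) sphere of radius R; for a rigid sphere such as the Eigenmike, the radial term differs. The notation is common textbook notation chosen for this sketch, not necessarily the one used elsewhere on this page.

```latex
p(k, r, \theta, \phi) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} a_{nm}(k)\, j_n(kr)\, Y_n^m(\theta, \phi)

e^{i\mathbf{k}\cdot\mathbf{r}} = \sum_{n=0}^{\infty} \sum_{m=-n}^{n}
  \underbrace{4\pi\, i^n \left[Y_n^m(\theta_k, \phi_k)\right]^{*}}_{a_{nm}(k)}\, j_n(kr)\, Y_n^m(\theta, \phi)

p_{nm}(k) = a_{nm}(k)\, j_n(kR)
```

Here p_nm are the Spherical Harmonic coefficients of the pressure on the sphere surface. The last line shows why the surface pressure determines the interior field: a_nm follows from p_nm by division wherever j_n(kR) ≠ 0, and the zeros of j_n(kR) are exactly the resonance frequencies at which this fails. Truncating the sums at n = N leaves (N+1)² coefficients.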
Spherical Harmonics are used as a set of basis functions to transform the pressure distribution on the surface of the sphere into a more abstract representation. These basis functions depend only on direction (azimuth and elevation), and an arbitrary pressure distribution on the sphere can be described as a linear combination of them [Rafaely 2015]. The Spherical Harmonics up to order 3 are illustrated in the figure below; the radius indicates the function's absolute value in each direction and the color indicates its sign. It is noteworthy that each Spherical Harmonic depends on the azimuth φ only through a factor exp(imφ), so along the azimuth the expansion is identical to a Fourier series. In analogy to the Fourier transform, the higher the order, the finer the structures that can be represented on the surface of the sphere.
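The orthonormality that makes Spherical Harmonics a convenient basis can be checked numerically. The sketch below evaluates real-valued, orthonormal Spherical Harmonics with plain NumPy and integrates their pairwise products on a Gauss-Legendre times equiangular quadrature grid. ACN ordering, orthonormal (N3D-type) scaling, and the omission of the Condon-Shortley phase are conventions chosen here; SN3D and other normalizations are also common in Ambisonics. The helper names `real_sh` and `sphere_quadrature` are hypothetical.

```python
import math

import numpy as np


def real_sh(n_max, azi, colat):
    """Orthonormal real spherical harmonics up to order n_max.

    Columns follow ACN order (index n*n + n + m); no Condon-Shortley phase.
    azi, colat: azimuth and colatitude in radians, arrays of equal length.
    """
    azi = np.atleast_1d(np.asarray(azi, dtype=float))
    x = np.cos(np.atleast_1d(np.asarray(colat, dtype=float)))
    s = np.sqrt(np.clip(1.0 - x**2, 0.0, None))
    Y = np.zeros((x.size, (n_max + 1) ** 2))
    for m in range(n_max + 1):
        # Associated Legendre P_n^m(x) by upward recursion in n, starting at P_m^m
        p_prev, p = None, np.prod(np.arange(1, 2 * m, 2)) * s**m
        for n in range(m, n_max + 1):
            if n == m + 1:
                p_prev, p = p, (2 * m + 1) * x * p
            elif n > m + 1:
                p_prev, p = p, ((2 * n - 1) * x * p - (n + m - 1) * p_prev) / (n - m)
            norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                             * math.factorial(n - m) / math.factorial(n + m))
            if m == 0:
                Y[:, n * n + n] = norm * p
            else:  # real harmonics: cosine for +m, sine for -m
                Y[:, n * n + n + m] = math.sqrt(2) * norm * p * np.cos(m * azi)
                Y[:, n * n + n - m] = math.sqrt(2) * norm * p * np.sin(m * azi)
    return Y


def sphere_quadrature(n_max):
    """Gauss-Legendre nodes in cos(colatitude) times equiangular azimuths.

    Integrates products of two harmonics of order <= n_max exactly.
    """
    x, wx = np.polynomial.legendre.leggauss(n_max + 1)
    n_az = 2 * n_max + 2
    az = 2 * np.pi * np.arange(n_az) / n_az
    A, C = np.meshgrid(az, np.arccos(x))
    W = np.outer(wx, np.full(n_az, 2 * np.pi / n_az))
    return A.ravel(), C.ravel(), W.ravel()


N = 3  # orders 0..3 as in the figure: (N + 1)**2 = 16 basis functions
azi, colat, w = sphere_quadrature(N)
Y = real_sh(N, azi, colat)
gram = Y.T @ (w[:, None] * Y)   # should equal the 16 x 16 identity matrix
```

For a pressure distribution limited to order N and sampled on the same grid, projecting onto the columns with the same weights, `Y.T @ (w * p)`, yields its expansion coefficients; this is the linear-combination view described above.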
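For a computational representation of the pressure distribution on the sphere, the surface has to be sampled (e.g., with a spherical microphone array). This leads to a band limitation, i.e., a limitation of the order of the Spherical Harmonics: since an expansion up to order N has (N+1)² coefficients, Q sensors can resolve at most an order N with (N+1)² ≤ Q (e.g., order 4 for the 32 capsules of the Eigenmike). The pressure distribution is therefore only approximately described by a truncated series expansion.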
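A sketch of how sampling limits the order: with Q sensors, an order-N fit has (N+1)² unknowns and is only determined when (N+1)² ≤ Q. The code assumes pressure samples on an open sphere at hypothetical, roughly uniform Fibonacci-lattice directions (not the actual Eigenmike capsule positions) and estimates the coefficients by least squares. `real_sh` evaluates orthonormal real Spherical Harmonics in ACN order; it and the other function names are hypothetical.

```python
import math

import numpy as np


def real_sh(n_max, azi, colat):
    """Orthonormal real spherical harmonics (ACN order, no Condon-Shortley phase)."""
    azi = np.atleast_1d(np.asarray(azi, dtype=float))
    x = np.cos(np.atleast_1d(np.asarray(colat, dtype=float)))
    s = np.sqrt(np.clip(1.0 - x**2, 0.0, None))
    Y = np.zeros((x.size, (n_max + 1) ** 2))
    for m in range(n_max + 1):
        # Associated Legendre P_n^m(x) by upward recursion in n, starting at P_m^m
        p_prev, p = None, np.prod(np.arange(1, 2 * m, 2)) * s**m
        for n in range(m, n_max + 1):
            if n == m + 1:
                p_prev, p = p, (2 * m + 1) * x * p
            elif n > m + 1:
                p_prev, p = p, ((2 * n - 1) * x * p - (n + m - 1) * p_prev) / (n - m)
            norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                             * math.factorial(n - m) / math.factorial(n + m))
            if m == 0:
                Y[:, n * n + n] = norm * p
            else:
                Y[:, n * n + n + m] = math.sqrt(2) * norm * p * np.cos(m * azi)
                Y[:, n * n + n - m] = math.sqrt(2) * norm * p * np.sin(m * azi)
    return Y


def max_order(n_sensors):
    """Highest order N with (N + 1)**2 <= n_sensors."""
    return math.isqrt(n_sensors) - 1


def fibonacci_directions(q):
    """Hypothetical, roughly uniform sampling layout (Fibonacci lattice)."""
    i = np.arange(q)
    colat = np.arccos(1.0 - (2 * i + 1) / q)
    azi = np.mod(i * math.pi * (3.0 - math.sqrt(5.0)), 2 * math.pi)  # golden angle steps
    return azi, colat


def sh_transform(pressure, azi, colat, order):
    """Least-squares estimate of order-limited SH coefficients from sensor samples."""
    if (order + 1) ** 2 > len(pressure):
        raise ValueError("need at least (order + 1)**2 sensors")
    coeffs, *_ = np.linalg.lstsq(real_sh(order, azi, colat), pressure, rcond=None)
    return coeffs


Q = 32                                   # number of capsules, as on the Eigenmike
N = max_order(Q)                         # order 4, i.e. 25 coefficients
azi, colat = fibonacci_directions(Q)
rng = np.random.default_rng(0)
true_coeffs = rng.standard_normal((N + 1) ** 2)   # a band-limited test field
pressure = real_sh(N, azi, colat) @ true_coeffs
est_coeffs = sh_transform(pressure, azi, colat, N)
```

The test field here is band-limited by construction, so it is recovered exactly. A real sound field also contains components above order N (increasingly so at high frequencies); the fit does not reject them but lets them alias into the estimated coefficients.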
Due to the order limitation, the sound field inside the sphere is no longer described perfectly. The fewer coefficients are used, the smaller the region around the center of the sphere in which the representation is accurate [Rafaely 2015]; for a fixed order, this region also shrinks with increasing frequency. This is analogous to other truncated series expansions (for example, the Taylor approximation), where the truncation order determines how accurately the neighborhood of the expansion point is described.
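The size of the accurately represented region can be estimated with a widely used rule of thumb, kr ≤ N, where k = 2πf/c is the wavenumber, r the radius around the expansion center, and N the order. The sketch below assumes c = 343 m/s and, for the second example, a head radius of 8.75 cm; the rule is an approximation rather than a sharp error bound, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def accurate_radius(order, freq, c=SPEED_OF_SOUND):
    """Radius in metres within which an order-limited expansion is commonly
    considered accurate, from the rule of thumb k * r <= N."""
    k = 2 * math.pi * freq / c   # wavenumber in rad/m
    return order / k


def required_order(radius, freq, c=SPEED_OF_SOUND):
    """Smallest order N satisfying k * radius <= N."""
    k = 2 * math.pi * freq / c
    return math.ceil(k * radius)


print(f"{accurate_radius(4, 1000.0):.3f} m")   # order 4 at 1 kHz; prints 0.218 m
print(required_order(0.0875, 8000.0))          # head-sized region at 8 kHz; prints 13
```

Under these assumptions, a fourth-order representation (as obtainable from 32 capsules) covers a region of roughly a head's size only at low frequencies, while covering the same region at 8 kHz would already require order 13.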
Signal Processing in the Ambisonics Domain
Although Ambisonics is an emerging technology for spatial audio representation, signal processing in the HOA domain beyond spatial audio coding remains largely unexplored. Our institute's long-standing expertise in adaptive signal processing puts us in a position to drive research in spatial audio processing forward. Open problems are manifold, as the following short selection of teasers shows:
- Can adaptive signal processing be used to enlarge the listening sweet spot in spatial audio reproduction?
- Can a priori information be used to increase the HOA order without increasing the number of sensors when recording spatial audio?
- Is it possible to enhance spatial audio recordings by means of classical signal enhancement algorithms?
- How can spatial information be used to improve audio scene classification and blind source separation?
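As an illustration of what processing directly in the HOA domain looks like, one of the simplest and well-established operations is rotating the sound field, which only mixes the coefficients and requires no knowledge of the microphones or loudspeakers. The sketch is limited to first order and to rotation about the vertical axis (yaw), and it assumes ACN/SN3D signals; the function names are hypothetical.

```python
import numpy as np


def foa_gains(azimuth, elevation):
    """First-order plane-wave encoding gains, ACN order (W, Y, Z, X), SN3D."""
    return np.array([1.0,
                     np.sin(azimuth) * np.cos(elevation),
                     np.sin(elevation),
                     np.cos(azimuth) * np.cos(elevation)])


def rotate_foa_yaw(bformat, alpha):
    """Rotate first-order ACN/SN3D signals by alpha radians about the vertical axis.

    bformat: array of shape (4,) or (4, n_samples). Positive alpha moves sources
    counter-clockwise, i.e. a source at azimuth phi ends up at phi + alpha.
    """
    w, y, z, x = bformat
    x_rot = np.cos(alpha) * x - np.sin(alpha) * y
    y_rot = np.sin(alpha) * x + np.cos(alpha) * y
    return np.stack([w, y_rot, z, x_rot])   # W and Z are unchanged by a yaw rotation


# Rotating a source at 0.3 rad azimuth by 0.5 rad places it at 0.8 rad
rotated = rotate_foa_yaw(foa_gains(0.3, 0.2), 0.5)
```

Higher orders need order-dependent rotation matrices (for example derived from Wigner-D functions), which are omitted in this sketch.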
Reproduction and Perceptual Aspects of Spatial Audio
Spatial audio is perceptually relevant because the human auditory system is capable of localizing acoustic sources. It does so by evaluating the interaural time and level differences between the two ears as well as the signal variations introduced by slight changes in head position and orientation [Blauert 1997]. Immersive auralization is therefore achieved when the sound pressure at the ears is reconstructed in a physically meaningful way. To this end, Higher Order Ambisonics signals have to be processed such that arbitrary signals can be delivered to the ears by one of the following methods:
- the use of headphones (binaural sound),
- the use of loudspeakers in combination with crosstalk cancellation algorithms (transaural sound), or
- the reconstruction of the entire sound field around the head.
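For the transaural option, crosstalk cancellation can be sketched per frequency bin as a regularized inversion of the 2x2 matrix of loudspeaker-to-ear transfer functions. The plant model below (direct paths of 1, crosstalk paths attenuated by g = 0.7 and delayed by τ = 0.25 ms) is a hypothetical stand-in for measured head-related transfer functions, and the regularization parameter `beta` is an assumed value; all names are illustrative.

```python
import numpy as np


def ctc_filters(H, beta=1e-3):
    """Tikhonov-regularized crosstalk-cancellation filters for one frequency bin.

    H: 2x2 complex plant matrix, H[ear, loudspeaker] = transfer function
    from loudspeaker to ear. Returns C with H @ C close to the identity.
    """
    H = np.asarray(H, dtype=complex)
    Hh = H.conj().T
    # C = (H^H H + beta I)^(-1) H^H; beta limits filter gain where H is ill-conditioned
    return np.linalg.solve(Hh @ H + beta * np.eye(2), Hh)


def toy_plant(freq, g=0.7, tau=2.5e-4):
    """Hypothetical symmetric plant: direct paths 1, crosstalk paths attenuated
    by g and delayed by tau seconds (a crude stand-in for measured HRTFs)."""
    cross = g * np.exp(-2j * np.pi * freq * tau)
    return np.array([[1.0, cross], [cross, 1.0]])


# Filters for one bin: binaural signals d = (left, right) become loudspeaker feeds C @ d
H = toy_plant(1000.0)
C = ctc_filters(H)
ear_signals = H @ C @ np.array([1.0, 0.0])   # close to [1, 0]: left ear only
```

With `beta = 0` the filters reduce to the exact inverse of the plant; a positive `beta` trades a small cancellation error for bounded filter gain at frequencies where the plant is nearly singular.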
For research in audio reproduction, IKS has installed a listening studio, the IKS|Lab, with 36 loudspeakers, which enables the assessment of spatial reproduction under ITU-R BS.1116-2 reference conditions.
References
Daniel, J., Nicol, R. and Moreau, S.:
Further Investigations of High Order Ambisonic and Wavefield Synthesis for Holophonic Sound Imaging,
AES 114th Convention, 2003
Rafaely, B.:
Fundamentals of Spherical Array Processing,
Springer, 2015
Blauert, J.:
Spatial Hearing: The Psychophysics of Human Sound Localization,
MIT Press, 1997
Meyer, J. and Elko, G.:
A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield,
in: IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002