Multi-channel audio formats are well established in the entertainment sector. However, most existing audio material is available in stereo only. Stereo systems have very limited ability to reproduce spatial audio. In this regard, surround systems with 5, 7 or more loudspeakers perform much better, which results in a higher degree of immersion.
Upmix technologies are applied to stereo signals to exploit the advantages of multi-channel systems in the presentation of spatial audio. The focus is on the subjective listening experience. Hence, the objective of the upmixing process is not to physically model a sound field, but to generate a pleasant, immersive sound experience.
While traditional methods only distribute the existing stereo channels to multiple channels using a fixed mixing matrix, modern approaches analyze the stereo signal with regard to its spatial information. The analysis is based on the assumption that the stereo signal is composed of a coherent primary part, which contains directed point-like sound sources (voices, instruments), and an ambient part, which contains undirected diffuse sound events (late reverberation, sounds of the sea, diffuse background noise). Primary-Ambient Extraction (PAE) is used to obtain estimates of these two signal parts.
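The text does not say which PAE method is used. As a minimal sketch, one widely used option is principal component analysis (PCA) on a short stereo frame: the component along the dominant inter-channel direction is taken as the primary estimate, and the orthogonal residual as the ambient estimate. The function name `pca_primary_ambient`, the frame layout (2 x N array) and the test signal are assumptions made for illustration only.

```python
import numpy as np

def pca_primary_ambient(stereo):
    """Split one stereo frame (shape 2 x N) into primary and ambient estimates.

    Assumption: PCA-based extraction, not necessarily the method meant in the text.
    """
    # 2 x 2 inter-channel covariance of the frame
    cov = stereo @ stereo.T / stereo.shape[1]
    # eigh returns eigenvalues in ascending order, so the last
    # eigenvector spans the dominant (most coherent) direction
    _, vecs = np.linalg.eigh(cov)
    u = vecs[:, -1:]                 # shape (2, 1)
    primary = u @ (u.T @ stereo)     # projection onto the dominant direction
    ambient = stereo - primary       # orthogonal residual
    return primary, ambient

# Hypothetical test frame: one source panned with gains (0.8, 0.6)
# plus weak, uncorrelated noise in both channels.
rng = np.random.default_rng(0)
source = rng.standard_normal(4096)
frame = np.outer([0.8, 0.6], source) + 0.05 * rng.standard_normal((2, 4096))
primary_est, ambient_est = pca_primary_ambient(frame)
```

By construction the two estimates sum to the input frame. In practice the decomposition would typically be applied per time-frequency tile rather than to a broadband frame, so that several simultaneous sources can be handled.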
Rendering of the Primary Part
The extracted signal parts can then be processed to suit the target audio platform. For the primary part, the directions of the contained sound sources need to be estimated. Applicable features are inter-channel level differences and inter-channel time differences. Given a robust direction estimate, the primary sound sources can be mapped to an arc and reconstructed on an arbitrary playback setup. For multi-channel loudspeaker setups, rendering techniques such as Vector Base Amplitude Panning (VBAP) can be used. For binaural playback, methods based on artificial heads are suitable.
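The two steps for loudspeaker playback can be sketched as follows, under simplifying assumptions: the direction is estimated from the inter-channel level difference only (via the stereophonic tangent law, ignoring time differences), and the source is then panned with two-dimensional VBAP between the adjacent loudspeaker pair that encloses it. The function names, the 30 degree stereo base angle and the five-speaker layout are illustrative assumptions. Azimuths are in degrees in the range (-180, 180], with positive values to the left.

```python
import numpy as np

def estimate_azimuth(primary, base_angle_deg=30.0):
    """Estimate the source azimuth from the level difference of a
    primary stereo frame (row 0 = left, row 1 = right), tangent law."""
    g_left = np.sqrt(np.mean(primary[0] ** 2))
    g_right = np.sqrt(np.mean(primary[1] ** 2))
    ratio = (g_left - g_right) / (g_left + g_right + 1e-12)
    return np.degrees(np.arctan(ratio * np.tan(np.radians(base_angle_deg))))

def vbap_2d(azimuth_deg, speaker_azimuths_deg):
    """Return one gain per loudspeaker for a source at azimuth_deg."""
    spk = np.radians(np.asarray(speaker_azimuths_deg, dtype=float))
    order = np.argsort(spk)                      # walk around the circle
    az = np.radians(azimuth_deg)
    p = np.array([np.cos(az), np.sin(az)])       # unit vector to the source
    gains = np.zeros(len(spk))
    for i in range(len(order)):
        a, b = order[i], order[(i + 1) % len(order)]
        base = np.array([[np.cos(spk[a]), np.sin(spk[a])],
                         [np.cos(spk[b]), np.sin(spk[b])]])
        try:
            # Solve p = g_a * l_a + g_b * l_b for the pair gains
            g = np.linalg.solve(base.T, p)
        except np.linalg.LinAlgError:
            continue                             # speakers exactly opposite
        if np.all(g >= -1e-9):                   # source lies between the pair
            g = np.clip(g, 0.0, None)
            gains[[a, b]] = g / np.linalg.norm(g)  # constant-power normalization
            return gains
    raise ValueError("no loudspeaker pair encloses the requested direction")

# Assumed layout: L, R, C, Ls, Rs
layout = [30.0, -30.0, 0.0, 110.0, -110.0]
gains = vbap_2d(15.0, layout)   # source halfway between C and L
```

Because only the enclosing pair receives non-zero gains, a source located exactly at a loudspeaker position is reproduced by that loudspeaker alone.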
Rendering of the Ambient Part
The ambient part is meant to create an impression that is as undirected as possible during playback. For this reason, the estimated ambient signals are replicated using decorrelation filters and rendered over as many loudspeakers as possible.
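The text does not name a specific decorrelation filter. As a sketch of the idea, the example below uses one simple option: each output channel is obtained by filtering the ambient signal with a different random-phase filter that has unit magnitude at all DFT bins, so the copies keep roughly the same spectrum but are only weakly correlated with each other. The function name `decorrelate`, the filter length and the seed are assumptions for illustration.

```python
import numpy as np

def decorrelate(ambient, n_outputs, filter_len=1024, seed=0):
    """Create n_outputs weakly correlated copies of a mono ambient signal.

    Assumption: random-phase filters with an even filter_len; other
    decorrelators (e.g. all-pass cascades) are equally possible.
    """
    rng = np.random.default_rng(seed)
    n_bins = filter_len // 2 + 1
    outputs = []
    for _ in range(n_outputs):
        phase = rng.uniform(-np.pi, np.pi, n_bins)
        phase[0] = 0.0     # DC bin must be real
        phase[-1] = 0.0    # Nyquist bin must be real
        # Unit-magnitude spectrum with random phase -> real impulse response
        h = np.fft.irfft(np.exp(1j * phase), n=filter_len)
        # Filter and trim to the input length
        outputs.append(np.convolve(ambient, h)[: len(ambient)])
    return np.stack(outputs)   # shape (n_outputs, len(ambient))

# Hypothetical ambient estimate: white noise, fed to three loudspeakers
ambient_sig = np.random.default_rng(1).standard_normal(16384)
feeds = decorrelate(ambient_sig, n_outputs=3)
```

Long random-phase filters smear transients audibly, so the filter length is a trade-off between decorrelation strength and temporal artifacts.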
An exact separation of the primary and ambient parts is crucial for the described upmixing procedure. Yet complex soundscapes remain a problem for state-of-the-art techniques and can lead to unstable primary source positions and artifacts.