High Quality Video Conferencing
Audio and video signal processing for high-quality multi-point video conferencing imposes strict real-time processing and transmission constraints, as well as bit-rate limitations, on the overall system design and implementation. Within the collaborative Ziel 2 research project "Connected Visual Reality", a new conference system has been developed that achieves high presentation quality as well as high flexibility with respect to room set-ups, clients, and network configurations. A key element is a newly developed multimodal signal processing concept for speaker localization and activity estimation. Furthermore, the use of sophisticated coding and signal enhancement techniques, together with new features such as artificial bandwidth extension, enables an implementation based on cost-efficient consumer electronics instead of specialized conference room installations.
Our technical focus was a new multimodal signal processing concept for identifying the most active talkers in a video conference, even among competing talkers in a single room. The interacting audio and video analysis scheme combines dedicated beamformer-driven speaker activity estimation with face detection and tracking. Complementary information from both audio and video signals is exchanged, merged, and transmitted via metadata. The proposed multimodal signal processing concept enables an automatic audio-visual scene composition at the receiver side, where the most active talkers are arranged and displayed side by side for an enhanced conversational experience. In contrast to other commercially available high-quality solutions, this system was intentionally designed for off-the-shelf consumer electronics at low cost. The developed conference system has been validated in a real-time prototype implementation and was successfully demonstrated at CeBIT and at scientific conferences.
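The fusion of audio and video cues for talker selection can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the project's actual implementation: it assumes a per-participant audio activity score (as a beamformer-driven estimator might provide) and a face-tracker confidence, fuses them with an illustrative fixed weight, and picks the most active talkers for side-by-side composition. All names, weights, and scores here are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    audio_activity: float   # hypothetical beamformer-driven activity estimate, 0..1
    face_confidence: float  # hypothetical face detection/tracking confidence, 0..1

def fuse_scores(p: Participant, audio_weight: float = 0.7) -> float:
    # Weighted linear fusion of the audio and video cues;
    # the weight of 0.7 is purely illustrative.
    return audio_weight * p.audio_activity + (1.0 - audio_weight) * p.face_confidence

def select_active_talkers(participants: list[Participant], n: int = 2) -> list[Participant]:
    # Rank all participants by fused activity and keep the n most active,
    # e.g. for side-by-side display at the receiver.
    return sorted(participants, key=fuse_scores, reverse=True)[:n]

room = [
    Participant("A", audio_activity=0.9, face_confidence=0.8),
    Participant("B", audio_activity=0.2, face_confidence=0.9),
    Participant("C", audio_activity=0.7, face_confidence=0.6),
]
layout = select_active_talkers(room)
print([p.name for p in layout])  # → ['A', 'C']
```

In the real system the scores would be updated continuously and the fused result exchanged as metadata between sender and receiver; this sketch only shows the ranking step.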