Core Concept
The paper presents a sandwiched video compression scheme that upgrades a 2D video codec to efficiently compress and deliver stereo RGB-D video for stereoscopic teleconferencing, achieving a 29.3% bit-rate saving over existing solutions at the same level of rendering quality.
Summary
The paper proposes a novel sandwiched video compression scheme for stereo RGB-D video streaming. The key idea is to wrap a standard video codec (e.g. H.264, HEVC) with a pair of neural network-based pre- and post-processors.
The preprocessor takes the stereo RGB-D input, transforms the depth maps to a canonical coordinate system, and generates neural codes that can be efficiently compressed by the video codec. The postprocessor then reconstructs the color and depth from the compressed neural codes for novel view rendering.
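The data flow of the sandwich can be sketched as follows. This is only a toy illustration of the wrapping structure, not the authors' implementation: the learned pre- and post-processors are replaced by hypothetical pack/unpack placeholders (`toy_preprocessor`, `toy_postprocessor`), and the standard codec is replaced by uniform quantization (`toy_codec`) as a stand-in for its lossy step. All names and the dictionary keys are assumptions made for this example.

```python
from typing import Callable, Dict, List

Plane = List[float]            # one flattened image plane, values in [0, 1]
StereoRGBD = Dict[str, Plane]  # hypothetical layout of one stereo RGB-D frame

KEYS = ["left_rgb", "left_depth", "right_rgb", "right_depth"]


def toy_preprocessor(frame: StereoRGBD) -> Plane:
    """Placeholder for the learned preprocessor: packs all planes into
    a single code plane that a 2D codec could ingest."""
    codes: Plane = []
    for key in KEYS:
        codes.extend(frame[key])
    return codes


def toy_codec(codes: Plane, step: float = 0.1) -> Plane:
    """Placeholder for a standard codec (e.g. H.264): uniform quantization
    stands in for the lossy encode/decode round trip."""
    return [round(v / step) * step for v in codes]


def toy_postprocessor(decoded: Plane, plane_len: int) -> StereoRGBD:
    """Placeholder for the learned postprocessor: unpacks the decoded
    codes back into color and depth planes."""
    return {
        key: decoded[i * plane_len:(i + 1) * plane_len]
        for i, key in enumerate(KEYS)
    }


def sandwich(
    frame: StereoRGBD,
    pre: Callable[[StereoRGBD], Plane],
    codec: Callable[[Plane], Plane],
    post: Callable[[Plane, int], StereoRGBD],
    plane_len: int,
) -> StereoRGBD:
    """Preprocess, pass through the unmodified codec, then postprocess."""
    return post(codec(pre(frame)), plane_len)
```

The point of the structure is that `codec` is treated as a black box: swapping one standard codec for another changes only that argument, while the pre- and post-processors stay in place.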
The authors develop several techniques to improve the compression performance:
- Joint processing of color and geometry to enable smart bit-allocation and cross-view/cross-modality redundancy reduction.
- A disparity warping-based distortion loss function to improve the depth quality in the context of rendering.
- Transforming depth maps to world-space coordinates to facilitate stereo alignment and improve the rate-distortion performance.
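Two of the techniques above rest on standard stereo geometry, which can be sketched in a few lines. The sketch assumes a pinhole camera, a rectified stereo pair (so disparity is focal length times baseline divided by depth), and a left-to-right warp convention of `x_right = x_left - disparity`. The function names, the single-scanline simplification, and the squared-error form of the loss are all assumptions for illustration; the paper's exact canonical coordinate system and loss formulation are not given in this summary.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def unproject_to_world(u: float, v: float, z: float,
                       fx: float, fy: float, cx: float, cy: float,
                       rotation: Sequence[Sequence[float]],
                       translation: Sequence[float]) -> Vec3:
    """Back-project pixel (u, v) with depth z into camera space, then
    apply a rigid camera-to-world transform: world = R * cam + t."""
    cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
    return tuple(
        sum(rotation[i][j] * cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def depth_to_disparity(z: float, focal_px: float, baseline: float) -> float:
    """Rectified-stereo relation: disparity = focal * baseline / depth."""
    return focal_px * baseline / z


def warp_row(colors: Sequence[float], depths: Sequence[float],
             focal_px: float, baseline: float,
             background: float = 0.0) -> List[float]:
    """Forward-warp one scanline to the other view; nearer pixels win."""
    width = len(colors)
    out = [background] * width
    zbuf = [float("inf")] * width
    for x, (c, z) in enumerate(zip(colors, depths)):
        tx = round(x - depth_to_disparity(z, focal_px, baseline))
        if 0 <= tx < width and z < zbuf[tx]:
            out[tx] = c
            zbuf[tx] = z
    return out


def warping_loss(colors: Sequence[float],
                 depth_ref: Sequence[float], depth_rec: Sequence[float],
                 focal_px: float, baseline: float) -> float:
    """Mean squared difference between the scanline warped with the
    reference depth and with the reconstructed depth."""
    ref = warp_row(colors, depth_ref, focal_px, baseline)
    rec = warp_row(colors, depth_rec, focal_px, baseline)
    return sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(colors)
```

Measuring depth error through the warped image, rather than on raw depth values, penalizes only those depth errors that actually move pixels in the rendered view, which is the motivation for a rendering-aware distortion term.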
Experiments on both synthetic and real-captured datasets show that the proposed scheme achieves a 29.3% bit-rate saving over existing solutions (H.264 simulcast, MV-HEVC) at the same level of rendering quality. The authors also demonstrate that the pre- and post-processors generalize to different video codecs.
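A "bit-rate saving at the same quality" figure is typically obtained by comparing two rate-quality curves over their shared quality range. The sketch below is one simplified way to do that, using piecewise-linear interpolation of log bit-rate and averaging the gap between curves. It is an assumption that the paper uses a metric of this family; the summary does not state the exact procedure, and the function names and sample numbers here are purely illustrative, not results from the paper.

```python
import math
from typing import List, Tuple

Curve = List[Tuple[float, float]]  # (bitrate, quality) points, sorted by quality


def interp_log_rate(points: Curve, quality: float) -> float:
    """Linearly interpolate log(bitrate) at the requested quality."""
    for (r0, q0), (r1, q1) in zip(points, points[1:]):
        if q0 <= quality <= q1:
            t = (quality - q0) / (q1 - q0)
            return math.log(r0) + t * (math.log(r1) - math.log(r0))
    raise ValueError("quality outside the curve's range")


def average_rate_saving(baseline: Curve, proposed: Curve,
                        samples: int = 50) -> float:
    """Average fractional bit-rate saving of `proposed` over `baseline`
    across the quality range covered by both curves."""
    lo = max(baseline[0][1], proposed[0][1])
    hi = min(baseline[-1][1], proposed[-1][1])
    diffs = []
    for i in range(samples + 1):
        q = lo + (hi - lo) * i / samples
        diffs.append(interp_log_rate(proposed, q) - interp_log_rate(baseline, q))
    mean_diff = sum(diffs) / len(diffs)
    return 1.0 - math.exp(mean_diff)


# Synthetic curves chosen for illustration only: the second curve needs
# 25% fewer bits than the first at every quality level.
baseline_curve = [(1000.0, 30.0), (2000.0, 34.0), (4000.0, 38.0)]
proposed_curve = [(750.0, 30.0), (1500.0, 34.0), (3000.0, 38.0)]
saving = average_rate_saving(baseline_curve, proposed_curve)
```

Averaging in the log-rate domain makes the result a relative saving, so a codec that needs proportionally fewer bits at every quality level yields the same figure regardless of the absolute bit-rates involved.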
Statistics
Our method reduces the bit-rate over existing solutions by 29.3% while maintaining the same level of rendering quality.
Transforming depth maps to world-space coordinates helps improve the rate-distortion performance, especially at higher bit-rates.
Jointly processing color and geometry is important for stereo RGB-D compression, as it outperforms methods that handle them separately.
Quotes
"Our stereo RGB-D video codec has the following advantages: 1) We lift the complex compression workload onto an optimally engineered video codec, which is usually efficiently implemented in hardware. It keeps our system efficient while offering great compression ratio. 2) The neural processor pair learns to conduct smart bit-allocation and redundancy reduction, greatly alleviating the bandwidth pressure; 3) In contrast to existing standardized multi-view and 3D video codecs [24], our method does not require further changes made to the hardware implementation."
"Experimental results show that our method reduces the bit-rate over existing solutions by 29.3%, while maintaining the same level of rendering quality."