Core Concepts
The proposed method combines reference frame synthesis (RFS) and post-processing filter enhancement (PFE) to achieve space-time coupled enhancement for Versatile Video Coding (VVC), leveraging a single neural network (STENet) for both tasks.
Abstract
The paper presents a novel approach that jointly utilizes reference frame synthesis (RFS) and post-processing filter enhancement (PFE) to improve the performance of Versatile Video Coding (VVC).
Key highlights:
The paper introduces the Space-Time Enhancement Network (STENet), which comprises a synthesis pipeline for RFS and an enhancement pipeline for PFE. STENet takes two compressed frames as input and outputs one synthesized intermediate frame and two enhanced frames.
For RFS, STENet's synthesis pipeline synthesizes a virtual reference frame that is inserted into the reference picture lists to enhance inter prediction.
For PFE, STENet's enhancement pipeline filters the reconstructed frames to alleviate artifacts and distortions.
To reduce inference complexity, the authors propose joint inference of RFS and PFE (JISE), which executes both tasks simultaneously using a single STENet inference.
The proposed method is integrated into the VVC reference software VTM-15.0 and evaluated under the Random Access (RA) configuration. It achieves 7.34%/17.21%/16.65% PSNR-based BD-rate reduction on average for the Y/U/V components.
Ablation studies demonstrate the individual contributions of RFS, PFE, and the joint training strategy. RFS is the primary driver of performance improvement, while PFE and joint training also provide notable gains.
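The joint-inference idea in the highlights above can be illustrated with a minimal sketch. It only mirrors the interface the summary describes (two compressed frames in; one synthesized frame and two enhanced frames out) and shows how a single inference could serve both the RFS and PFE consumers. Everything here is an assumption made for illustration: `toy_stenet` is a placeholder that blends frames and is not the real network, and the `JointInference` class, its caching scheme, and the integer frame identifiers are hypothetical, not taken from the paper or from VTM.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Frame = List[float]  # toy stand-in for a decoded picture (hypothetical type)
Outputs = Tuple[Frame, Frame, Frame]


def toy_stenet(first: Frame, second: Frame) -> Outputs:
    """Placeholder with STENet's input/output shape; NOT the real network.

    Returns (synthesized intermediate frame, enhanced first, enhanced second).
    """
    synthesized = [(a + b) / 2.0 for a, b in zip(first, second)]  # naive blend
    return synthesized, list(first), list(second)  # identity "enhancement"


@dataclass
class JointInference:
    """Runs the network once per frame pair and shares the three outputs."""

    calls: int = 0
    _cache: Dict[Tuple[int, int], Outputs] = field(default_factory=dict)

    def _run(self, id_a: int, a: Frame, id_b: int, b: Frame) -> Outputs:
        key = (id_a, id_b)
        if key not in self._cache:
            self.calls += 1  # count actual network inferences
            self._cache[key] = toy_stenet(a, b)
        return self._cache[key]

    def virtual_reference(self, id_a: int, a: Frame, id_b: int, b: Frame) -> Frame:
        """RFS consumer: the synthesized frame destined for the reference lists."""
        return self._run(id_a, a, id_b, b)[0]

    def post_filtered(self, id_a: int, a: Frame, id_b: int, b: Frame) -> Tuple[Frame, Frame]:
        """PFE consumer: the two enhanced reconstructions."""
        _, enhanced_a, enhanced_b = self._run(id_a, a, id_b, b)
        return enhanced_a, enhanced_b


jise = JointInference()
frame0, frame2 = [0.0, 2.0], [2.0, 2.0]  # made-up two-sample "frames"
ref = jise.virtual_reference(0, frame0, 2, frame2)
enh0, enh2 = jise.post_filtered(0, frame0, 2, frame2)
print(ref, jise.calls)
```

Both consumers request outputs for the same frame pair, yet the counter shows one network call. Running RFS and PFE as two separate passes would need two calls for that pair, which is the kind of saving joint inference is meant to provide.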
Stats
The proposed method achieves 4.08% average bitrate reduction and 0.29% average Y-PSNR improvement compared to VTM-15.0 under RA configuration.
Compared to VTM-15.0, the encoding and decoding times of the proposed method increase by 272% and 261266% on average, respectively.
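The summary reports BD-rate figures without saying how they are computed. The sketch below follows the commonly used Bjøntegaard procedure, assuming four rate/PSNR points per codec: interpolate log-bitrate as a cubic function of PSNR, average each curve over the shared PSNR range, and convert the difference to a percentage. The function names and the sample rate/PSNR values are made up for illustration and are not data from the paper.

```python
import math
from typing import Sequence


def _cubic_through_points(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_log_rate(psnr: Sequence[float], rate: Sequence[float], lo: float, hi: float) -> float:
    """Average of the interpolated log-rate curve over [lo, hi]."""
    log_rate = [math.log(r) for r in rate]

    def curve(x: float) -> float:
        return _cubic_through_points(psnr, log_rate, x)

    mid = 0.5 * (lo + hi)
    # Simpson's rule integrates a cubic exactly, so no finer grid is needed.
    integral = (hi - lo) / 6.0 * (curve(lo) + 4.0 * curve(mid) + curve(hi))
    return integral / (hi - lo)


def bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr) -> float:
    """BD-rate in percent; negative values mean the test codec saves bitrate."""
    if not (len(anchor_rate) == len(anchor_psnr) == len(test_rate) == len(test_psnr) == 4):
        raise ValueError("this sketch expects exactly four rate/PSNR points per codec")
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))
    if hi <= lo:
        raise ValueError("the two curves share no PSNR range")
    diff = _mean_log_rate(test_psnr, test_rate, lo, hi) - _mean_log_rate(
        anchor_psnr, anchor_rate, lo, hi
    )
    return (math.exp(diff) - 1.0) * 100.0


# Made-up example: the test codec needs 10% fewer bits at every PSNR point.
anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]  # kbps, illustrative only
psnr = [32.0, 35.0, 38.0, 41.0]                 # dB, illustrative only
test_rate = [0.9 * r for r in anchor_rate]
result = bd_rate(anchor_rate, psnr, test_rate, psnr)
print(round(result, 2))
```

Under this sign convention, a BD-rate of -7.34% and a 7.34% BD-rate reduction describe the same result, which is why the figure appears with a minus sign in the quoted abstract and without one in the highlights.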
Quotes
"The proposed method could achieve -7.34%/-17.21%/-16.65% PSNR-based BD-rate on average for three components under RA configuration."
"RFS is the primary driver of performance improvement in our methodology and constitutes a substantial portion of the total runtime."