Core Concept
A novel approach that jointly improves the memory matching and decoding stages to alleviate the false matching issue in video object segmentation.
Summary
The paper proposes a method called Jointly Improve Matching and Decoding (JIMD) that jointly enhances the memory matching and decoding stages to address the false matching problem in video object segmentation (VOS).
For the memory matching stage:
- Cost-aware matching is introduced for short-term memory to better capture the fine-grained variations between adjacent frames.
- Cross-scale matching is proposed for long-term memory to effectively handle objects of different scales.
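Both short-term and long-term matching build on the standard memory-readout idea used in memory-based VOS: past-frame features are stored as key/value pairs, and the query frame retrieves values by key similarity. A minimal sketch of that generic readout (illustrative only; the function name, shapes, and scaling are assumptions, not the paper's actual implementation):

```python
# Hypothetical sketch of memory-based matching in VOS: past frames are
# stored as key/value features, and the query frame "reads out" values
# by attention over the keys. Illustrative, not the paper's design.
import numpy as np

def memory_readout(query_key, mem_keys, mem_values):
    """Soft memory readout via attention over memory keys.

    query_key:  (C, N)  flattened query-frame key features
    mem_keys:   (C, M)  keys stored from past frames
    mem_values: (D, M)  values stored from past frames
    Returns (D, N) readout features for the query frame.
    """
    # Affinity between every query location and every memory location.
    affinity = mem_keys.T @ query_key / np.sqrt(mem_keys.shape[0])  # (M, N)
    # Softmax over memory locations: each query pixel attends to memory.
    affinity -= affinity.max(axis=0, keepdims=True)
    weights = np.exp(affinity)
    weights /= weights.sum(axis=0, keepdims=True)
    # Readout = similarity-weighted sum of memory values.
    return mem_values @ weights  # (D, N)

# Toy usage: 8-dim keys, 16-dim values, 10 memory slots, 4 query pixels.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4))
k = rng.standard_normal((8, 10))
v = rng.standard_normal((16, 10))
out = memory_readout(q, k, v)
print(out.shape)  # (16, 4)
```

False matches arise when this affinity assigns high weight to visually similar but incorrect memory locations, which is the failure mode the cost-aware and cross-scale designs above target.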
For the readout decoding stage:
- A compensatory decoding mechanism is introduced, which consists of pre-decoding, context embedding, and post-decoding. This helps suppress false matches and recover crucial information lost in the initial memory readout.
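The three-step decoding flow above can be sketched as a simple pipeline: decode a coarse mask, pool a context representation from it, then re-score pixels against that context. This is a minimal illustrative sketch; the function names, the mean-pooling context, and the dot-product re-scoring are assumptions, not the paper's actual compensatory decoder:

```python
# Illustrative sketch of a pre-decode / context-embed / post-decode
# pipeline. All names and the fusion scheme are hypothetical.
import numpy as np

def pre_decode(readout):
    """Produce coarse foreground logits from the readout (D, H, W)."""
    return readout.mean(axis=0)  # (H, W) coarse logits

def embed_context(readout, coarse):
    """Pool readout features over the coarse foreground into a context vector."""
    mask = (coarse > 0).astype(readout.dtype)          # rough foreground region
    denom = mask.sum() + 1e-6
    return (readout * mask).sum(axis=(1, 2)) / denom   # (D,) context embedding

def post_decode(readout, context):
    """Re-score each pixel by similarity to the context vector, refining
    regions the coarse pass mis-scored."""
    return np.tensordot(context, readout, axes=(0, 0))  # (H, W) refined logits

rng = np.random.default_rng(1)
readout = rng.standard_normal((16, 8, 8))  # (D, H, W) memory-readout features
coarse = pre_decode(readout)
ctx = embed_context(readout, coarse)
refined = post_decode(readout, ctx)
final_mask = refined > 0                   # final binary mask
print(coarse.shape, ctx.shape, refined.shape)
```

The design intuition mirrors the text: the post-decoding pass has access to a global context signal that was unavailable during the initial readout, letting it suppress false matches and recover lost detail.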
The joint improvement of the matching and decoding stages leads to significant performance gains on popular VOS benchmarks, outperforming state-of-the-art methods. Extensive ablation studies demonstrate the effectiveness of the individual components.
Key Statistics
The proposed JIMD method achieves 83.9% J&F score on the DAVIS 2017 Test set, outperforming the previous state-of-the-art method by 2.9%.
On the DAVIS 2017 Validation set, JIMD achieves 88.1% J&F, a 1.9% improvement over the previous best.
JIMD also achieves excellent results on the YouTubeVOS 2018 and 2019 Validation sets, reaching 84.8% and 84.6% J&F, respectively.
Quotations
"Memory matching essentially relates to the accuracy in generating the target object masks, which becomes a crucial component in improving the accuracy of VOS tasks."
"We argue that suppressing false matches requires improving memory matching and improving decoding process."
"Our paper aims to give a more suitable and comprehensive answer, which jointly improves both stages and rethinks all details toward reducing the false matching instead of the simple foreground-background distinction."