Core Concepts
Exploiting spatiotemporal information for improved object co-localization in weakly-labeled videos.
Abstract
CoLo-CAM is a method for weakly supervised video object localization that exploits spatiotemporal information without constraining object movement. It performs co-localization by jointly learning class activation maps (CAMs) across multiple frames, under the assumption that an object keeps a similar color across nearby frames. Minimizing a color-only CRF loss over all frames pushes the per-frame maps toward consistent localizations. Extensive experiments on challenging datasets demonstrate the effectiveness and robustness of CoLo-CAM, yielding state-of-the-art performance for weakly supervised video object localization.
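The color-only pairwise term can be sketched as a regularizer that penalizes activation disagreement between similar-color pixels while ignoring pixel position (so object movement is not constrained). The function below is an illustrative NumPy sketch; the name `color_crf_loss`, the Gaussian color kernel, and the random pair-sampling scheme are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def color_crf_loss(cam, image, sigma=0.1, num_pairs=2000, rng=None):
    """Illustrative color-only pairwise CRF-style regularizer (assumed form).

    cam:   (H, W) activation map with values in [0, 1]
    image: (H, W, 3) RGB image with values in [0, 1]

    Pixels with similar colors are penalized for having dissimilar
    activations; spatial position is ignored, so the term does not
    constrain where the object moves between frames.
    """
    rng = rng or np.random.default_rng(0)
    h, w = cam.shape
    # Sample random pixel pairs instead of all O((HW)^2) pairs.
    i = rng.integers(0, h * w, size=num_pairs)
    j = rng.integers(0, h * w, size=num_pairs)
    colors = image.reshape(-1, 3)
    acts = cam.reshape(-1)
    # Color affinity: nearly identical colors give a weight close to 1,
    # dissimilar colors give a weight close to 0.
    color_dist = np.sum((colors[i] - colors[j]) ** 2, axis=1)
    weight = np.exp(-color_dist / (2.0 * sigma ** 2))
    # Penalize activation disagreement between similar-color pixels.
    return float(np.mean(weight * (acts[i] - acts[j]) ** 2))
```

A CAM that is constant within each color region incurs near-zero loss, while a CAM that fluctuates inside a uniformly colored region is penalized.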
Key Points
CAM activations are constrained to respond similarly over pixels with similar colors.
Extensive experiments demonstrate the merits of the CoLo-CAM method.
The total training loss combines per-frame and multi-frame terms.
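The combination of per-frame and multi-frame terms can be sketched as a weighted sum; the function below is a minimal illustration, where the name `total_loss` and the weighting scheme (clip-averaged per-frame terms plus a weighted joint term) are assumptions, not the paper's exact objective:

```python
def total_loss(per_frame_losses, joint_loss, lam=1.0):
    """Illustrative combination of training terms (assumed form).

    per_frame_losses: list of scalar losses, one per frame
                      (e.g., classification and local CAM constraints)
    joint_loss:       scalar multi-frame co-localization term
                      (e.g., the color-only CRF loss over all frames)
    lam:              weight balancing the multi-frame term
    """
    # Average the per-frame terms over the clip, then add the
    # weighted multi-frame co-localization term.
    return sum(per_frame_losses) / len(per_frame_losses) + lam * joint_loss
```

For example, `total_loss([1.0, 3.0], 0.5, lam=2.0)` averages the per-frame terms to 2.0 and adds 2.0 × 0.5, giving 3.0.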