Core Concept
The paper presents a self-supervised global-local fine-grained contrastive learning framework that enhances feature representations at both the global and local levels.
Summary
The paper introduces the Local Discrimination (LoDisc) pretext task, which directs the model's attention to important local regions and thereby improves fine-grained visual recognition. Experiments show significant improvements across several object recognition tasks.
The proposed method combines global and local branches to refine feature representations, achieving state-of-the-art performance in classification and retrieval tasks. Attention maps demonstrate the model's ability to discern discriminative features within pivotal regions of objects.
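To make the idea of pairing a global branch with a local branch concrete, here is a minimal, dependency-free sketch. It is not the paper's implementation: the use of an InfoNCE-style loss, the temperature of 0.2, the `local_weight` parameter, and all function names are assumptions introduced only for illustration. Each branch is assumed to produce one embedding per image view, with matching indices forming the positive pairs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(queries, keys, temperature=0.2):
    """Mean InfoNCE loss; queries[i] and keys[i] are the positive pair.

    All other keys in the batch act as negatives for queries[i].
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / temperature for k in keys]
        # Numerically stable log-sum-exp over every key in the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


def global_local_loss(global_q, global_k, local_q, local_k, local_weight=1.0):
    """Hypothetical combination: global loss plus a weighted local loss.

    The weighted sum is an assumption; the summary does not state how
    the two branches are combined.
    """
    global_term = info_nce(global_q, global_k)
    local_term = info_nce(local_q, local_k)
    return global_term + local_weight * local_term
```

In this sketch the global embeddings would come from whole-image views and the local embeddings from views restricted to important regions. Setting `local_weight` to zero reduces the objective to the global contrastive term alone, which is one simple way to compare against a global-only baseline.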
Statistics
The proposed method achieves a Top-1 accuracy 5.64% higher than the baseline method on FGVC-Aircraft.
The proposed method achieves a Top-1 accuracy 12.83% higher than recent state-of-the-art self-supervised contrastive methods on Stanford Cars.