Core Concepts
Self-supervised contrastive learning can be enhanced by incorporating local pivotal regions through a novel pretext task called Local Discrimination (LoDisc), leading to improved fine-grained visual recognition.
Abstract
Self-supervised contrastive learning focuses on global features, which are insufficient for fine-grained recognition.
LoDisc introduces a local discrimination pretext task to emphasize important local regions.
A global-local framework refines feature representations for improved recognition.
Extensive experiments show significant improvements in fine-grained and general object recognition tasks.
Attention map visualizations demonstrate the effectiveness of the proposed method.
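As a rough illustration of the global-local idea summarized above, the sketch below adds a local contrastive term to a global one using a generic InfoNCE-style loss. It is not the paper's implementation. The function names, the temperature, and the `local_weight` parameter are hypothetical. How the pivotal local regions are selected is not described in this summary, so the code simply assumes that `local_q` and `local_k` are embeddings of two views restricted to those regions.

```python
import numpy as np

def info_nce(queries, keys, temperature=0.2):
    """Generic InfoNCE loss: the i-th key is the positive for the i-th query,
    and all other keys in the batch act as negatives."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def global_local_loss(global_q, global_k, local_q, local_k, local_weight=1.0):
    """Sum of a global and a local contrastive term.
    Assumption: local_q / local_k come from views limited to pivotal regions;
    local_weight is a hypothetical balancing factor, not taken from the paper."""
    return info_nce(global_q, global_k) + local_weight * info_nce(local_q, local_k)

# Toy usage with random embeddings (batch of 8, dimension 16).
rng = np.random.default_rng(0)
g = rng.normal(size=(8, 16))
l = rng.normal(size=(8, 16))
print(global_local_loss(g, g, l, l))
```

Keeping the two terms separate makes it easy to compare a global-only setup (`local_weight=0.0`) against the combined objective, which mirrors the global versus global-local comparison reported in the Stats section.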
Stats
"The proposed method achieves 5.64% Top-1 accuracy higher than our baseline method (MoCo v3 [10]) on FGVC-Aircraft and 12.83% Top-1 accuracy higher than recent state-of-the-art self-supervised contrastive method designed for FGVR on Stanford Cars."
"The Top-1, Top-5, Rank-1, Rank-5 and mAP of the global-local method are 79.38%, 95.27%, 72.36%, 87.90% and 54.75%, respectively, which is 20.75%, 13.44%, 14.20%, 11.15%, and 11.96% higher than the global method of MoCo v3."
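The second quote gives the global-local scores and their gains over the global MoCo v3 method, but not the baseline scores themselves. Assuming the gains are absolute percentage-point differences (the quote does not say so explicitly), the implied baseline values can be recovered by subtraction, as in this short sketch:

```python
# Scores and gains copied from the quote above, in percent.
global_local = {"Top-1": 79.38, "Top-5": 95.27, "Rank-1": 72.36, "Rank-5": 87.90, "mAP": 54.75}
gain = {"Top-1": 20.75, "Top-5": 13.44, "Rank-1": 14.20, "Rank-5": 11.15, "mAP": 11.96}

# Assumption: each gain is an absolute difference in percentage points.
implied_baseline = {m: round(global_local[m] - gain[m], 2) for m in global_local}

for metric, value in implied_baseline.items():
    print(f"{metric}: {value:.2f}%")
```

Under that assumption, the implied global-only scores are 58.63% (Top-1), 81.83% (Top-5), 58.16% (Rank-1), 76.75% (Rank-5), and 42.79% (mAP).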
Quotes
"The proposed method can lead to a decent improvement in different fine-grained object recognition tasks."
"The proposed method is also effective in general object recognition tasks."