Core Concept
Leveraging textual prompts and hybrid attention mechanisms to generate well-aligned part features for occluded person re-identification, while preserving pre-trained knowledge to improve generalization.
Summary
The paper proposes a Prompt-guided Feature Disentangling (ProFD) framework to address the challenges of occluded person re-identification. The key components are:
- Part-aware Knowledge Adaptation:
  - Designs part-specific prompts to inject rich semantic priors from CLIP and uses noisy segmentation masks to pre-align the visual and textual modalities at the spatial level.
- Prompt-guided Feature Disentangling:
  - Introduces a hybrid-attention decoder that combines spatial-aware attention and semantic-aware attention to generate well-aligned part features, mitigating the impact of noisy spatial information (a minimal sketch follows this list).
  - Applies a diversity loss to reduce redundancy between part features (second sketch below).
  - Predicts a visibility score for each part feature to filter out features of occluded body parts during inference.
- General Knowledge Preservation:
  - Employs a self-distillation strategy with global and local memory banks to avoid catastrophic forgetting of pre-trained CLIP knowledge during fine-tuning (third sketch below).
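Concretely, the hybrid-attention decoder can be pictured as part-prompt embeddings querying the image patch tokens through two parallel streams: a mask-guided spatial stream and a standard cross-attention semantic stream. The PyTorch sketch below is a minimal illustration under that reading; the class and argument names (HybridAttentionDecoder, part_masks, etc.) and the fusion details are our assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class HybridAttentionDecoder(nn.Module):
    """Toy decoder: part prompts query patch tokens via two attention streams."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.semantic_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim)
        )

    def forward(self, part_queries, patch_tokens, part_masks):
        # part_queries: (B, P, D) text embeddings of the P part prompts
        # patch_tokens: (B, N, D) visual tokens from the image encoder
        # part_masks:   (B, P, N) noisy per-part segmentation masks over patches
        # Spatial-aware stream: mask-weighted pooling of patch tokens.
        w = part_masks / part_masks.sum(dim=-1, keepdim=True).clamp(min=1e-6)
        spatial = torch.bmm(w, patch_tokens)                      # (B, P, D)
        # Semantic-aware stream: cross-attention with the prompts as queries.
        semantic, _ = self.semantic_attn(part_queries, patch_tokens, patch_tokens)
        # Fuse both streams with the prompts and refine.
        x = self.norm1(part_queries + spatial + semantic)
        return self.norm2(x + self.ffn(x))                        # (B, P, D)
```

Because the noisy masks only weight the spatial stream while the semantic stream attends freely, a bad mask degrades one stream rather than the whole part feature, which matches the paper's stated goal of mitigating noisy spatial information.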
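The diversity loss and visibility scores admit compact formulations. Below is one plausible version, not the paper's exact one: the loss penalizes off-diagonal cosine similarity among a sample's part features, and a small hypothetical head (VisibilityHead) predicts a per-part visibility score.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def diversity_loss(part_feat: torch.Tensor) -> torch.Tensor:
    # part_feat: (B, P, D). Penalize pairwise cosine similarity between
    # different parts of the same image so the P features stay non-redundant.
    f = F.normalize(part_feat, dim=-1)
    sim = torch.bmm(f, f.transpose(1, 2))                         # (B, P, P)
    off_diag = sim - torch.diag_embed(torch.diagonal(sim, dim1=-2, dim2=-1))
    return off_diag.abs().mean()

class VisibilityHead(nn.Module):
    # One score per part; low-scoring (occluded) parts are dropped at inference.
    def __init__(self, dim: int):
        super().__init__()
        self.fc = nn.Linear(dim, 1)

    def forward(self, part_feat: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.fc(part_feat)).squeeze(-1)      # (B, P)
```

At matching time, a common strategy in part-based ReID is to weight each per-part distance by the product of the query's and gallery's visibility scores, so parts occluded in either image contribute little to the final distance.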
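Finally, the self-distillation step can be read as keeping a slowly updated memory of features and regularizing the fine-tuned encoder toward it. The sketch below shows a single bank with a cosine-based distillation loss; the slot granularity, momentum value, and loss form are illustrative assumptions (the paper uses both global and local banks).

```python
import torch
import torch.nn.functional as F

class FeatureMemoryBank:
    # Momentum-updated store of target features (e.g., one slot per identity).
    def __init__(self, num_slots: int, dim: int, momentum: float = 0.9):
        self.bank = F.normalize(torch.randn(num_slots, dim), dim=-1)
        self.momentum = momentum

    @torch.no_grad()
    def update(self, ids: torch.Tensor, feats: torch.Tensor) -> None:
        # Blend new features into the banked targets, keeping them unit-norm.
        new = self.momentum * self.bank[ids] \
            + (1 - self.momentum) * F.normalize(feats, dim=-1)
        self.bank[ids] = F.normalize(new, dim=-1)

def self_distill_loss(feats: torch.Tensor, ids: torch.Tensor,
                      bank: FeatureMemoryBank) -> torch.Tensor:
    # Pull fine-tuned features toward the banked targets to curb forgetting.
    target = bank.bank[ids].to(feats.device)
    return (1 - F.cosine_similarity(F.normalize(feats, dim=-1), target, dim=-1)).mean()
```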
The proposed ProFD framework is evaluated on both holistic and occluded person re-identification datasets, demonstrating state-of-the-art performance, especially on challenging occluded datasets.
Statistics
- Occluded-Duke dataset: Rank-1 accuracy of 70.8% and mAP of 62.8%
- Occluded-ReID dataset: Rank-1 accuracy of 91.1% and mAP of 88.5%
- P-DukeMTMC dataset: Rank-1 accuracy of 91.7% and mAP of 83.7%
Quotes
"To reduce the impact brought by the missing information and noisy label problems, we propose a Prompt-guided Feature Disentangling framework (ProFD)."
"By incorporating the rich pre-trained knowledge of textual modality, our framework helps the model accurately capture well-aligned part features of the human body."
"Owing to introduce textual modality and self-distillation strategy, ProFD demonstrates strong generalization capabilities, significantly outperforming other methods on the Occluded-ReID dataset [23], with improvements of at least 8.3% in mAP and 4.8% in Rank-1 accuracy."