Core Concepts
A deep ensemble learning framework that leverages both CNN and Transformer architectures to generate robust feature representations for occluded person re-identification.
Abstract
The paper proposes a deep ensemble learning framework for occluded person re-identification. The approach consists of two complementary models:
- Context-based CNN Classifier:
  - Utilizes Masked Autoencoder (MAE)-reconstructed images to enhance the feature space and generate occlusion-robust global representations.
  - Employs orthogonal fusion to combine discriminative global and local body-part features, suppressing interference from occlusion (a sketch of this fusion follows the list).
  - Uses sparse attention to further reduce noise in the MAE-enhanced feature space.
- Part-Occluded Token-based Transformer Classifier:
  - Generates part-occluded tokens by masking body parts and selects the most discriminative ones using a CNN verifier.
  - Concatenates the selected part-occluded tokens with the original image tokens and feeds them to a Transformer encoder (a token-assembly sketch follows the list).
  - Performs classification using an MLP head.
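The orthogonal fusion step can be made concrete with a short PyTorch-style sketch. This is a minimal illustration under assumed tensor shapes, not the paper's implementation: each local part feature is decomposed against the global feature, only the orthogonal (complementary) component is kept, and the result is concatenated with the global representation. The function name `orthogonal_fusion` and the concatenation-based fusion are illustrative choices.

```python
import torch
import torch.nn.functional as F

def orthogonal_fusion(global_feat: torch.Tensor,
                      part_feats: torch.Tensor,
                      eps: float = 1e-6) -> torch.Tensor:
    """Fuse a global feature with local part features by keeping only the
    component of each part feature that is orthogonal to the global one.

    global_feat: (B, D)    occlusion-robust global representation
    part_feats:  (B, K, D) K local body-part representations
    returns:     (B, D + K*D) fused descriptor
    """
    g = global_feat.unsqueeze(1)                                  # (B, 1, D)
    # projection of each part feature onto the global direction
    scale = (part_feats * g).sum(-1, keepdim=True) / (g.pow(2).sum(-1, keepdim=True) + eps)
    parallel = scale * g                                          # component aligned with the global feature
    orthogonal = part_feats - parallel                            # complementary, occlusion-suppressed part cues
    fused = torch.cat([global_feat, orthogonal.flatten(1)], dim=1)
    return F.normalize(fused, dim=1)

# toy usage: batch of 4 identities, 6 body parts, 256-dim features
g = torch.randn(4, 256)
p = torch.randn(4, 6, 256)
print(orthogonal_fusion(g, p).shape)   # torch.Size([4, 1792])
```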
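The token assembly in the Transformer branch can likewise be sketched in PyTorch. Everything below is an assumption about one plausible realization, not the paper's code: the verifier is simplified to a small MLP scorer (the paper describes a CNN verifier), each part-occluded view is assumed to be pre-embedded into a single token, and the dimensions, layer counts, and identity count are placeholders.

```python
import torch
import torch.nn as nn

class PartOccludedTokenReID(nn.Module):
    """Hypothetical sketch of the Transformer branch: score part-occluded
    tokens, keep the most discriminative ones, prepend them (with a class
    token) to the image patch tokens, and classify with an MLP head."""

    def __init__(self, dim=256, keep=3, num_ids=751, num_patches=128):
        super().__init__()
        self.keep = keep
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, 1 + keep + num_patches, dim))
        # simplified MLP verifier standing in for the paper's CNN verifier
        self.verifier = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, num_ids))

    def forward(self, patch_tokens, part_occluded_tokens):
        # patch_tokens:         (B, N, D) tokens of the original image
        # part_occluded_tokens: (B, P, D) one token per part-masked view
        B = patch_tokens.size(0)
        scores = self.verifier(part_occluded_tokens).squeeze(-1)      # (B, P)
        idx = scores.topk(self.keep, dim=1).indices                   # most discriminative views
        sel = torch.gather(part_occluded_tokens, 1,
                           idx.unsqueeze(-1).expand(-1, -1, part_occluded_tokens.size(-1)))
        cls = self.cls_token.expand(B, -1, -1)
        tokens = torch.cat([cls, sel, patch_tokens], dim=1) + self.pos
        encoded = self.encoder(tokens)
        return self.head(encoded[:, 0])                               # logits from the class token

# toy usage: 2 images, 128 patch tokens, 6 part-occluded tokens of dim 256
model = PartOccludedTokenReID()
logits = model(torch.randn(2, 128, 256), torch.randn(2, 6, 256))
print(logits.shape)   # torch.Size([2, 751])
```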
The ensemble of these two models, named Orthogonal Fusion with Occlusion Handling (OFOH), achieves state-of-the-art performance on several occluded and holistic person re-identification datasets, including Occluded-REID, Occluded-Duke, Market-1501, and DukeMTMC-reID.
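How the two classifiers are combined at inference is not spelled out in this summary; one common ensembling scheme for ReID, shown below as a hedged sketch, blends the cosine distance matrices produced by the two branches. The weighting parameter `alpha` and the distance-level fusion are assumptions, not OFOH's documented rule.

```python
import torch
import torch.nn.functional as F

def ensemble_distances(q_cnn, g_cnn, q_tr, g_tr, alpha=0.5):
    """Hypothetical inference-time ensemble: blend cosine distance matrices
    from the CNN and Transformer embeddings with weight alpha."""
    def cos_dist(q, g):
        q, g = F.normalize(q, dim=1), F.normalize(g, dim=1)
        return 1.0 - q @ g.t()                        # (num_query, num_gallery)
    return alpha * cos_dist(q_cnn, g_cnn) + (1 - alpha) * cos_dist(q_tr, g_tr)

# toy usage: 5 queries, 10 gallery images, 256-dim embeddings per branch
d = ensemble_distances(torch.randn(5, 256), torch.randn(10, 256),
                       torch.randn(5, 256), torch.randn(10, 256))
print(d.shape)   # torch.Size([5, 10])
```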
Stats
Occluded-REID dataset contains 2,000 images of 200 occluded persons, with 5 full-body and 5 occluded images per identity.
Occluded-Duke dataset has 15,618 training images of 702 people, 2,210 query images, and 17,661 gallery images of 1,110 people.
Market-1501 dataset consists of 12,936 training, 3,368 query, and 19,732 gallery images of 1,501 identities.
DukeMTMC-reID dataset contains 16,522 training, 17,661 gallery, and 2,228 query images of 1,404 identities.
PRAI-1581 dataset has 39,461 person images of 1,581 classes captured by UAV drones.
Quotes
"The key challenge in the occluded ReID problem is how to learn discriminative information from occluded data. Also, occluded images lack identity information, which is crucial for designing robust re-identification."
"Occlusion remains one of the major challenges in person reidentiϐication (ReID) as a result of the diversity of poses and the variation of appearances."