Key Concepts
A novel self-distillation framework, SDPose, that leverages a Multi-Cycled Transformer (MCT) module to improve the performance of small transformer-based human pose estimation models without increasing computational cost.
Summary
The paper introduces a novel self-distillation framework, SDPose, for efficient transformer-based human pose estimation. The key components are:
- Multi-Cycled Transformer (MCT) Module:
- Passes the tokenized features through the same transformer layers multiple times within a forward pass (see the sketch after this list).
- Increases the "latent depth" of the transformer network without adding extra parameters.
- Exploits the small model's parameters more fully, mitigating under-fitting and improving performance.
- Self-Distillation Scheme:
- During training, the output of each later cycle in the MCT module serves as a teacher that distills knowledge into the output of the preceding cycle (the loss is sketched after this list).
- This extracts the knowledge from the complete MCT inference into a single-pass model.
- Maintains the original inference computation without incurring additional cost.
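The cycling mechanism can be illustrated with a minimal PyTorch sketch. The class name, layer counts, and dimensions below are illustrative assumptions rather than the authors' released code; the essential point is that the same encoder weights are reused across cycles, so latent depth grows with no new parameters.

```python
import torch.nn as nn

class MultiCycledTransformer(nn.Module):
    """Minimal sketch of a Multi-Cycled Transformer (MCT) block.

    The same transformer layers are reused for `num_cycles` forward
    passes, increasing "latent depth" without adding parameters.
    Hyperparameters here are illustrative, not the paper's exact values.
    """

    def __init__(self, dim=192, depth=4, heads=8, num_cycles=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=dim * 4,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.num_cycles = num_cycles

    def forward(self, tokens):
        # Run the tokens through the shared encoder several times and
        # keep every cycle's output so later cycles can supervise
        # earlier ones during training.
        cycle_outputs = []
        for _ in range(self.num_cycles):
            tokens = self.encoder(tokens)
            cycle_outputs.append(tokens)
        return cycle_outputs
```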
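The self-distillation objective might then look roughly like the following sketch. The heatmap decoding head, the MSE loss terms, and the weighting factor `alpha` are assumptions for illustration; the paper's exact loss formulation may differ.

```python
import torch.nn.functional as F

def self_distillation_loss(cycle_outputs, heatmap_head, target, alpha=0.5):
    """Sketch of a self-distillation objective (assumed form).

    Every cycle's tokens are decoded to heatmaps and supervised by the
    ground truth; each earlier cycle is additionally pulled toward the
    detached prediction of the next cycle, so the single-pass (first
    cycle) student absorbs the full multi-cycle teacher's knowledge.
    """
    preds = [heatmap_head(tokens) for tokens in cycle_outputs]

    # Ground-truth supervision for every cycle.
    loss = sum(F.mse_loss(pred, target) for pred in preds)

    # Later cycles act as teachers for earlier ones; detach the teacher
    # so gradients flow only into the student (earlier) cycle.
    for student, teacher in zip(preds[:-1], preds[1:]):
        loss = loss + alpha * F.mse_loss(student, teacher.detach())
    return loss
```

Under these assumptions, inference would use only the first cycle's output, so the deployed model keeps the cost of a naive single-pass forward.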
The authors apply their SDPose framework to various transformer-based human pose estimation models, including TokenPose and DistilPose. Experiments on the MSCOCO and CrowdPose datasets show that SDPose achieves state-of-the-art performance among small-scale models, with significant improvements over the base models under the same computational budget.
Statistics
SDPose-T obtains 69.7% mAP with 4.4M parameters and 1.8 GFLOPs.
SDPose-S-V2 obtains 73.5% mAP on the MSCOCO validation dataset with 6.2M parameters and 4.7 GFLOPs.
Quotes
"To mitigate the problem of under-fitting, we design a transformer module named Multi-Cycled Transformer(MCT) based on multiple-cycled forwards to more fully exploit the potential of small model parameters."
"Further, in order to prevent the additional inference compute-consuming brought by MCT, we introduce a self-distillation scheme, extracting the knowledge from the MCT module to a naive forward model."