Transformer-based Fusion for Distracted Driver Action Recognition
The author proposes a transformer-based fusion architecture that combines 2D pose features with spatio-temporal features for distracted driver action recognition, achieving an overlap score of 0.5079 on the A2 test set.
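As a rough illustration of what "transformer-based fusion" of two feature streams can mean, the NumPy sketch below projects a 2D pose feature vector and a spatio-temporal feature vector to a shared width and treats each as one token. A single pre-norm self-attention layer and feed-forward layer then mix the two tokens, and the result is mean-pooled and classified. Everything in it is an assumption for illustration and is not the author's architecture: the class name `TransformerFusion`, the feature sizes (34 pose values, e.g. 17 keypoints times 2 coordinates; 768 spatio-temporal values), one layer with one head, the modality embeddings, mean pooling, the class count, and the random, untrained weights.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Normalize each token over its feature dimension (no learned scale/shift).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


class TransformerFusion:
    """Hypothetical single-layer, single-head fusion block with random weights."""

    def __init__(self, pose_dim, st_dim, d_model, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d_model)
        # Per-modality projections into a shared d_model-wide token space.
        self.w_pose = rng.normal(0, 1 / np.sqrt(pose_dim), (pose_dim, d_model))
        self.w_st = rng.normal(0, 1 / np.sqrt(st_dim), (st_dim, d_model))
        # One learned-style embedding per modality so attention can tell tokens apart.
        self.modality_emb = rng.normal(0, s, (2, d_model))
        # Self-attention projections (single head).
        self.w_q, self.w_k, self.w_v = (
            rng.normal(0, s, (d_model, d_model)) for _ in range(3)
        )
        # Position-wise feed-forward network and classification head.
        self.w_ff1 = rng.normal(0, s, (d_model, 4 * d_model))
        self.w_ff2 = rng.normal(0, 1 / np.sqrt(4 * d_model), (4 * d_model, d_model))
        self.w_cls = rng.normal(0, s, (d_model, n_classes))

    def __call__(self, pose_feat, st_feat):
        # pose_feat: (batch, pose_dim); st_feat: (batch, st_dim)
        tokens = np.stack(
            [pose_feat @ self.w_pose, st_feat @ self.w_st], axis=1
        )  # (batch, 2, d_model)
        tokens = tokens + self.modality_emb

        # Pre-norm self-attention over the two modality tokens, with residual.
        h = layer_norm(tokens)
        q, k, v = h @ self.w_q, h @ self.w_k, h @ self.w_v
        attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1]))  # (batch, 2, 2)
        tokens = tokens + attn @ v

        # Pre-norm ReLU feed-forward, with residual.
        h = layer_norm(tokens)
        tokens = tokens + np.maximum(h @ self.w_ff1, 0) @ self.w_ff2

        # Pool the fused tokens and return class probabilities.
        fused = tokens.mean(axis=1)  # (batch, d_model)
        return softmax(fused @ self.w_cls)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    model = TransformerFusion(pose_dim=34, st_dim=768, d_model=64, n_classes=18)
    probs = model(rng.normal(size=(4, 34)), rng.normal(size=(4, 768)))
    print(probs.shape)  # (4, 18)
```

Treating each modality as a token lets attention weigh pose evidence against appearance and motion evidence per clip, rather than fixing the mix as plain concatenation would. In a real system the weights would be trained, and the spatio-temporal stream would more likely contribute several tokens (for example one per time step) than a single pooled vector.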