Understanding Video Transformers via Unsupervised Discovery of Spatiotemporal Concepts
This work introduces Video Transformer Concept Discovery (VTCD), the first algorithm to systematically identify the high-level, spatiotemporal concepts that underlie the decision-making process of video transformer models and to rank their importance.