Improving Robustness of Open-Vocabulary Action Recognition by Denoising Noisy Class Descriptions
Existing open-vocabulary action recognition methods are highly sensitive to noisy class descriptions, which are common in real-world scenarios. To address this, the paper proposes DENOISER, a framework that alternates between text denoising and action recognition, optimizing the two jointly and drawing on both textual and visual information.
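
To make the alternating scheme concrete, here is a toy sketch of such a loop. It is not the paper's implementation: a character-bigram bag stands in for the vision-language encoders, and the candidate vocabulary, the weight `alpha`, and the function names `embed`, `recognize`, `denoise`, and `alternate` are all hypothetical choices made for illustration. Each round assigns videos to the current class descriptions, then re-selects each description by combining textual similarity to the noisy input with agreement with the assigned videos.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def embed(text):
    """Toy stand-in for a text or video encoder: a bag of character bigrams."""
    padded = f" {text} "
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a, b):
    """Cosine similarity between two bigram bags."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recognize(video_feats, class_names):
    """Recognition step: assign each video to its best-matching description."""
    class_embs = [embed(name) for name in class_names]
    return [
        max(range(len(class_names)), key=lambda c: cosine(v, class_embs[c]))
        for v in video_feats
    ]


def denoise(noisy_names, video_feats, assignments, vocabulary, alpha=0.5):
    """Denoising step: pick, per class, the candidate that best balances
    textual closeness to the noisy name and agreement with assigned videos.
    `alpha` is an assumed mixing weight, not a value from the paper."""
    cleaned = []
    for c, noisy in enumerate(noisy_names):
        assigned = [v for v, a in zip(video_feats, assignments) if a == c]

        def score(candidate):
            textual = SequenceMatcher(None, noisy, candidate).ratio()
            if assigned:
                cand_emb = embed(candidate)
                visual = sum(cosine(v, cand_emb) for v in assigned) / len(assigned)
            else:
                visual = 0.0
            return alpha * textual + (1 - alpha) * visual

        cleaned.append(max(vocabulary, key=score))
    return cleaned


def alternate(noisy_names, video_feats, vocabulary, steps=3):
    """Alternate recognition and denoising for a fixed number of rounds."""
    names = list(noisy_names)
    for _ in range(steps):
        assignments = recognize(video_feats, names)
        names = denoise(noisy_names, video_feats, assignments, vocabulary)
    return names, recognize(video_feats, names)


if __name__ == "__main__":
    # Toy data: "video features" are just embeddings of the true labels.
    videos = [embed(w) for w in ["jumping", "running", "swimming", "running"]]
    vocab = ["jumping", "running", "swimming", "jogging", "diving"]
    print(alternate(["jumpng", "runing", "swiming"], videos, vocab))
```

In this sketch the visual term lets videos vote on the corrected description, while the textual term keeps the correction close to the original noisy input; a real system would replace `embed` with learned encoders and generate candidates rather than read them from a fixed list.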