Leveraging Text-Image Alignment and Temporal Adaptivity for Weakly Supervised Video Anomaly Detection
A novel pseudo-label generation and self-training framework that uses CLIP's text-image alignment and adaptive temporal modeling to achieve state-of-the-art performance on weakly supervised video anomaly detection.