The paper introduces Open-Vocabulary Video Anomaly Detection (OVVAD) to address limitations in traditional video anomaly detection approaches. It proposes a model that leverages pre-trained large models to detect and categorize both seen and unseen anomalies. By disentangling OVVAD into a class-agnostic detection task and a class-specific classification task, the two sub-problems can be optimized jointly yet handled with task-specific designs, yielding strong performance on widely-used benchmarks. The inclusion of modules such as the Temporal Adapter, Semantic Knowledge Injection, and Novel Anomaly Synthesis significantly improves detection of both base (seen) and novel (unseen) anomalies. Extensive experiments demonstrate state-of-the-art performance on the UCF-Crime, XD-Violence, and UBnormal datasets.
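The sketch below illustrates the disentangled two-branch idea described above: a class-agnostic head scores each frame for anomaly, while a class-specific head matches visual features against text embeddings of category names so that novel classes can be recognized by name alone. This is a minimal, hypothetical PyTorch sketch; the module names, the simple 1D-convolution temporal adapter, and all dimensions are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch of the disentangled OVVAD design, assuming a CLIP-style
# backbone that provides per-frame visual features and text embeddings for
# category names. Not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalAdapter(nn.Module):
    """Lightweight temporal modeling over per-frame features (assumed: 1D conv with residual)."""
    def __init__(self, dim: int):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)

    def forward(self, x):                      # x: (B, T, D)
        y = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return x + F.relu(y)                   # residual keeps pretrained semantics intact


class OVVADSketch(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.temporal = TemporalAdapter(dim)
        self.detector = nn.Linear(dim, 1)      # class-agnostic: per-frame anomaly score

    def forward(self, frame_feats, text_embeds):
        """
        frame_feats: (B, T, D) visual features from a frozen pretrained encoder.
        text_embeds: (C, D) embeddings of category names (base + novel classes).
        """
        h = self.temporal(frame_feats)
        # Class-agnostic detection branch: binary anomaly confidence per frame.
        scores = torch.sigmoid(self.detector(h)).squeeze(-1)          # (B, T)
        # Class-specific classification branch: cosine similarity to class-name embeddings.
        v = F.normalize(h, dim=-1)
        t = F.normalize(text_embeds, dim=-1)
        logits = v @ t.t()                                             # (B, T, C)
        return scores, logits


# Usage with random tensors standing in for real features.
model = OVVADSketch(dim=512)
scores, logits = model(torch.randn(2, 16, 512), torch.randn(10, 512))
print(scores.shape, logits.shape)   # torch.Size([2, 16]) torch.Size([2, 16, 10])
```

Because classification is driven by text embeddings rather than a fixed classifier layer, categories unseen during training can, in principle, be added at inference time simply by embedding their names.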
Key insights distilled from the paper by Peng Wu et al. (arxiv.org, 03-14-2024): https://arxiv.org/pdf/2311.07042.pdf