Core Concept
A novel method generates generic spatio-temporal pseudo-anomalies by inpainting masked frame regions with a pre-trained latent diffusion model and by perturbing optical flow via mixup augmentation. A unified framework then measures reconstruction quality, temporal irregularity, and semantic inconsistency to effectively detect real-world anomalies.
Summary
The work presents a novel approach to video anomaly detection (VAD) that generates generic spatio-temporal pseudo-anomalies (PAs) to train a reconstruction-based model. The key highlights are:
- Spatial PAs are generated by inpainting masked regions in frames using a pre-trained latent diffusion model, while temporal PAs are created by perturbing optical flow using mixup augmentation.
- A unified VAD framework is introduced that measures three types of anomaly indicators: reconstruction quality, temporal irregularity, and semantic inconsistency.
- Extensive experiments on four VAD benchmark datasets (Ped2, Avenue, ShanghaiTech, and UBnormal) show that the proposed method achieves on-par performance with other state-of-the-art PA generation and reconstruction-based methods under the one-class classification setting.
- The analysis examines the transferability and generalization of the generated PAs across datasets, offering valuable insights into the detection of real-world anomalies.
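The mixup-based temporal PA generation described above can be sketched as a convex combination of two optical-flow fields. This is a minimal illustration, not the authors' implementation; the mixing coefficient, flow shapes, and source of the second flow map are assumptions for demonstration.

```python
import numpy as np

def mixup_flow(flow: np.ndarray, flow_other: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Perturb an optical-flow field by mixing it with another flow map,
    a mixup-style sketch of temporal pseudo-anomaly generation.
    lam controls how much of the original flow is retained."""
    return lam * flow + (1.0 - lam) * flow_other

# Toy 4x4 flow fields with two channels (dx, dy).
rng = np.random.default_rng(0)
f1 = rng.standard_normal((4, 4, 2))
f2 = rng.standard_normal((4, 4, 2))
perturbed = mixup_flow(f1, f2, lam=0.7)
```

In practice the second flow map could come from a different clip or a shuffled frame pair, so the mixed flow no longer matches the motion actually present in the video.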
Key Equations
"10 log10( M^2 / ( (1/R) ||x̂_t − x_t||_2^2 ) )" - Normalized Peak Signal-to-Noise Ratio (PSNR) between the input frame x_t and its reconstruction x̂_t, where M is the maximum pixel value and R is the number of pixels.
"(1/R') ||ϕ̂(x_t, x_{t+1}) − ϕ(x_t, x_{t+1})||_2^2" - Normalized L2 loss between the input optical flow ϕ(x_t, x_{t+1}) and its reconstruction ϕ̂(x_t, x_{t+1}), where R' is the number of flow elements.
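The two anomaly indicators above are straightforward to compute with NumPy; mean squared error supplies the (1/R)||·||_2^2 normalization in both. A minimal sketch (the max pixel value M=255 and the toy array shapes are illustrative assumptions):

```python
import numpy as np

def psnr(x: np.ndarray, x_hat: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR between frame x and reconstruction x_hat:
    10 * log10(M^2 / MSE), where MSE = (1/R) * ||x_hat - x||_2^2."""
    mse = np.mean((x_hat - x) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def flow_l2(phi: np.ndarray, phi_hat: np.ndarray) -> float:
    """Normalized L2 loss between optical flow phi and reconstruction phi_hat:
    (1/R') * ||phi_hat - phi||_2^2."""
    return float(np.mean((phi_hat - phi) ** 2))

# Toy example: reconstruction off by exactly 1 at every pixel gives MSE = 1,
# so PSNR = 10 * log10(255^2) ≈ 48.13 dB.
x = np.zeros((8, 8))
x_hat = np.ones((8, 8))
score = psnr(x, x_hat)
```

Lower PSNR (poor reconstruction) and higher flow L2 both push a region toward being scored as anomalous.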