Core Concepts
The authors propose a SAM-guided Two-stream Lightweight Model (STLM) for unsupervised anomaly detection. It draws on the robust generalization of foundation models while staying within mobile-friendly constraints.
Abstract
The paper develops a SAM-guided Two-stream Lightweight Model for anomaly detection and localization, with an emphasis on model efficiency and mobile-friendliness. The proposed model performs competitively on the MVTec AD, VisA, and DAGM benchmark datasets. Its key components are a large teacher network, a plain student stream, a mask decoder, and a feature aggregation module. Ablation studies indicate that each of these components contributes to the reported results.
The paper also reviews related work on deep learning methods for anomaly detection and localization, vision foundation models, and data augmentation strategies. It reports experimental details, evaluation metrics, implementation specifics, quantitative results on several datasets, qualitative visualizations, and ablation studies that analyze how different design choices affect model performance.
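Teacher-student anomaly detectors commonly score each spatial location by how far the student's features drift from the teacher's. The following is a minimal sketch of that general idea, not the paper's implementation: the function name `cosine_anomaly_map`, the (C, H, W) feature layout, and the choice of cosine distance as the discrepancy measure are all assumptions made for illustration, since the summary does not specify them.

```python
import numpy as np


def cosine_anomaly_map(teacher_feat, student_feat, eps=1e-8):
    """Per-location cosine distance between two (C, H, W) feature maps.

    Assumption: cosine distance is the discrepancy measure. The summary
    does not state which measure STLM actually uses.
    """
    t = np.asarray(teacher_feat, dtype=np.float64)
    s = np.asarray(student_feat, dtype=np.float64)
    if t.shape != s.shape:
        raise ValueError("teacher and student features must share a shape")

    # Dot product and norms are taken over the channel axis.
    dot = (t * s).sum(axis=0)
    norms = np.linalg.norm(t, axis=0) * np.linalg.norm(s, axis=0)

    # Close to 0 where the features point the same way, up to 2 where
    # they point in opposite directions. Result has shape (H, W).
    return 1.0 - dot / (norms + eps)
```

In this sketch, locations where the student reproduces the teacher's features score near zero, and locations where the two disagree score higher, which is what allows the map to be thresholded or ranked for localization.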
Stats
STLM achieves 98.26% pixel-level AUC and 94.92% PRO (per-region overlap).
Inference time for STLM is 20 ms.
STLM has 16M parameters.
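For readers unfamiliar with the pixel-level AUC figure, the sketch below shows one standard way to compute it: pool every pixel's anomaly score and ground-truth label, then measure how often an anomalous pixel outranks a normal one. This is the generic rank-based (Mann-Whitney) definition of AUROC, not the paper's evaluation code; the function name `pixel_auroc` and the input layout are hypothetical. PRO is not covered here, because it additionally requires splitting the ground truth into connected regions.

```python
import numpy as np


def pixel_auroc(anomaly_maps, gt_masks):
    """Rank-based AUROC over all pixels of all images.

    anomaly_maps: array of anomaly scores, any shape.
    gt_masks: array of the same shape, nonzero where a pixel is anomalous.
    """
    scores = np.asarray(anomaly_maps, dtype=np.float64).ravel()
    labels = np.asarray(gt_masks).ravel().astype(bool)
    if scores.size != labels.size:
        raise ValueError("scores and masks must have the same number of pixels")

    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both anomalous and normal pixels")

    # Average 1-based ranks, so tied scores count as half a win each.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)
    avg_rank = last_rank - (counts - 1) / 2.0
    ranks = avg_rank[inverse]

    # Mann-Whitney U statistic, normalized by the number of (pos, neg) pairs.
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

A value of 1.0 means every anomalous pixel scores above every normal pixel, and 0.5 corresponds to scores that carry no ranking information.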