Core Concepts
CLIP-AD is a novel framework that leverages CLIP for zero-shot anomaly detection, achieving strong performance without any training.
Summary
This paper focuses on zero-shot anomaly detection (AD) and proposes CLIP-AD, a new framework for harnessing the zero-shot capabilities of the large vision-language model CLIP. To address problems in text prompt design and anomaly segmentation, the authors introduce a Staged Dual-Path model (SDP) and its extension SDP+. These methods achieve strong performance without training, and the experimental results demonstrate their effectiveness.
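At its core, CLIP-based zero-shot anomaly detection compares image features against text embeddings of "normal" and "anomalous" prompts in CLIP's joint embedding space. A minimal numpy sketch of this scoring idea follows; the function names, feature shapes, and random features standing in for CLIP outputs are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Unit-normalize vectors so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def anomaly_scores(patch_feats, normal_text, anomalous_text, temperature=0.07):
    """Score each image patch against 'normal' vs 'anomalous' text embeddings.

    patch_feats:    (N, D) patch-level image features
    normal_text:    (D,) text embedding for a normal-state prompt
    anomalous_text: (D,) text embedding for an anomalous-state prompt
    Returns an (N,) array of probabilities that each patch is anomalous.
    """
    patch_feats = l2_normalize(patch_feats)
    text = l2_normalize(np.stack([normal_text, anomalous_text]))  # (2, D)
    logits = patch_feats @ text.T / temperature                   # (N, 2)
    # Softmax over the two classes; keep the 'anomalous' column.
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs[:, 1]

# Toy demo: random vectors stand in for real CLIP image/text features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 512))
normal = rng.normal(size=512)
anom = rng.normal(size=512)
scores = anomaly_scores(feats, normal, anom)
print(scores.shape)  # (4,)
```

Reshaping the per-patch scores back to the feature-map grid yields a coarse anomaly segmentation map.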
Statistics
Abundant experiments demonstrate the effectiveness of our approach, e.g., on MVTec-AD, SDP outperforms the SOTA WinCLIP by +4.2↑/+10.7↑ in segmentation metrics F1-max/PRO, while SDP+ achieves +8.3↑/+20.5↑ improvements.
For text prompt design, previous works focus on crafting accurate text prompts, but more descriptions are not always better.
To address these issues, we introduce a Staged Dual-Path model (SDP) that leverages features from various levels and applies architecture and feature surgery.
Lastly, delving deeply into the two phenomena, we point out that the image and text features are not aligned in the joint embedding space.
Thus, we introduce a fine-tuning strategy by adding linear layers and construct an extended model SDP+, further enhancing the performance.
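Since the image and text features are not well aligned in the joint embedding space, SDP+ fine-tunes added linear layers to project image features toward the text space before computing similarities. A minimal numpy sketch of this projection idea, with illustrative dimensions and randomly initialized weights standing in for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: D_img-dimensional image features are projected
# into the D_txt-dimensional text embedding space by a linear layer.
D_img, D_txt, N = 768, 512, 4

W = rng.normal(scale=0.02, size=(D_img, D_txt))  # weights (learned via fine-tuning in practice)
b = np.zeros(D_txt)                              # bias

image_feats = rng.normal(size=(N, D_img))
aligned = image_feats @ W + b                    # project into the joint text space

# After projection, cosine similarity against a text embedding is meaningful.
aligned /= np.linalg.norm(aligned, axis=1, keepdims=True)
text_emb = rng.normal(size=D_txt)
text_emb /= np.linalg.norm(text_emb)
sims = aligned @ text_emb                        # one similarity score per image feature
print(sims.shape)  # (4,)
```

In practice such a layer would be trained so that projected features of normal regions move closer to the "normal" text embedding, which is consistent with the paper's observation that a single linear layer suffices.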
Quotes
"Extensive experiments show that our whole framework, CLIP-AD, surpasses the recent comparative methods."
"Our method uses general and coarse prompts without requiring any post-processing."
"The results are clearly much worse compared to using only a single linear layer."