Core Concepts
MCCATCH is a novel algorithm that efficiently detects microclusters of outliers in both dimensional and nondimensional datasets, outperforming 11 competing methods.
Abstract
I. Abstract:
MCCATCH introduces a new algorithm for detecting microclusters in various datasets.
It outperforms 11 other methods, especially in cases of non-singleton microclusters or nondimensional data.
II. Introduction:
Challenges of outlier detection and the importance of identifying microclusters are discussed.
MCCATCH aims to work with any metric dataset, rank outliers by anomalousness, be principled, scalable, and 'hands-off'.
III. Problem & Related Work:
The problem is defined as finding disjoint microclusters of outliers, together with an anomaly score for each one.
A comparison with related work shows that MCCATCH meets all of the stated goals, while each competitor misses at least one.
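The shape of the output described in the problem statement can be sketched as follows. This is a minimal illustration, not the paper's code: the names Microcluster, are_disjoint, and rank_by_anomalousness are hypothetical, and the convention that a higher score means more anomalous is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Microcluster:
    """Hypothetical container for one detected microcluster."""
    members: FrozenSet[int]  # indices of the dataset elements in this microcluster
    score: float             # anomaly score (assumed: higher means more anomalous)


def are_disjoint(clusters: List[Microcluster]) -> bool:
    """Return True if no element index appears in more than one microcluster."""
    seen = set()
    for cluster in clusters:
        if seen & cluster.members:
            return False
        seen |= cluster.members
    return True


def rank_by_anomalousness(clusters: List[Microcluster]) -> List[Microcluster]:
    """Order microclusters from most to least anomalous."""
    return sorted(clusters, key=lambda c: c.score, reverse=True)
```

A singleton outlier is simply a microcluster with one member, so the same structure covers both singleton and non-singleton cases.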
IV. Proposed Axioms:
Axioms are proposed to rank microclusters based on their anomalousness.
The score of each microcluster reflects the cost of describing it in terms of its nearest inlier.
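To make the idea of scoring relative to the nearest inlier concrete, here is a toy sketch. It is not the paper's scoring formula: the function names, the resolution parameter, and the specific choice of a logarithmic cost divided by the cluster size are all assumptions made for illustration. The only requirement on the data is a distance function, so it applies to any metric dataset.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def gap_to_nearest_inlier(
    cluster: Sequence[T], inliers: Sequence[T], dist: Callable[[T, T], float]
) -> float:
    """Smallest distance between any cluster member and any inlier."""
    return min(dist(m, x) for m in cluster for x in inliers)


def toy_score(
    cluster: Sequence[T],
    inliers: Sequence[T],
    dist: Callable[[T, T], float],
    resolution: float = 1.0,  # hypothetical unit used to discretize distances
) -> float:
    """Illustrative score only: bits needed to encode the gap, per member.

    Assumption: a larger gap costs more bits, and sharing that cost among
    more members lowers the score of a larger microcluster.
    """
    gap = gap_to_nearest_inlier(cluster, inliers, dist)
    bits_for_gap = math.log2(1.0 + gap / resolution)
    return bits_for_gap / len(cluster)
```

With this toy definition, an isolated point far from the inliers scores higher than a nearby one, and a lone point scores higher than a pair at the same distance.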
V. Proposed Method:
MCCATCH leverages the 'Oracle' plot to detect outliers and group them into microclusters.
Anomaly scores are computed based on the cost of describing each microcluster relative to the nearest inlier.
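The grouping step can be illustrated with a generic stand-in. The sketch below does not reproduce the 'Oracle' plot; it assumes the outliers have already been flagged and simply links any two outliers closer than a hypothetical threshold max_gap, then takes connected components as microclusters. The function name and the threshold are assumptions, and this naive version compares pairs directly, so it is quadratic in the number of outliers.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def group_outliers(
    outliers: Sequence[T],
    dist: Callable[[T, T], float],
    max_gap: float,  # hypothetical linking threshold, not a parameter from the paper
) -> List[List[int]]:
    """Group outliers into connected components under the max_gap threshold.

    Returns lists of indices into `outliers`; the groups are disjoint.
    """
    unvisited = set(range(len(outliers)))
    groups: List[List[int]] = []
    while unvisited:
        seed = min(unvisited)  # deterministic choice of the next seed
        unvisited.remove(seed)
        stack = [seed]
        group = [seed]
        while stack:
            i = stack.pop()
            # Collect every unvisited outlier within max_gap of outlier i.
            close = [j for j in unvisited
                     if dist(outliers[i], outliers[j]) <= max_gap]
            for j in close:
                unvisited.remove(j)
            stack.extend(close)
            group.extend(close)
        groups.append(sorted(group))
    return groups
```

Because only a distance function is required, the same sketch works for nondimensional data, for example strings compared with an edit distance.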
VI. Time and Space Complexity:
The time complexity of MCCATCH is estimated to be O(n * n^(1-u)), where u is the intrinsic dimensionality of the dataset P.
Stats
The paper uses 31 real and synthetic datasets to show that MCCATCH outperforms 11 other methods.
MCCATCH detected a 30-element microcluster among 222K data elements in about 3 minutes.