Key concepts
This survey provides a novel taxonomy for online anomaly detection in multivariate time series, distinguishing between online training and online inference. It presents an extensive overview and analysis of state-of-the-art model-based online semi- and unsupervised anomaly detection approaches, as well as the most popular benchmark data sets and evaluation metrics used in the literature.
Summary
This survey introduces a novel taxonomy for online anomaly detection in multivariate time series that distinguishes between online training and online inference. It presents an extensive overview of state-of-the-art model-based online semi- and unsupervised anomaly detection approaches, categorized by model family and other properties.
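The distinction between online inference and online training can be illustrated with a minimal Python sketch. It assumes a toy per-channel Gaussian detector; the class name `RunningGaussianDetector`, the smoothing factor `alpha`, the z-score `threshold`, and the max-over-channels score are all hypothetical choices made for illustration, not a method taken from the survey. With `online_training=False` the model is fitted once on historical data and then only scores each incoming point (online inference only). With `online_training=True` it also updates its statistics after each point (online training and online inference).

```python
import math


class RunningGaussianDetector:
    """Hypothetical toy detector for multivariate streams (illustration only).

    Each channel is modelled by a mean and variance; the anomaly score of a
    point is its largest absolute z-score across channels (an assumption).
    """

    def __init__(self, n_channels, alpha=0.05, threshold=3.0):
        self.n_channels = n_channels
        self.alpha = alpha          # hypothetical EWMA smoothing factor
        self.threshold = threshold  # hypothetical z-score threshold
        self.mean = [0.0] * n_channels
        self.var = [1.0] * n_channels

    def fit(self, batch):
        """Offline training: estimate mean and variance once from history."""
        n = len(batch)
        for c in range(self.n_channels):
            col = [x[c] for x in batch]
            m = sum(col) / n
            self.mean[c] = m
            # Guard against zero variance on constant channels.
            self.var[c] = max(sum((v - m) ** 2 for v in col) / n, 1e-12)
        return self

    def score(self, x):
        """Online inference: score a single point against the current model."""
        return max(
            abs(x[c] - self.mean[c]) / math.sqrt(self.var[c])
            for c in range(self.n_channels)
        )

    def update(self, x):
        """Online training: exponentially weighted mean/variance update."""
        for c in range(self.n_channels):
            diff = x[c] - self.mean[c]
            incr = self.alpha * diff
            self.mean[c] += incr
            self.var[c] = (1 - self.alpha) * (self.var[c] + diff * incr)

    def process(self, x, online_training):
        """Score one streaming point; optionally adapt the model afterwards."""
        is_anomaly = self.score(x) > self.threshold
        # Design choice (assumption): only non-anomalous points update the model.
        if online_training and not is_anomaly:
            self.update(x)
        return is_anomaly


if __name__ == "__main__":
    history = [[0.0, 1.0], [0.2, 1.2], [-0.2, 0.8], [0.1, 1.1], [-0.1, 0.9]]
    det = RunningGaussianDetector(n_channels=2).fit(history)
    print(det.process([5.0, 1.0], online_training=True))   # far from history
    print(det.process([0.1, 1.0], online_training=False))  # inference only
    print(det.process([0.1, 1.0], online_training=True))   # inference + update
```

Skipping the update on flagged points is only one possible design choice: it keeps anomalies from being absorbed into the model's notion of normal behavior, at the cost of slower adaptation to genuine changes in the data.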
The survey also provides a detailed analysis of the most popular benchmark data sets used in the literature, highlighting their fundamental flaws, such as triviality, unrealistic anomaly density, uncertain labels, and run-to-failure bias. Additionally, it presents an extensive overview and analysis of the proposed evaluation metrics, discussing their strengths, weaknesses, and the need for parameter-free and interpretable metrics.
The biggest research challenge concerns benchmarking: there is currently no reliable way to compare different approaches against one another. The problem is twofold. On the one hand, public data sets suffer from at least one fundamental flaw; on the other hand, the field lacks intuitive and representative evaluation metrics. Moreover, the way most publications choose a detection threshold disregards real-world conditions, which hinders practical deployment. To allow for tangible advances in the field, future work must address these issues.
Statistics
"Time-series data can expose subtle but important trends and correlations, as well as give the data user key insights on how to optimise engineering systems and processes, which can potentially provide a company with a competitive advantage in the market."
"With the rise of industry 4.0, anomaly detection has therefore gained relevance over the past decade, with the bar being set ever higher as data becomes more and more high dimensional."
"Deep learning is also a very active research area, owing to increasing computing power and the availability of large amounts of data. It can be applied to anomaly detection, especially in high dimensional data, which is where traditional approaches have started to reach their limits."
Quotes
"An observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism."
"As a result of the fourth industrial revolution, also known as industry 4.0, immense amounts of data are collected from sensors mounted at different checkpoints in many processes in research and development, manufacturing and testing."