Core Concepts
The authors propose TFB, an automated benchmark for comprehensive and fair evaluation of time series forecasting methods across diverse datasets and techniques.
Summary
The authors present TFB, a comprehensive benchmark for evaluating time series forecasting (TSF) methods. TFB addresses key limitations in existing benchmarks:
- Insufficient coverage of data domains: TFB includes 25 multivariate and 8,068 univariate time series datasets spanning 10 diverse domains, enabling a more thorough assessment of method performance.
- Stereotype bias against traditional methods: TFB covers a wide range of methods, including statistical learning, machine learning, and deep learning approaches, to eliminate biases against certain method types.
- Lack of consistent and flexible pipelines: TFB provides a unified, flexible, and scalable pipeline that ensures fair comparisons by handling dataset preprocessing, method integration, evaluation strategies, and reporting in a standardized manner.
The authors use TFB to evaluate 21 univariate and 14 multivariate TSF methods. Key findings include:
- Statistical methods like VAR and LinearRegression can outperform recent deep learning methods on certain datasets (see the baseline sketch after this section).
- Linear-based methods perform well on datasets with increasing trends or significant shifts.
- Transformer-based methods excel on datasets with strong seasonality, nonlinear patterns, and internal similarities.
- Methods considering cross-channel dependencies can significantly improve multivariate forecasting performance.
Overall, TFB enables more comprehensive and reliable evaluations, promoting progress in time series forecasting research.
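To illustrate the statistical and linear baselines referenced in the first finding, here is a minimal sketch using statsmodels' VAR and scikit-learn's LinearRegression on lagged inputs. The toy data, lag order, and horizon are arbitrary illustrative choices, not the configurations evaluated in TFB.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3)).cumsum(axis=0)  # toy multivariate series
train, horizon = data[:400], 24

# VAR: a classic statistical baseline for multivariate forecasting.
var_res = VAR(train).fit(maxlags=8)
var_pred = var_res.forecast(train[-var_res.k_ar:], steps=horizon)

# LinearRegression on lagged values: predict the next step from the
# previous `lags` observations of all channels.
lags = 8
X = np.stack([train[i:i + lags].ravel() for i in range(len(train) - lags)])
y = train[lags:]
lin = LinearRegression().fit(X, y)

# Recursive multi-step forecasting with the linear model.
history = train[-lags:].copy()
lin_pred = []
for _ in range(horizon):
    nxt = lin.predict(history.ravel()[None, :])[0]
    lin_pred.append(nxt)
    history = np.vstack([history[1:], nxt])
lin_pred = np.array(lin_pred)
```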
Statistics
Time series can exhibit diverse characteristics like seasonality, trend, stationarity, shifting, and transition.
TFB covers 25 multivariate datasets spanning 10 domains, and 8,068 univariate datasets.
Existing benchmarks have limited domain coverage, with most focusing on traffic and electricity data.
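The characteristics listed above can be quantified with standard tools. The sketch below uses an STL decomposition to compute Hyndman-style trend and seasonality strengths and an ADF test for stationarity; these are common choices for such characterization, though not necessarily the exact metrics TFB computes.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL
from statsmodels.tsa.stattools import adfuller

# Toy monthly series: linear trend + yearly seasonality + noise.
idx = pd.date_range("2000-01-01", periods=240, freq="MS")
t = np.arange(240)
y = pd.Series(0.05 * t + 2 * np.sin(2 * np.pi * t / 12) +
              np.random.default_rng(0).normal(scale=0.5, size=240), index=idx)

res = STL(y, period=12).fit()
remainder_var = np.var(res.resid)

# Strength of trend / seasonality in [0, 1]: closer to 1 means the
# component explains most of the variance beyond the remainder.
trend_strength = max(0.0, 1 - remainder_var / np.var(res.trend + res.resid))
seasonal_strength = max(0.0, 1 - remainder_var / np.var(res.seasonal + res.resid))

# Augmented Dickey-Fuller test: a small p-value suggests stationarity.
adf_pvalue = adfuller(y)[1]

print(f"trend={trend_strength:.2f} seasonality={seasonal_strength:.2f} ADF p={adf_pvalue:.3f}")
```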
Quotes
"Time series from different domains may exhibit much more complex patterns that either combine the above characteristics or are entirely different."
"No existing MTSF benchmark has evaluated statistical methods."
"Discarding those last-batch testing samples is inappropriate unless all methods use the same strategy."