The paper introduces the use of Transformer-based masked autoencoders as foundation models for vibration-based structural health monitoring (SHM). The authors demonstrate that these models can learn generalizable representations from multiple large datasets through self-supervised pre-training, and then outperform state-of-the-art methods on diverse tasks, including anomaly detection (AD) and traffic load estimation (TLE).
The authors consider three different SHM datasets, including a newly collected one, and build a Transformer-based masked autoencoder inspired by the work of [17]. By pre-training on all three datasets without using labels (self-supervised learning) and then fine-tuning on each specific task, the authors achieve better results than training three separate models from scratch.
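The core idea of masked-autoencoder pre-training can be sketched as follows: hide a random subset of input patches, reconstruct them from the visible ones, and score the reconstruction only on the hidden patches. This is a minimal illustration, not the authors' implementation. The function names, the patch layout (a list of lists of floats), and the 75% mask ratio used in the example are assumptions that do not come from the summary above.

```python
import random


def random_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into (visible, masked) lists.

    mask_ratio is a hypothetical hyperparameter; the summary does not
    state the value used in the paper.
    """
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(round(num_patches * mask_ratio))
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


def masked_mse(target, reconstruction, masked):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for t, r in zip(target[i], reconstruction[i]):
            total += (t - r) ** 2
            count += 1
    return total / count if count else 0.0


# Example: 16 patches, 12 of them hidden from the encoder.
visible, masked = random_mask(16, 0.75, seed=0)
```

Because the loss needs no labels, the same procedure can in principle be applied to all three datasets at once, which is what makes the pre-training stage self-supervised.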
For the AD task, the fine-tuned models outperform state-of-the-art algorithms, achieving a near-perfect 99.9% accuracy with a monitoring time span of just 15 windows, compared to the state-of-the-art 95.03% accuracy obtained only after considering 120 windows.
For the TLE tasks, the authors' models also obtain state-of-the-art performance on multiple evaluation metrics (R² score, MAE%, and MSE%). On the first benchmark, they achieve R² scores of 0.97 and 0.85 for light and heavy vehicle traffic, respectively, while the best previous approach reaches 0.91 and 0.84. On the second benchmark, they achieve an R² score of 0.54, versus 0.10 for the best existing method.
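For readers unfamiliar with the R² score quoted above, the standard coefficient of determination can be computed as in the sketch below. It uses the textbook definition (one minus the ratio of residual to total sum of squares); the function name is illustrative, and the summary does not specify how the paper normalizes MAE% and MSE%, so those are not reproduced here.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot
```

A score of 1 means perfect prediction and a score of 0 means the model does no better than always predicting the mean, so a value such as 0.10 indicates a predictor only slightly better than that baseline.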
The authors also carry out an extensive search for the optimal model size and experiment with Knowledge Distillation (KD), in which smaller models are trained to imitate larger ones, with the ultimate goal of deployment on resource-constrained nodes for real-time SHM at the edge. Results show that distilled models often outperform conventionally fine-tuned models of the same size on downstream tasks.
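One common way to set up knowledge distillation for a regression-style task is to blend the usual supervised loss with a term that pulls the student's outputs toward the teacher's. The sketch below illustrates that generic formulation only; the summary does not say which distillation objective the authors use, and the function names and the weighting parameter `alpha` are assumptions made for illustration.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_out, teacher_out, labels, alpha=0.5):
    """Weighted sum of a supervised term and an imitation term.

    alpha (hypothetical) balances fitting the ground-truth labels
    against matching the larger teacher model's predictions.
    """
    supervised = mse(student_out, labels)
    imitation = mse(student_out, teacher_out)
    return alpha * supervised + (1.0 - alpha) * imitation


# Example: the student is off by 2 from the labels and by 1 from the teacher.
loss = distillation_loss([0.0, 0.0], [1.0, 1.0], [2.0, 2.0], alpha=0.5)
```

With `alpha=1.0` this reduces to ordinary fine-tuning, which makes it easy to compare a distilled student against an equally sized model trained without a teacher.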