The authors introduce a new benchmark called Continuously Changing Corruptions (CCC) to thoroughly evaluate the long-term performance of test-time adaptation (TTA) methods. They find that all current state-of-the-art TTA methods, including those specifically designed for continual adaptation, eventually collapse and perform worse than a non-adapting, pretrained model when evaluated on CCC.
The authors first show that previous benchmarks, such as Concatenated ImageNet-C (CIN-C), are too short and too uncontrolled to reliably assess long-term continual adaptation. In contrast, CCC features smooth transitions between image corruptions, enabling evaluation of adaptation dynamics over much longer timescales, as sketched below.
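To make the idea of smooth transitions concrete, here is a loose, illustrative Python sketch of a CCC-style corruption schedule. This is not the authors' exact construction: the corruption names, the per-level step count, and the ramp-down/ramp-up scheme are all assumptions made for illustration.

```python
import itertools
import random

def ccc_style_schedule(corruptions, max_severity=5, steps_per_level=100):
    """Illustrative generator: yields (corruption, severity) pairs that
    ramp the current corruption down to low severity, then ramp a newly
    chosen corruption back up, so the stream never changes abruptly.
    The scheme is an assumption, not the paper's exact procedure."""
    current = random.choice(corruptions)
    while True:
        # Ramp the current corruption from strong to weak ...
        for severity in range(max_severity, 0, -1):
            yield from itertools.repeat((current, severity), steps_per_level)
        # ... then switch corruption type at the weakest point and ramp up.
        current = random.choice([c for c in corruptions if c != current])
        for severity in range(1, max_severity + 1):
            yield from itertools.repeat((current, severity), steps_per_level)

# Example: peek at the first few entries of the endless stream.
schedule = ccc_style_schedule(["gaussian_noise", "fog", "motion_blur"])
print(list(itertools.islice(schedule, 5)))
```

Because the stream never returns to clean data and never ends, any adaptation method must remain stable indefinitely, which is exactly the property the benchmark probes.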
Using CCC, the authors demonstrate that methods such as Tent, CoTTA, ETA, and others collapse over time, even though some were explicitly designed to prevent such collapse. In response, they propose a simple baseline called "RDumb" that periodically resets the model to its pretrained state, and show that it outperforms all previous methods on both CCC and existing benchmarks.
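As a rough picture of what a periodic-reset baseline looks like in code, here is a minimal PyTorch-style sketch. The helper names (`adapt_step`, `data_stream`) and the reset interval are hypothetical, and the inner update could be any online TTA step (e.g., an entropy-minimization update); this is a sketch of the reset idea, not the paper's reference implementation.

```python
import copy

def rdumb_loop(model, data_stream, adapt_step, reset_every=1000):
    """Minimal sketch of a periodic-reset baseline: run any online
    test-time-adaptation update, but restore the pretrained weights
    every `reset_every` batches so adaptation errors cannot accumulate
    without bound. `adapt_step(model, batch)` is a hypothetical hook
    that performs one adaptation update and returns predictions."""
    pristine = copy.deepcopy(model.state_dict())  # snapshot of pretrained weights
    for step, batch in enumerate(data_stream):
        if step > 0 and step % reset_every == 0:
            model.load_state_dict(pristine)  # hard reset to the pretrained state
        yield adapt_step(model, batch)
```

A caller would simply iterate `rdumb_loop(...)` over the evaluation stream and score the yielded predictions; the reset guarantees that, at worst, the model behaves like the non-adapting pretrained baseline for a bounded number of steps.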
The authors further validate their findings by testing the methods on a variety of backbone architectures, including Vision Transformers, and provide theoretical and empirical analyses to understand the causes of the observed collapse.