Core Concepts
Layerwise EArly STopping (LEAST) is proposed to halt individual layer adaptation during the test time adaptation process to prevent overfitting to new domain samples.
Abstract
The paper introduces Layerwise EArly STopping (LEAST), an approach designed specifically for Test Time Adaptation (TTA). To prevent overfitting during TTA, LEAST masks the adaptation of an individual layer for a given test sample when that sample's updates do not align with those of previously seen samples. Alignment is measured with a cosine distance-based criterion that requires storing only a single copy of the initial model's parameters. Consequently, LEAST needs no supervised labels or validation set, which classical early stopping methods commonly require.
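The masking idea can be illustrated with a minimal sketch. It is not the paper's exact criterion, which this summary does not spell out. The sketch assumes that the displacement of a layer from its initial parameters stands in for the updates from previously seen samples, and that a layer is masked when the cosine distance between that displacement and the current sample's update reaches a threshold. The function names, the dict-of-lists parameter layout, and the threshold value of 1.0 are all hypothetical.

```python
import math


def cosine_distance(u, v, eps=1e-12):
    """Return 1 - cos(u, v) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u < eps or norm_v < eps:
        # Assumption: with no direction to compare against, treat as aligned.
        return 0.0
    return 1.0 - dot / (norm_u * norm_v)


def layer_mask(initial, current, update, threshold=1.0):
    """Decide per layer whether to apply (True) or mask (False) an update.

    initial, current, update: dicts mapping layer name -> flat list of floats.
    Only `initial` has to be stored in addition to the live model.
    """
    mask = {}
    for name, step in update.items():
        # Accumulated movement of this layer since adaptation began.
        displacement = [c - i for c, i in zip(current[name], initial[name])]
        # Hypothetical rule: keep adapting only while the new step still
        # points in roughly the same direction as the accumulated movement.
        mask[name] = cosine_distance(displacement, step) < threshold
    return mask


def apply_masked_update(current, update, mask):
    """Return new parameters, leaving masked layers unchanged."""
    return {
        name: [p + s for p, s in zip(params, update[name])]
        if mask[name]
        else list(params)
        for name, params in current.items()
    }
```

For example, with `initial = {"conv1": [0.0, 0.0], "fc": [0.0, 0.0]}`, `current = {"conv1": [1.0, 0.0], "fc": [1.0, 0.0]}`, and `update = {"conv1": [0.5, 0.1], "fc": [-0.5, 0.0]}`, the `conv1` update stays aligned with its earlier movement and is applied, while the `fc` update points the opposite way and is masked. Because the decision depends only on parameters and updates, no labels or validation set enter the computation.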
The key highlights of the paper are:
- LEAST consistently outperforms standard baselines such as ERM (no adaptation) and all-layer TTA across multiple datasets, model architectures, and TTA losses.
- LEAST outperforms existing layer selection approaches like AutoRGN and AutoSNR, demonstrating the effectiveness of the layerwise early stopping approach.
- The proposed cosine distance-based criterion for layerwise early stopping without a validation set closely approximates the performance of layerwise early stopping using a validation set.
- LEAST effectively balances adapting to the new domain, preventing source forgetting, and managing the computational cost-performance trade-off.
Stats
LEAST outperforms the ERM (no adaptation) baseline by 2% overall.
LEAST outperforms all-layer TTA by more than 5% overall.
LEAST consistently demonstrates equivalent or superior performance compared to existing layer selection baselines across all datasets and TTA losses.
Quotes
"Layerwise EArly STopping (LEAST) is proposed to halt individual layer adaptation during the test time adaptation process to prevent overfitting to new domain samples."
"We propose a novel cosine distance-based criterion that can identify the early stopping point from gradient updates without needing a validation set."