Predicting the Out-of-Distribution Performance of Foundation Models Using Agreement-on-the-Line
Estimating the out-of-distribution (OOD) performance of foundation models is critical for their safe deployment, but acquiring OOD labels is often costly. The authors demonstrate that, given a carefully constructed, diverse ensemble of finetuned foundation models, the agreement-on-the-line (AGL) phenomenon can be leveraged to reliably predict OOD performance without labels.
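To make the idea concrete, the following is a minimal sketch of an AGL-style estimate, not the authors' implementation. It assumes the procedure from earlier agreement-on-the-line work: on a probit scale, fit a line relating in-distribution (ID) agreement to OOD agreement across model pairs, then apply that same line to each model's ID accuracy. The function names, the input layout (one list of predicted labels per model), and the clipping constant are all hypothetical choices made for illustration.

```python
from itertools import combinations
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit transform


def _probit(p, eps=1e-6):
    # Inverse normal CDF; p is clipped away from 0 and 1 (eps is an assumed constant).
    return _STD_NORMAL.inv_cdf(min(max(p, eps), 1.0 - eps))


def _match_rate(xs, ys):
    # Fraction of positions where two label sequences coincide.
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)


def fit_agreement_line(id_preds, ood_preds):
    """Least-squares line through (probit ID agreement, probit OOD agreement)
    over all model pairs. Needs no OOD labels. Returns (slope, intercept)."""
    pairs = list(combinations(range(len(id_preds)), 2))
    xs = [_probit(_match_rate(id_preds[i], id_preds[j])) for i, j in pairs]
    ys = [_probit(_match_rate(ood_preds[i], ood_preds[j])) for i, j in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("ID agreements are identical; the ensemble is not diverse enough to fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_ood_accuracy(id_preds, ood_preds, id_labels):
    """Apply the agreement line to each model's ID accuracy (probit scale),
    then map back to [0, 1]. Returns one OOD accuracy estimate per model."""
    slope, intercept = fit_agreement_line(id_preds, ood_preds)
    return [
        _STD_NORMAL.cdf(slope * _probit(_match_rate(preds, id_labels)) + intercept)
        for preds in id_preds
    ]
```

The estimate is only meaningful when the ensemble is diverse enough for pairwise agreements to vary, which is why the sketch raises an error when all ID agreements coincide.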