Key Concepts
The authors explore the challenges of evaluating out-of-distribution (OOD) generalization and propose three paradigms for evaluation. They emphasize the importance of understanding model performance under distribution shifts.
Summary
The content delves into the evaluation of out-of-distribution generalization, highlighting challenges and proposing paradigms for assessment. It discusses datasets, benchmarks, performance prediction methods, and intrinsic property characterization in detail.
Machine learning models face challenges with distribution shifts.
Various research branches focus on developing OOD generalization algorithms.
Evaluation protocols play a fundamental role in assessing OOD generalization.
Visual, text, and tabular datasets are used for OOD performance testing.
Benchmarks like DomainBed and WILDS facilitate algorithm comparison.
Performance prediction methods include model output properties and distribution discrepancy analysis.
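One family of performance prediction methods estimates a model's OOD accuracy from its output properties alone, without OOD labels. A minimal sketch of one such idea, using the mean maximum softmax probability as an accuracy proxy (function names and the toy logits here are illustrative assumptions, not from the paper):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def average_confidence(logits):
    """Predict accuracy on unlabeled OOD data as the mean
    maximum softmax probability of the model's outputs."""
    probs = softmax(logits)
    return float(probs.max(axis=-1).mean())

# Toy logits for 4 unlabeled OOD examples over 3 classes.
logits = np.array([
    [3.0, 0.1, 0.1],
    [0.2, 2.5, 0.3],
    [1.0, 1.1, 0.9],
    [0.1, 0.1, 4.0],
])
est_acc = average_confidence(logits)  # a value in [0, 1]
```

In practice, raw confidence tends to over-estimate accuracy under distribution shift, which is why distribution discrepancy analysis is often used alongside output-based proxies.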
Intrinsic properties like distributional robustness, stability, invariance, and flatness are crucial for understanding model behavior.
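Flatness, for instance, asks how much the loss rises when the learned weights are slightly perturbed; flatter minima are often associated with better generalization. A minimal sketch on a toy linear regression problem (the `flatness` helper, radius, and sample count are illustrative assumptions, not a method from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression with a known solution: loss(true_w) = 0.
X = rng.normal(size=(100, 5))
true_w = rng.normal(size=5)
y = X @ true_w

def loss(w):
    # Mean squared error on the fixed dataset.
    return float(np.mean((X @ w - y) ** 2))

def flatness(w, radius=0.1, n_samples=50):
    """Estimate sharpness as the average loss increase under
    random weight perturbations of a fixed norm. Smaller
    values indicate a flatter minimum."""
    base = loss(w)
    increases = []
    for _ in range(n_samples):
        d = rng.normal(size=w.shape)
        d = radius * d / np.linalg.norm(d)  # project to the radius
        increases.append(loss(w + d) - base)
    return float(np.mean(increases))

sharpness = flatness(true_w)  # small positive value near the minimum
```

The same probe applies to neural networks by perturbing all parameters, which is the intuition behind sharpness-aware analyses of OOD behavior.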
Statistics
"This paper serves as the first effort to conduct a comprehensive review of OOD evaluation."
"We categorize existing research into three paradigms: OOD performance testing, OOD performance prediction, and OOD intrinsic property characterization."
"In real applications, we can hardly guarantee that the test data encountered by deployed models will conform to the same distribution as training data."