CIFAR-10-Warehouse: Broad Testbed for Model Generalization Analysis

Core Concepts
CIFAR-10-Warehouse (CIFAR-10-W) is a diverse testbed for evaluating model generalization across a wide range of out-of-distribution environments.
1. Introduction
Analyzing how models perform in unseen environments is crucial, but existing testbeds cover only a limited range of domains.
2. Data Collection
CIFAR-10-W consists of 180 datasets containing real-world and diffusion-generated images. The number of images per category varies across datasets.
3. Task I: Model Accuracy Prediction
Accuracy prediction methods are evaluated on CIFAR-10-W and on synthetic datasets. Performance varies across test sets, and CIFAR-10-W is the more challenging of the two.
4. Task II: Domain Generalization
Different domain generalization (DG) methods are benchmarked on CIFAR-10-W in single-source and multi-source settings. Classification accuracy ranges widely, indicating the diversity of the test domains.
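To make Task I more concrete, here is a minimal sketch of what an accuracy prediction pipeline can look like. The summary does not name the specific methods the paper evaluates, so the estimator here (average max-softmax confidence, a common baseline) and the scoring metric (mean absolute error between predicted and true accuracy across test sets) are assumptions chosen for illustration. The function names and the toy logits are hypothetical as well.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def average_confidence(logits):
    """Assumed baseline: estimate accuracy on an unlabeled test set
    as the mean of the highest softmax probability per sample."""
    return float(softmax(logits).max(axis=1).mean())


def true_accuracy(logits, labels):
    """Ground-truth accuracy, only available when labels are known."""
    return float((logits.argmax(axis=1) == labels).mean())


def mean_absolute_error(predicted, actual):
    """Score an accuracy predictor over several test sets (assumed metric)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.abs(predicted - actual).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    predicted, actual = [], []
    # Three hypothetical test domains with 10 classes each (toy data, not CIFAR-10-W).
    for scale in (0.5, 1.0, 2.0):
        logits = rng.normal(scale=scale, size=(200, 10))
        labels = rng.integers(0, 10, size=200)
        predicted.append(average_confidence(logits))
        actual.append(true_accuracy(logits, labels))
    print("MAE of the predictor:", mean_absolute_error(predicted, actual))
```

A testbed with many domains matters for this kind of evaluation because the error of an accuracy predictor is averaged over test sets: the more varied the sets, the more informative the score.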
CIFAR-10-C (Hendrycks & Dietterich, 2019) has 50 domains. CIFAR-10-C (Hendrycks & Dietterich, 2019) has 75 domains. ImageNet-C (Hendrycks & Dietterich, 2019) has 75 domains.
"Existing testbeds typically either have a small number of domains or are synthesized by image corruptions."
"We aim to enhance the evaluation and deepen the understanding of two generalization tasks: domain generalization and model accuracy prediction."

