Key Concepts
The core contribution of this paper is to analyze and mitigate the inherent geographical biases in state-of-the-art image classification models, in order to make them more robust and fair across different geographical regions and income levels.
Summary
The paper analyzes the performance of popular image recognition models such as VGG and ResNet on two diverse datasets, the Dollar Street Dataset and ImageNet. It reveals a significant gap in the performance of these models between images from high-income and low-income households, and between images from western and non-western geographies.
To address this issue, the paper explores several techniques (a minimal code sketch of each follows the list):

- Weighted Loss: reweighting the loss function to penalize errors on low-income images more heavily during training, in order to improve classification of these images.
- Sampling: oversampling low-income images and undersampling high-income images to make the training data distribution more uniform across income levels.
- Focal Loss: using a focal loss function to down-weight the "easy" high-income examples and focus more on the "hard" low-income examples during training.
- Adversarial Discriminative Domain Adaptation (ADDA): leveraging domain adaptation techniques to bridge the gap between high-income and low-income image representations.
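The paper summary does not spell out how the reweighting is implemented, so the following is a minimal PyTorch sketch under assumptions: each batch carries a boolean `is_low_income` mask, and `low_income_weight` is a hypothetical hyperparameter, not a value from the paper.

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, targets, is_low_income, low_income_weight=2.0):
    """Cross-entropy with per-sample weights that penalize errors on
    low-income images more heavily. The weight value is a hypothetical
    hyperparameter, not taken from the paper."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.where(is_low_income,
                          torch.full_like(per_sample, low_income_weight),
                          torch.ones_like(per_sample))
    return (weights * per_sample).mean()
```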
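A minimal sketch of the resampling idea using PyTorch's `WeightedRandomSampler`: drawing each sample with probability inversely proportional to its income group's frequency oversamples the rarer low-income images and undersamples the high-income ones. The `income_labels` encoding and the inverse-frequency weighting are assumptions about the setup, not details from the paper.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def make_balanced_loader(dataset, income_labels, batch_size=64):
    """Build a DataLoader whose batches are roughly uniform across
    income levels. `income_labels` holds one income-group id per
    sample (hypothetical encoding, e.g. 0 = low, 1 = high)."""
    labels = torch.as_tensor(income_labels)
    counts = torch.bincount(labels).float()
    # Each sample's draw probability is inversely proportional
    # to the size of its income group.
    sample_weights = (1.0 / counts)[labels]
    sampler = WeightedRandomSampler(sample_weights,
                                    num_samples=len(dataset),
                                    replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```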
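The focal loss itself is standard (Lin et al.); below is a minimal PyTorch sketch with gamma = 5, the value the paper reports as working best on Dollar Street.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=5.0):
    """Focal loss: FL(p_t) = -(1 - p_t)^gamma * log(p_t).
    Higher gamma down-weights well-classified ("easy") examples more,
    shifting training effort toward hard examples; gamma = 5 is the
    setting the paper found best on the Dollar Street dataset."""
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()
```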
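A minimal sketch of one ADDA training step in PyTorch, treating high-income images as the source domain and low-income images as the target domain. The architectures, dimensions, and function names are illustrative assumptions, not the paper's implementation; it assumes a source encoder already trained with a classifier on source images and frozen.

```python
import torch
import torch.nn as nn

feat_dim = 512  # illustrative feature size
discriminator = nn.Sequential(
    nn.Linear(feat_dim, 256), nn.ReLU(),
    nn.Linear(256, 1),
)
bce = nn.BCEWithLogitsLoss()

def adda_step(src_images, tgt_images, src_encoder, tgt_encoder,
              opt_disc, opt_tgt):
    # 1) Train the discriminator to tell source features (label 1)
    #    from target features (label 0).
    src_feat = src_encoder(src_images).detach()
    tgt_feat = tgt_encoder(tgt_images).detach()
    disc_logits = discriminator(torch.cat([src_feat, tgt_feat]))
    disc_labels = torch.cat([torch.ones(len(src_feat), 1),
                             torch.zeros(len(tgt_feat), 1)])
    opt_disc.zero_grad()
    d_loss = bce(disc_logits, disc_labels)
    d_loss.backward()
    opt_disc.step()

    # 2) Train the target encoder to fool the discriminator (its
    #    features should be classified as source). opt_tgt holds only
    #    the target encoder's parameters, so the discriminator is not
    #    updated here; its stray gradients are cleared next step.
    tgt_feat = tgt_encoder(tgt_images)
    opt_tgt.zero_grad()
    g_loss = bce(discriminator(tgt_feat), torch.ones(len(tgt_feat), 1))
    g_loss.backward()
    opt_tgt.step()
    return d_loss.item(), g_loss.item()
```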
The experiments show that the focal loss approach with gamma = 5 performs best on the Dollar Street dataset, while the results on ImageNet are less promising. The ADDA experiments suggest that the domain shift between high- and low-income images is too large for the model to adapt effectively. Overall, the paper highlights the need for geography-agnostic and fair image recognition models.
Statistics
The paper uses the following key statistics and figures:
The Dollar Street Dataset contains ~30,000 images from 264 homes in 50 countries, belonging to 131 classes.
The ImageNet dataset used in the experiments contains 50,249 images from 596 classes, with location metadata obtained from the Flickr API.
The GDP per capita (nominal) values used to map the ImageNet images to income levels are: Oceania ($53,220), North America ($49,240), Europe ($29,410), South America ($8,560), Asia ($7,350), and Africa ($1,930).
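For illustration, the continent-level GDP figures above as a lookup table. The 20,000 USD threshold for splitting continents into high- and low-income groups is a hypothetical choice; the paper's exact bucketing is not given in this summary.

```python
# GDP per capita (nominal, USD) figures the paper uses to map an
# ImageNet image's continent of origin to an income level.
GDP_PER_CAPITA = {
    "Oceania": 53_220,
    "North America": 49_240,
    "Europe": 29_410,
    "South America": 8_560,
    "Asia": 7_350,
    "Africa": 1_930,
}

def income_level(continent, threshold=20_000):
    # The threshold is an assumption for illustration only.
    return "high" if GDP_PER_CAPITA[continent] >= threshold else "low"
```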
Quotes
"Recent advancements in GPUs and ASICs like TPU, resulting in increased computational power, have led to many object recognition systems achieving state of the art performance on publicly available datasets like ImageNet [8], COCO [15], and OpenImages [12]. However, these systems seem to be biased toward images obtained from well-developed western countries, partly because of the skewed distribution of the geographical source location of such images [7]."
"DeVries et al [7] revealed a major gap in the top-5 average accuracy of six object recognition systems on images from high and low income households and images from western and non-western geographies. Our goal is to reduce this bias introduced into the systems because of the inherent nature of the training data."