The paper analyzes the performance of popular image recognition models such as VGG and ResNet on two datasets: the Dollar Street dataset and ImageNet. It reveals a significant performance gap between images from high-income and low-income households, and between images from Western and non-Western geographies.
To address this issue, the paper explores several techniques:
Weighted Loss: Reweighting the loss function so that errors on low-income images are penalized more heavily during training, in order to improve classification of these images.
Sampling: Oversampling low-income images and undersampling high-income images to make the training data distribution more uniform across income levels.
Focal Loss: Using a focal loss function to down-weight the "easy" high-income examples and focus more on the "hard" low-income examples during training.
Adversarial Discriminative Domain Adaptation (ADDA): Leveraging domain adaptation techniques to bridge the gap between high-income and low-income image representations.
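The first two techniques, weighted loss and resampling, can both be driven by the same per-group weights. The sketch below is illustrative only: it assumes inverse-frequency weights, which the summary does not confirm as the paper's scheme, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def inverse_frequency_weights(groups):
    """Weight each group by 1 / frequency, scaled so the mean per-example weight is 1.

    Assumption: inverse-frequency weighting; the paper's exact weights are not given.
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return {g: n / (k * c) for g, c in counts.items()}

def weighted_cross_entropy(probs_true, groups, weights):
    """Mean of w_group * -log(p_true), where p_true is the probability of the correct class."""
    total = sum(weights[g] * -math.log(p) for p, g in zip(probs_true, groups))
    return total / len(groups)

def sampling_probabilities(groups, weights):
    """Per-example draw probabilities that give every group the same total probability mass."""
    raw = [weights[g] for g in groups]
    z = sum(raw)
    return [r / z for r in raw]

# Hypothetical toy batch: three high-income images and one low-income image.
groups = ["high", "high", "high", "low"]
probs_true = [0.9, 0.8, 0.9, 0.3]

weights = inverse_frequency_weights(groups)
loss = weighted_cross_entropy(probs_true, groups, weights)
sample_p = sampling_probabilities(groups, weights)
```

With these weights, the single low-income image contributes as much to the loss, or is drawn as often, as the three high-income images combined. In practice the sampling probabilities would be passed to a weighted sampler in the training framework rather than used directly.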
The experiments show that the focal loss approach with a gamma value of 5 performs the best on the Dollar Street dataset, while the results on ImageNet are not as promising. The ADDA experiments suggest that the domain shift between high and low-income images is too large for the model to effectively adapt. Overall, the paper highlights the need for building more geography-agnostic and fair image recognition models.
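To see why a large gamma shifts attention toward hard examples, the sketch below uses the standard focal loss definition, FL(p) = -(1 - p)^gamma * log(p), where p is the predicted probability of the correct class. The gamma value of 5 comes from the summary above; the example probabilities are hypothetical, and whether the paper also used a class-balancing alpha term is not stated here.

```python
import math

def focal_loss(p_true, gamma=5.0):
    """Focal loss for one example, given the predicted probability of its true class.

    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)

# Hypothetical predictions: a confidently correct ("easy") example
# and a poorly classified ("hard") example.
easy = focal_loss(0.9)
hard = focal_loss(0.3)

# How much more the hard example counts than the easy one,
# under plain cross-entropy versus focal loss.
ce_ratio = math.log(0.3) / math.log(0.9)
fl_ratio = hard / easy
```

The modulating factor (1 - p)^gamma shrinks the loss of the easy example far more than that of the hard one, so `fl_ratio` is much larger than `ce_ratio`: the gradient signal is dominated by the examples the model currently gets wrong.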
Key Insights Extracted From
by Akshat Jinda... at arxiv.org 04-03-2024
https://arxiv.org/pdf/2312.02957.pdf
Deeper Questions