Core Concepts
Research on gender bias in machine translation concentrates on a few high-resource European languages and neglects many African languages and even some European ones, which points to a need for more diverse and inclusive research in the field.
Statistics
German and Spanish are the most frequently studied languages in the context of machine translation bias, appearing in 14 papers each.
French appears in 7 papers, while Italian, Hebrew, Arabic, and Chinese each appear in 5 papers.
Several languages, including Mandarin, Turkish, Ukrainian, Bengali, Punjabi, Gujarati, Tamil, Icelandic, Marathi, Latvian, Romanian, Yoruba, Indonesian, and Mongolian, appear only once in the reviewed literature.
Languages such as Amharic, Tigrinya, Kabyle, Somali, and Hausa are entirely absent from the reviewed research.
Quotes
"It has been shown that societal stereotypes are included in NLP models (e.g., Bolukbasi et al. (2016), Caliskan et al. (2017), Wilson and Caliskan (2024))."
"A case study has shown that in Google Translate, even if not expecting a 50:50 gender distribution, the machine translation engine yielded male defaults much more frequently than it would be expected from the corresponding demographic data (Prates et al., 2020)."
"In reviewing the existing research on bias in machine translation (MT) found in our study, it becomes evident that the majority of studies focuses on a selection of few high-resource languages, often from Western Europe."