Robinson, K., Kudugunta, S., Stella, R., Dev, S., & Bastings, J. (2024). MiTTenS: A Dataset for Evaluating Gender Mistranslation. arXiv preprint arXiv:2401.06935v3.
This paper addresses gender mistranslation in machine translation systems by introducing MiTTenS, a dataset designed to measure the problem across many languages and translation models.
The researchers developed MiTTenS, a dataset comprising 13 evaluation sets covering 26 languages. These sets include handcrafted, synthetically generated, and naturally sourced passages targeting known gender mistranslation patterns. The dataset was used to evaluate the performance of several dedicated translation systems (e.g., NLLB) and foundation models (e.g., GPT-4) by analyzing their accuracy in translating gendered entities.
The evaluation revealed that all tested translation systems, including large language models, exhibit varying degrees of gender mistranslation, even in high-resource languages. The study identified specific areas of weakness, such as translating passages where gender information is encoded in nouns or introduced later in the source text. Notably, a consistent pattern emerged where systems performed worse when translating passages requiring the pronoun "she" compared to "he," suggesting potential biases in training data.
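The "she" versus "he" comparison above can be illustrated with a minimal sketch. It assumes translations into English are scored by checking for pronouns of the unexpected gender; the pronoun lists, function names, and toy examples are hypothetical and are not taken from the paper or from the MiTTenS dataset.

```python
import re
from collections import defaultdict

# Hypothetical pronoun sets for illustration; a real evaluation would need
# a more careful list and handling of ambiguous or quoted text.
PRONOUNS = {
    "she": {"she", "her", "hers", "herself"},
    "he": {"he", "him", "his", "himself"},
}


def is_mistranslated(translation: str, expected: str) -> bool:
    """Return True if the English translation uses a pronoun of the other gender."""
    tokens = set(re.findall(r"[a-z']+", translation.lower()))
    wrong = PRONOUNS["he"] if expected == "she" else PRONOUNS["she"]
    return bool(tokens & wrong)


def accuracy_by_gender(examples):
    """Compute accuracy per expected pronoun group.

    examples: iterable of (translation, expected) pairs,
    where expected is "she" or "he".
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for translation, expected in examples:
        totals[expected] += 1
        if not is_mistranslated(translation, expected):
            correct[expected] += 1
    return {gender: correct[gender] / totals[gender] for gender in totals}


# Made-up toy data, not drawn from the dataset.
examples = [
    ("She is a doctor and her patients trust her.", "she"),
    ("He is a doctor.", "she"),  # wrong gender in the translation
    ("He said his sister arrived.", "he"),
    ("He went home.", "he"),
]
scores = accuracy_by_gender(examples)
print(scores)
```

A gap between the two scores on a large evaluation set would correspond to the asymmetry the study reports; on toy data like this it only demonstrates the bookkeeping.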
The authors conclude that MiTTenS provides a valuable resource for measuring and mitigating gender mistranslation in machine translation. The dataset's diverse linguistic coverage and focus on specific error patterns allow for targeted improvements in translation models. The findings highlight the need for continued research and development of fairer and more inclusive language technologies.
This research contributes to Natural Language Processing by providing a standardized dataset for evaluating gender mistranslation in machine translation. The systematic errors it identifies in existing systems underline the importance of addressing ethical considerations when developing language technologies.
The study acknowledges that it does not cover non-binary gender expressions, owing to the complexity of how they are represented across languages and cultures. Future research should explore methods to evaluate and mitigate mistranslation of non-binary gender identities. Further next steps include expanding the dataset to cover direct translation between language pairs that do not involve English and developing more sophisticated automated evaluation methods.