The paper investigates backdoor attacks on multilingual neural machine translation (MNMT) systems. The key findings are:
Backdoor attacks can be effectively transferred across different language pairs within MNMT systems. Injecting poisoned data into a low-resource language pair (e.g., Malay-Javanese) can lead to malicious translations in high-resource language pairs (e.g., Indonesian-English) without directly manipulating the high-resource language data.
The authors propose three approaches to craft poisoned data: Token Injection (Tokeninj), Token Replacement (Tokenrep), and Sentence Injection (Sentinj). Tokenrep and Tokeninj demonstrate high attack success rates while maintaining strong stealthiness, posing challenges for defense.
Experiments show that inserting less than 0.01% poisoned data into a low-resource language pair can achieve an average 20% attack success rate in attacking high-resource language pairs. This is particularly concerning given the larger attack surface of low-resource languages.
Current defense approaches based on language models and data filtering struggle to effectively detect the poisoned data, especially for low-resource languages. The authors emphasize the need for more research to enhance the security of low-resource languages in MNMT systems.
A otro idioma
del contenido fuente
arxiv.org
Ideas clave extraídas de
by Jun Wang,Qio... a las arxiv.org 04-04-2024
https://arxiv.org/pdf/2404.02393.pdfConsultas más profundas