Core Concepts
Data poisoning attacks can significantly degrade the performance of in-context learning in large language models, highlighting the urgent need for enhanced security and robustness.
Summary
The paper introduces a novel framework called ICLPoison to assess the vulnerability of in-context learning (ICL) in large language models (LLMs) to data poisoning attacks.
Key highlights:
- ICL has emerged as an important component of LLMs, allowing them to adapt to new tasks using a few examples without retraining or fine-tuning. However, the success of ICL depends critically on the quality of the demonstration data.
- The paper investigates whether ICL is vulnerable to data poisoning attacks, where adversaries manipulate the demonstration data to degrade model performance.
- The authors propose the ICLPoison framework, which strategically distorts the hidden states of LLMs during the ICL process through discrete text perturbations, such as synonym replacement, character replacement, and adversarial suffixes (see the sketch after this list).
- Comprehensive experiments across various LLMs and tasks demonstrate the effectiveness of the ICLPoison framework, revealing that ICL performance can be significantly compromised, with up to a 10% decrease in accuracy for advanced models like GPT-4.
- The findings highlight the urgent need for enhanced defense mechanisms to safeguard the integrity and reliability of LLMs in applications relying on in-context learning.
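To make the perturbation idea concrete, below is a minimal illustrative sketch (not the authors' implementation) of one of the strategies described above: greedy synonym replacement applied to a single demonstration, scored by how far it pushes the model's hidden state away from the clean representation. The model name, the mean-pooling choice, and the tiny hand-made synonym table are assumptions made for the example.

```python
# Illustrative sketch of hidden-state distortion via synonym replacement.
# "gpt2" is a placeholder open model; the paper also targets much larger LLMs.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def hidden_state(text: str) -> torch.Tensor:
    """Mean-pooled last-layer hidden state for a piece of text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[-1].mean(dim=1).squeeze(0)

def poison_by_synonym(demo: str, synonyms: dict[str, list[str]]) -> str:
    """Greedy single-word substitution that maximizes hidden-state distortion."""
    clean = hidden_state(demo)
    words = demo.split()
    best_demo, best_dist = demo, 0.0
    for i, word in enumerate(words):
        for candidate in synonyms.get(word.lower(), []):
            trial = " ".join(words[:i] + [candidate] + words[i + 1:])
            dist = torch.norm(hidden_state(trial) - clean).item()
            if dist > best_dist:
                best_demo, best_dist = trial, dist
    return best_demo

# Toy usage: a hand-made synonym table stands in for a real lexical resource.
demo = "The movie was wonderful . Sentiment : positive"
synonyms = {"wonderful": ["dreadful", "peculiar"], "movie": ["film", "picture"]}
print(poison_by_synonym(demo, synonyms))
```

The same search loop could be adapted to character-level replacements or appended adversarial suffixes; the key design choice in each case is the objective, which rewards perturbations that distort the representations the model builds from the demonstrations rather than perturbations that merely change surface wording.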
Stats
"In recent years, In-Context Learning (ICL) (Brown et al., 2020; Min et al., 2022) has emerged as an important component of large language models (LLMs)."
"Studies have shown that the ICL performance is sensitive to certain characteristics of demonstrations, e.g., the selection of examples (Wang et al., 2023) and the order of examples in demonstration (Min et al., 2022)."
"Our comprehensive tests, including trials on the sophisticated GPT-4 model, demonstrate that ICL's performance is significantly compromised under our framework."
Quotes
"Data poisoning in ICL faces both unique challenges specific to ICL and common obstacles in traditional data poisoning."
"To tackle the above challenges, we introduce a novel and versatile attacking framework, ICLPoison, to exploit the unique learning mechanism of ICL."
"Comprehensive experiments across various LLMs and tasks demonstrate the effectiveness of our methods, highlighting the vulnerability of ICL."