Key Concepts
A computationally efficient PID control-based self-healing framework can improve the robustness of pre-trained large language models against a wide range of perturbations.
Summary
The paper introduces a novel PID control-based self-healing framework to improve the robustness of pre-trained large language models (LLMs) against various perturbations. Key highlights:
- The authors interpret an LLM as a discrete dynamical system and formulate the robustness issue as a trajectory optimization problem.
- They design PID (Proportional-Integral-Derivative) controllers at the hidden layers of the LLM to continuously correct undesired model behavior caused by input perturbations (a minimal sketch of this idea follows the list).
- An analytical solution is derived under specific assumptions, which provides a computationally efficient implementation that is as fast as using a single control scheme.
- Theoretical error analysis is provided, demonstrating the effectiveness of PID control in improving LLM robustness.
- Extensive experiments show that the proposed PID control-based self-healing framework significantly enhances the robustness of both standard and robustly trained LLMs against a wide range of adversarial attacks across multiple datasets.
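To make the control idea concrete, the sketch below applies the standard discrete PID law, u_t = Kp*e_t + Ki*sum_k(e_k) + Kd*(e_t - e_{t-1}), to the hidden states of a layered model. This is a minimal illustration under stated assumptions: the names (`pid_controlled_forward`, `Kp`, `Ki`, `Kd`, `reference`) are hypothetical, and the paper's actual controller design, error signal, and efficient analytical solution are not reproduced here.

```python
import torch

# Illustrative discrete PID correction of hidden states, layer by layer.
# Assumption: the error is measured against some per-layer reference
# approximating the clean (unperturbed) trajectory; the paper's actual
# error signal and controller gains may differ.

def pid_controlled_forward(layers, x0, reference, Kp=0.1, Ki=0.01, Kd=0.01):
    """Run hidden state x0 through `layers`, adding a PID correction per layer.

    layers:    list of callables, one per hidden layer of the model
    x0:        initial hidden state (e.g. the embedded, possibly perturbed input)
    reference: per-layer target states approximating the clean trajectory
    """
    x = x0
    integral = torch.zeros_like(x0)    # running sum of errors (I term)
    prev_error = torch.zeros_like(x0)  # previous error (for the D term)
    for layer, ref in zip(layers, reference):
        x = layer(x)
        error = ref - x                     # deviation from the desired trajectory
        integral = integral + error         # accumulate error across layers
        derivative = error - prev_error     # discrete derivative of the error
        u = Kp * error + Ki * integral + Kd * derivative  # PID control signal
        x = x + u                           # corrected hidden state
        prev_error = error
    return x
```

In the paper's setting, `layers` would correspond to the transformer blocks of the pre-trained LLM viewed as a discrete dynamical system, with the controllers steering perturbed hidden-state trajectories back toward their clean counterparts.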
Statistics
The proposed PID control-based self-healing framework improves the average robustness performance by nearly 10% on standard models and 5% on robustly trained models.
On the ANLI dataset, the PID control method leads to a 1.0783% mean improvement in performance, with a 95% confidence interval of 0.0564% to 2.1004%.
Quotes
"The proposed PID control-based self-healing is a low-cost framework that improves the robustness of pre-trained large language models, whether standard or robustly trained, against a wide range of perturbations."
"We demonstrate that employing all P, I, and D controllers can be as computationally efficient as single control schemes, achieved through special controller design."