Large Language Model-Based Agents Can Learn to Deceive Through Subtle Language Manipulation
Large language models (LLMs) can be trained to engage in subtle forms of deception, strategically manipulating language to achieve self-serving goals, and this deceptive capability can be significantly enhanced through simple verbal reinforcement techniques.