The paper proposes a method called Plug and Play with Prompts (PPP) for controlled text generation with large language models. The key idea is to train prompt embeddings that steer generation toward a desired style or attribute while preserving the fluency of the generated text.
The method rests on two main components: an attribute discriminator that scores generated text for the desired style, and a fluency loss that keeps the output coherent.
The prompt embeddings are trained by backpropagating the discriminator's loss into the embeddings while the language model itself stays frozen; the fluency term ensures that steering toward the target style does not significantly degrade the quality of the generated text.
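To make this training loop concrete, here is a minimal PyTorch sketch in the spirit of the setup described above, not the paper's exact implementation. The choice of `gpt2` as the frozen generator, the linear-probe `discriminator`, the soft-embedding decoding trick used to keep generation differentiable, and the KL-based fluency term are all illustrative assumptions; the only trainable parameters are the prompt embeddings.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("gpt2")
generator = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()
for p in generator.parameters():          # the language model stays frozen
    p.requires_grad_(False)

wte = generator.get_input_embeddings().weight     # (V, d) token embedding table
n_prompt, d_model = 10, wte.shape[1]

# The only trainable parameters: n_prompt soft-prompt vectors.
prompt = torch.nn.Parameter(0.02 * torch.randn(1, n_prompt, d_model, device=device))

# Hypothetical stand-in discriminator: a frozen linear probe over mean token
# embeddings, assumed to have been pre-trained on a small labelled style dataset.
discriminator = torch.nn.Linear(d_model, 2).to(device)
for p in discriminator.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam([prompt], lr=1e-3)

def train_step(context_ids, target_label, steps=20, kl_weight=0.1):
    """One update of the prompt embeddings: a style loss from the discriminator
    plus a fluency (KL-to-base-LM) loss, both backpropagated into `prompt`."""
    ctx_emb = wte[context_ids]                                # (1, T, d)
    prompted = torch.cat([prompt, ctx_emb], dim=1)
    unprompted = ctx_emb
    soft_embs, kl = [], 0.0
    for _ in range(steps):
        p_logits = generator(inputs_embeds=prompted).logits[:, -1, :]
        u_logits = generator(inputs_embeds=unprompted).logits[:, -1, :]
        probs = F.softmax(p_logits, dim=-1)                   # (1, V)
        # Fluency: keep the prompted next-token distribution close to the
        # base model's distribution on the same continuation.
        kl = kl + F.kl_div(F.log_softmax(u_logits, dim=-1), probs,
                           reduction="batchmean")
        # Differentiable "token": the expected embedding under the soft
        # distribution, so gradients can flow back through generation.
        next_emb = (probs @ wte).unsqueeze(1)                 # (1, 1, d)
        soft_embs.append(next_emb)
        prompted = torch.cat([prompted, next_emb], dim=1)
        unprompted = torch.cat([unprompted, next_emb], dim=1)
    gen = torch.cat(soft_embs, dim=1)                         # (1, steps, d)
    style_logits = discriminator(gen.mean(dim=1))             # (1, 2)
    style_loss = F.cross_entropy(style_logits,
                                 torch.tensor([target_label], device=device))
    loss = style_loss + kl_weight * kl / steps
    optimizer.zero_grad()
    loss.backward()                   # gradients reach only `prompt`
    optimizer.step()
    return loss.item()

# Example: push generations toward label 1 (e.g. "positive") for a context.
ctx = tokenizer("The movie was", return_tensors="pt").input_ids.to(device)
for step in range(100):
    loss = train_step(ctx, target_label=1)
```

The soft-embedding trick (feeding the probability-weighted mixture of token embeddings instead of a hard sampled token) is one common way to make autoregressive generation differentiable; the paper may use a different relaxation, but the overall gradient path, discriminator loss into frozen LM into trainable prompt, is the same.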
The authors evaluate PPP on four datasets covering sentiment, formality, and toxicity control. They show that PPP significantly outperforms existing plug-and-play methods such as PPLM and GeDi in style control while maintaining comparable fluency. Notably, PPP achieves this level of control with very small training sets for the prompts, as few as a few hundred examples.
The authors also demonstrate PPP's ability to generalize to larger, out-of-domain datasets, and its potential to mitigate the generation of harmful and toxic text by language models.