Core Concepts
The authors propose a two-step framework for authorship style transfer that combines supervised fine-tuning with policy optimization, achieving superior results on both low-resource and high-resource tasks.
Abstract
The paper presents a novel approach to authorship style transfer based on policy optimization. The proposed method outperforms existing models in both low-resource and high-resource scenarios, demonstrating the effectiveness of the framework. Key points include the challenges faced by traditional approaches, the introduction of ASTRAPOP, methodology details, data generation strategies, supervised fine-tuning, the policy optimization stage, evaluation results on individual and community authorship style transfer tasks, limitations of the approach, ethical considerations, acknowledgements, and related work.
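To make the two-step structure concrete, here is a minimal self-contained sketch, assuming a toy PyTorch language model and random token ids in place of real style-transfer data: stage one fine-tunes the policy on supervised pairs with a standard next-token loss, and stage two applies a preference-optimization update (DPO shown here) against the frozen stage-one model as reference. The `TinyLM` model and all tensors are illustrative placeholders, not the authors' implementation.

```python
import copy

import torch
import torch.nn.functional as F
from torch import nn

VOCAB, DIM = 100, 32

class TinyLM(nn.Module):
    """Toy causal LM stand-in: embedding plus a linear head over a tiny vocabulary."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, ids):                # ids: (batch, seq)
        return self.head(self.embed(ids))  # logits: (batch, seq, vocab)

def seq_logprob(model, ids):
    """Summed log-probability of ids[1:] under the model's next-token distribution."""
    logp = F.log_softmax(model(ids[:, :-1]), dim=-1)
    return logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1).sum(dim=-1)

policy = TinyLM()
opt = torch.optim.AdamW(policy.parameters(), lr=1e-3)

# Stage 1: supervised fine-tuning on (source -> target-style) sequences.
sft_batch = torch.randint(0, VOCAB, (8, 16))  # placeholder token ids
logits = policy(sft_batch[:, :-1])
sft_loss = F.cross_entropy(logits.reshape(-1, VOCAB), sft_batch[:, 1:].reshape(-1))
opt.zero_grad(); sft_loss.backward(); opt.step()

# Stage 2: policy optimization on preference pairs (DPO shown), with the frozen
# stage-one model acting as the reference.
reference = copy.deepcopy(policy).eval()
for p in reference.parameters():
    p.requires_grad_(False)

chosen = torch.randint(0, VOCAB, (8, 16))    # rewrites ranked closer to the target style
rejected = torch.randint(0, VOCAB, (8, 16))  # rewrites ranked farther from it
beta = 0.1
margin = (seq_logprob(policy, chosen) - seq_logprob(reference, chosen)) \
       - (seq_logprob(policy, rejected) - seq_logprob(reference, rejected))
dpo_loss = -F.logsigmoid(beta * margin).mean()
opt.zero_grad(); dpo_loss.backward(); opt.step()
```

In practice the policy would be a pretrained language model and the chosen/rejected pairs would be rewrites ranked by the reward the paper describes; the sketch only shows how the two training stages fit together.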
Stats
Existing approaches rely on a large number of target style exemplars for model training.
The proposed ASTRAPOP model can successfully transfer text to an author's style with as few as five target-style examples.
ASTRAPOP outperforms state-of-the-art baseline models in both low-resource individual authorship style transfer and medium-resource community authorship style transfer tasks.
The DPO and CPO algorithms yield larger improvements in overall performance than PPO.
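As a hedged illustration of that comparison, the sketch below writes out the two offline preference losses over precomputed sequence log-probabilities; unlike PPO, neither needs online sampling or a separate value model during the update. The `beta` and `nll_weight` values are illustrative defaults, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_c, pi_r, ref_c, ref_r, beta=0.1):
    # Preference margin measured relative to a frozen reference model (the SFT model).
    return -F.logsigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))).mean()

def cpo_loss(pi_c, pi_r, beta=0.1, nll_weight=1.0):
    # Reference-free preference margin plus an NLL anchor on the preferred rewrite.
    return -F.logsigmoid(beta * (pi_c - pi_r)).mean() - nll_weight * pi_c.mean()

# Placeholder summed log-probabilities for two preference pairs
# (pi_* from the policy, ref_* from the frozen reference).
pi_c, pi_r = torch.tensor([-10.0, -12.0]), torch.tensor([-15.0, -14.0])
ref_c, ref_r = torch.tensor([-11.0, -12.5]), torch.tensor([-14.0, -13.5])
print(dpo_loss(pi_c, pi_r, ref_c, ref_r).item(), cpo_loss(pi_c, pi_r).item())
```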
Quotes
"Authorship style transfer aims to rewrite a given text into a specified target while preserving the original meaning in the source."
"We propose Authorship Style TRAnsfer with Policy OPtimization (ASTRAPOP), a lightweight two-step PO training framework for authorship style transfer."
"Our reward function requires only one reward model instead of three reward models used in STEER."