Cost-Effective Fine-Tuning of Pre-trained Language Models with Proximal Policy Optimization
The paper proposes a self-supervised text-ranking approach that uses Proximal Policy Optimization (PPO) to fine-tune pre-trained language models, reducing the reliance on human annotators.
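For context, PPO fine-tuning typically optimizes the standard clipped surrogate objective, which bounds how far a policy update can move from the previous policy. Below is a minimal sketch of that loss for a single action; the function name and inputs are illustrative, not taken from the paper.

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    """Clipped PPO surrogate loss for one action, negated for minimization.

    logp_new / logp_old: log-probability of the action under the new / old policy.
    advantage: estimated advantage of the action.
    eps: clipping range (0.2 is the common default).
    """
    ratio = math.exp(logp_new - logp_old)             # pi_new(a) / pi_old(a)
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))   # clip ratio to [1-eps, 1+eps]
    # Take the pessimistic (smaller) objective, then negate to get a loss.
    return -min(ratio * advantage, clipped * advantage)

# Example: an update that raised the action's log-prob by 0.5 with advantage 1.0;
# the ratio exp(0.5) ~ 1.65 is clipped to 1.2, so the loss is about -1.2.
loss = ppo_clip_loss(logp_new=-1.0, logp_old=-1.5, advantage=1.0)
```

In the self-supervised ranking setting described above, the advantage would come from a reward derived from the model's own ranking signal rather than from human preference labels.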