insight - Regularized self-play for language model alignment