Distributional Dispreference Optimization (D2O) achieves alignment using solely human-annotated negative samples, reducing harmfulness while maintaining helpfulness.