Outcome-supervised Value Models for Efficient Multi-Step Mathematical Reasoning
Outcome supervision can be used to train a value model that prioritizes reasoning steps leading to correct final answers, enabling efficient guided decoding for multi-step mathematical reasoning.
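To make the idea of value-guided decoding concrete, the following is a minimal sketch of step-level beam search in which a value function ranks partial solutions. It is an illustration under stated assumptions, not the authors' implementation: the names `value_guided_beam_search`, `propose_steps`, `value_fn`, and `is_complete` are hypothetical, and the toy functions stand in for a real step generator and a value model trained with outcome supervision.

```python
from typing import Callable, List, Sequence, Tuple

Path = Tuple[str, ...]  # a partial solution: the reasoning steps taken so far


def value_guided_beam_search(
    question: str,
    propose_steps: Callable[[str, Path], Sequence[str]],  # hypothetical step generator
    value_fn: Callable[[str, Path], float],               # hypothetical value model
    is_complete: Callable[[Path], bool],
    beam_size: int = 2,
    max_steps: int = 8,
) -> Path:
    """Expand partial solutions step by step, keeping the highest-valued ones."""
    beam: List[Path] = [()]
    for _ in range(max_steps):
        candidates: List[Path] = []
        for path in beam:
            if is_complete(path):
                candidates.append(path)  # carry finished paths forward unchanged
                continue
            for step in propose_steps(question, path):
                candidates.append(path + (step,))
        if not candidates:
            break
        # Rank partial paths by the value model's score and prune to the beam size.
        candidates.sort(key=lambda p: value_fn(question, p), reverse=True)
        beam = candidates[:beam_size]
        if all(is_complete(p) for p in beam):
            break
    return max(beam, key=lambda p: value_fn(question, p))


# Toy stand-ins (assumptions for illustration only).
def toy_propose(question: str, path: Path) -> Sequence[str]:
    if len(path) == 0:
        return ["2 + 3 = 5", "3 * 4 = 12"]
    if path[-1] == "2 + 3 = 5":
        return ["5 * 4 = 20 (answer: 20)"]
    if path[-1] == "3 * 4 = 12":
        return ["2 + 12 = 14 (answer: 14)"]
    return []


def toy_value(question: str, path: Path) -> float:
    # Mimics a value model's estimate that the path ends in a correct answer.
    if not path:
        return 0.5
    return 0.9 if path[0] == "3 * 4 = 12" else 0.1


def toy_is_complete(path: Path) -> bool:
    return bool(path) and "answer" in path[-1]


best = value_guided_beam_search(
    "2 + 3 * 4", toy_propose, toy_value, toy_is_complete, beam_size=1
)
print(best)
```

With a beam size of 1, the search keeps only the first step that the toy value function scores highest, so the path that respects operator precedence survives and the incorrect one is pruned before it is extended. In practice the value function would be a learned model scoring partial solutions, and the step generator would be a language model sampling candidate next steps.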