Improving the Reasoning Capabilities of Large Language Models through Self-Exploration and Fine-Grained Rewards
Large language models can improve their own reasoning capabilities by extracting fine-grained learning signals from the rationales they generate.