Dong, K., Mahankali, A., & Ma, T. (2024). Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically. arXiv preprint arXiv:2411.01829.
This research aims to improve the ability of LLMs to generate formal proofs in a more natural and challenging setup where directly relevant lemmas are not provided, requiring the model to exhibit stronger planning and decomposition capabilities.
The authors propose Proof Decomposer (ProD), an RL-based training algorithm that encourages LLMs to decompose theorems into lemmas, prove each lemma individually, and then use the proven lemmas to prove the original theorem. The reward mechanism, inspired by how mathematicians work, credits the model for proposing and proving correct, novel lemmas even when the original theorem remains unproven.
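To make the reward idea concrete, the sketch below scores a proof attempt as a tree of conjectured lemmas: the root theorem earns full credit only if its proof is verified, and each correct, novel lemma earns partial credit even when the root theorem fails. This is a minimal illustration under assumed structures: the `ProofNode` class, the `decomposition_reward` function, and the specific weights are made up for exposition and are not the paper's exact formulation.

```python
# Illustrative sketch of a hierarchical-decomposition reward (not the paper's exact scheme).
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProofNode:
    """A conjectured statement, whether its proof was verified, and any lemmas it introduces."""
    statement: str
    proof_verified: bool                      # accepted by the proof checker?
    is_novel: bool                            # new conjecture, not copied from the corpus?
    lemmas: List["ProofNode"] = field(default_factory=list)


def decomposition_reward(root: ProofNode,
                         theorem_weight: float = 1.0,
                         lemma_weight: float = 0.5) -> float:
    """Full credit for a verified root theorem, plus partial credit for every
    correct and novel lemma, even if the root theorem itself remains unproven."""
    reward = theorem_weight if root.proof_verified else 0.0
    stack = list(root.lemmas)
    while stack:
        node = stack.pop()
        if node.proof_verified and node.is_novel:
            reward += lemma_weight
        stack.extend(node.lemmas)
    return reward


if __name__ == "__main__":
    # Example: the main theorem is not proved, but two novel lemmas are verified.
    lemma_a = ProofNode("lemma A", proof_verified=True, is_novel=True)
    lemma_b = ProofNode("lemma B", proof_verified=True, is_novel=True)
    root = ProofNode("main theorem", proof_verified=False, is_novel=True,
                     lemmas=[lemma_a, lemma_b])
    print(decomposition_reward(root))  # 1.0: credit from the lemmas alone
```

The design point this sketch captures is that partial credit for useful intermediate lemmas gives the model a learning signal on hard theorems it cannot yet finish, which is what pushes it toward hierarchical decomposition rather than monolithic proof attempts.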
The study demonstrates the effectiveness of using RL with hierarchical lemma decomposition to enhance the formal theorem proving capabilities of LLMs. The proposed method encourages the model to learn a more natural and generalizable approach to theorem proving, moving beyond reliance on pre-existing lemmas.
This research contributes to the field of automated theorem proving by presenting a novel approach that leverages the power of LLMs while addressing the limitations of previous methods. The ability to generate proofs in a more human-like manner holds significant potential for advancing the field.