LLEMMA: An Open Language Model for Mathematics (ICLR 2024)
Core Concepts
LLEMMA is a domain-specific language model for mathematics that outperforms known open base models on mathematical benchmarks and can use computational tools and prove formal theorems without further finetuning.
Summary
Introduction
LLEMMA is a large language model specialized for mathematics.
Continued pretraining on Proof-Pile-2, a mixture of scientific papers, mathematical web data, and code, improves mathematical capabilities.
Domain-specific models can outperform general-purpose models of comparable size within their target domain.
Approach
Continued pretraining of Code Llama on Proof-Pile-2 yields LLEMMA, a model with enhanced mathematical abilities.
Models are trained using bfloat16 mixed precision across GPUs.
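The section only states that training uses bfloat16 mixed precision; the paper's training code is not shown here. As a minimal illustration of what bfloat16 precision means, the sketch below rounds a Python float to bfloat16 by keeping only the top 16 bits of its float32 representation (8 exponent bits, 7 mantissa bits):

```python
import struct

def to_bfloat16(x: float) -> float:
    """Round a float to bfloat16 precision: keep the top 16 bits of the
    float32 bit pattern (8 exponent bits, 7 mantissa bits)."""
    # Pack as float32 and reinterpret the bits as an unsigned 32-bit int.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-to-nearest-even on the discarded low 16 bits.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_bfloat16(1.0))      # exactly representable: 1.0
print(to_bfloat16(3.14159))  # 3.140625: only ~3 significant decimal digits survive
```

The wide 8-bit exponent is why bfloat16 keeps the dynamic range of float32, making it a common choice for large-scale mixed-precision training despite its coarse mantissa.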
Evaluation
LLEMMA achieves strong few-shot performance on mathematical benchmarks such as MATH and GSM8k.
It can solve problems by generating and executing code with computational tools such as the Python interpreter.
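The paper's tool-use setting has the model write Python that is then executed to obtain an answer. The harness below is a hypothetical sketch of that loop (the variable names and the `answer` convention are assumptions, not the paper's actual protocol), with a hand-written stand-in for a model completion:

```python
# Stand-in for an actual model completion in the tool-use setting.
model_generated_code = """
# Problem: what is the sum of the positive divisors of 36?
divisors = [d for d in range(1, 37) if 36 % d == 0]
answer = sum(divisors)
"""

def run_tool_call(code: str) -> object:
    """Execute model-written code in a fresh namespace and return
    whatever it bound to `answer` (None if the code never set it)."""
    namespace: dict = {}
    exec(code, namespace)  # NOTE: a real harness would sandbox this call
    return namespace.get("answer")

print(run_tool_call(model_generated_code))  # 91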
Formal Mathematics
LLEMMA demonstrates few-shot tactic prediction and proof autoformalization in Lean and Isabelle.
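In the tactic-prediction setting, the model is shown a proof goal and asked to propose the next tactic. The Lean snippet below is an illustrative example of that setting, not a prompt from the paper:

```lean
-- Illustrative tactic prediction: given the goal `a + b = b + a`,
-- the model proposes a rewrite with the commutativity lemma.
theorem add_comm' (a b : Nat) : a + b = b + a := by
  rw [Nat.add_comm]
```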
Impact of Data Mixture
The training data mixture affects model performance; the arXiv:Web:Code ratio was tuned empirically to find an effective balance.
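Tuning a data mixture amounts to choosing relative weights per source and normalizing them into sampling probabilities. The sketch below shows that bookkeeping; the weights are hypothetical placeholders, not the ratio the paper actually selected:

```python
def mixture_probs(weights: dict[str, float]) -> dict[str, float]:
    """Normalize relative mixture weights into sampling probabilities."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical arXiv:Web:Code weights, for illustration only.
weights = {"arxiv": 2.0, "web": 4.0, "code": 1.0}
probs = mixture_probs(weights)
print(probs)  # probabilities proportional to the weights, summing to 1
```

Each candidate ratio is evaluated by training on the resulting mixture and comparing held-out performance.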
Dataset Overlap and Memorization
Overlap ("hits") between test examples and training documents has minimal impact on model accuracy, suggesting that performance is not driven by memorization.
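A "hit" in this kind of analysis is typically a shared token n-gram between a test example and a training document. The sketch below implements that check over whitespace tokens; the default n-gram length is an assumed value for illustration, not necessarily the one the paper used:

```python
def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """All n-grams of whitespace tokens in `text`."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def has_hit(test_example: str, train_doc: str, n: int = 30) -> bool:
    """A 'hit' is at least one n-gram shared between test and train text.
    (n=30 is an assumed default, for illustration only.)"""
    return not ngrams(test_example, n).isdisjoint(ngrams(train_doc, n))
```

Counting hits per test example, and comparing accuracy on hit vs. non-hit examples, is what supports the memorization conclusion above.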
Related Work
Discusses advancements in large-scale language modeling, domain adaptation, and language models for mathematics.
Conclusion
LLEMMA provides a platform for understanding language model generalization in mathematics.