Transformer models, including GPT-4 and fine-tuned BERT variants, generalise poorly to out-of-distribution perturbations of mathematical reasoning tasks despite strong in-distribution performance.
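To make the notion of an out-of-distribution perturbation concrete, the toy Python sketch below rewrites a word problem's surface form (entity name and numeric scale) while preserving its reasoning structure; the `perturb` function, the name list, and the rescaling scheme are illustrative assumptions rather than the evaluation protocol used here.

```python
import random
import re

# Toy sketch of an out-of-distribution surface perturbation for a math word
# problem: swap the entity name and rescale the numbers. The function and
# perturbation scheme are illustrative assumptions, not this paper's protocol.
def perturb(problem: str, entity: str = "Tom", seed: int = 0) -> tuple[str, int]:
    rng = random.Random(seed)
    # Rename the actor to a name less likely to appear in training data
    # (an answer-preserving perturbation).
    problem = problem.replace(entity, rng.choice(["Priya", "Jonas", "Mei"]))
    # Rescale every integer by one shared factor; for purely additive
    # problems the gold answer scales by the same factor.
    factor = rng.choice([3, 7, 11])
    problem = re.sub(r"\d+", lambda m: str(int(m.group()) * factor), problem)
    return problem, factor

original = "Tom has 3 apples and buys 4 more apples. How many apples does Tom have?"
perturbed, k = perturb(original)
print(perturbed)              # same reasoning structure, shifted surface form
print("gold answer:", 7 * k)  # original answer is 7; additive, so it rescales
```

Because the gold answer of such a rescaled variant can be recomputed exactly, any accuracy drop on the perturbed set is attributable to distribution shift rather than to ambiguity in the problems themselves.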
Mathematical language processing requires methods that can extract information, reason over mathematical elements, and produce solutions to real-world problems. Recent research has advanced key components toward this goal, including transformer-based language models, graph-based representations, and multi-modal encoding approaches.