Core Concepts
The author describes two systems that use large language models to automate the correction of exercises in translating between natural language and formal logic, as well as exercises in writing simple arguments in natural language.
Abstract
The author discusses two systems being developed that use large language models (LLMs) for automated tasks in teaching logic and reasoning:
Autoformalization:
Autoformalization is the automated translation from natural language to formal logic.
The author experimented with using text-davinci-003 and GPT-4-Turbo for this task.
With text-davinci-003, performance was satisfactory on simple examples, but the model struggled with sentences containing strange or absurd content, with exercises requiring the full range of available notation, and with high logical complexity.
GPT-4-Turbo showed better performance, correctly formalizing around 96.5% of the sample sentences in propositional logic and 92% in first-order logic.
Locally run LLMs focused on code generation, such as WizardCoder-34B, also produced reasonable results for propositional logic, though not as good as those of GPT-4-Turbo.
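The autoformalization step described above can be sketched as a few-shot prompt sent to a language model, followed by a cheap surface check on the reply. The prompt wording, the atom names, and the validation rules below are illustrative assumptions, not the author's actual setup; the API call itself is omitted, since any completion endpoint could be substituted.

```python
# Illustrative sketch of LLM-based autoformalization (assumptions, not the
# author's actual prompts): build a few-shot prompt asking a model to
# translate a sentence into propositional logic, then sanity-check the reply.

FEW_SHOT = [
    ("If it rains, the street is wet.", "(R -> W)"),
    ("Hans is at home or at work.", "(H | A)"),
]

def build_prompt(sentence: str) -> str:
    """Assemble a few-shot prompt for NL -> propositional logic translation."""
    lines = ["Translate each sentence into propositional logic."]
    for nl, formula in FEW_SHOT:
        lines.append(f"Sentence: {nl}\nFormula: {formula}")
    lines.append(f"Sentence: {sentence}\nFormula:")
    return "\n".join(lines)

# Symbols we allow in a reply; a hypothetical convention, not a full grammar.
ALLOWED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ()->|&~ ")

def is_plausible_formula(reply: str) -> bool:
    """Cheap surface check on the model's reply: allowed symbols only and
    balanced parentheses (this is not a full parser)."""
    if not reply or any(ch not in ALLOWED for ch in reply):
        return False
    depth = 0
    for ch in reply:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

prompt = build_prompt("If the sun shines, Hans goes for a walk.")
print(is_plausible_formula("(S -> W)"))   # True: passes the surface check
print(is_plausible_formula("(S -> W"))    # False: unbalanced parentheses
```

In practice such a check only filters out malformed replies; judging whether the formula actually captures the sentence's meaning still requires a reference solution or a human grader.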
Natural Language Argumentation Exercises:
These exercises involve practicing logical argumentation in non-mathematical contexts.
The natural language input can be automatically translated into propositional logic using the same LLM-based approach.
The formal representation can then be passed to the Diproche system, which provides feedback on the logical correctness of the argument.
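The checking step can be illustrated with the walking example quoted in this document. Diproche's actual proof checker is not reproduced here; as a stand-in, the sketch below uses a brute-force truth-table test to decide whether a conclusion follows from the formalized premises. The atom names and the formalization are assumptions for illustration.

```python
from itertools import product

# Stand-in for the logical checking step (not Diproche's actual checker):
# a brute-force truth-table test for propositional entailment.

def implies(p, q):
    """Material implication as a Python helper."""
    return (not p) or q

# Assumed formalization of the walking example: s = "the sun shines",
# w = "Hans goes for a walk", d = "the dog comes along",
# b = "the dog barks", r = "the cat runs away".
premises = [
    lambda v: implies(v["s"], v["w"]),
    lambda v: implies(v["w"], v["d"]),
    lambda v: implies(v["d"], v["b"]),
    lambda v: implies(v["b"], v["r"]),
    lambda v: not v["r"],          # "the cat still sits on the roof"
]

def entails(premises, conclusion, atoms):
    """True iff every valuation satisfying all premises satisfies the conclusion."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(p(v) for p in premises) and not conclusion(v):
            return False
    return True

atoms = ["s", "w", "d", "b", "r"]
print(entails(premises, lambda v: not v["s"], atoms))  # True: "the sun does not shine" follows
print(entails(premises, lambda v: v["w"], atoms))      # False: "Hans goes for a walk" does not follow
```

A real checker would of course work on parsed formulas rather than hand-written lambdas, and Diproche gives step-by-step feedback rather than a single yes/no verdict; the truth-table test only conveys the idea of validating formalized arguments mechanically.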
The author notes that the systems are currently in an experimental stage, with challenges around pricing and stability of the underlying LLMs that need to be overcome before actual deployment in teaching.
Example Sentences
"If the sun shines, Hans goes for a walk. When Hans goes for a walk, he takes his dog with him. When Hans takes his dog for a walk, the dog barks at the cat on the neighbour's roof. When the dog barks at the cat on the roof, the cat runs away. However, the cat still sits on the roof."
"Barking dogs don't bark"
"Barking cats don't bite"