Core Concepts
Crafting high-quality math multiple-choice questions (MCQs) is a labor-intensive process that requires educators to write precise stems and plausible distractors. This paper introduces a prototype tool that supports collaboration between large language models (LLMs) and educators to streamline math MCQ generation.
Abstract
The paper introduces a prototype tool called the Human Enhanced Distractor Generation Engine (HEDGE) that leverages the expertise of educators to generate math MCQs through a two-step process:
Generation of the question stem, key, and explanation:
The LLM (GPT-4) generates the initial stem, key, and explanation, which the educators then evaluate and edit to ensure mathematical accuracy and relevance to the intended knowledge component (KC).
Generation of distractors, misconceptions, and feedback:
The LLM generates a set of possible errors or misconceptions together with the corresponding distractors and feedback. The educators then evaluate and edit these to ensure each one is a valid distractor for the generated question stem.
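The two-step workflow can be sketched as a small human-in-the-loop pipeline. This is a minimal illustration under stated assumptions, not the HEDGE implementation: the names `llm_stem_step`, `llm_distractor_step`, `build_mcq`, and `reject_multiplication` are hypothetical, the two "LLM" functions return canned output instead of calling GPT-4, and the educator's review is modeled as a callback that may edit or discard each item.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Distractor:
    misconception: str  # the student error this option is meant to capture
    option: str         # the wrong answer shown to students
    feedback: str       # message shown if a student picks this option


@dataclass
class MCQ:
    knowledge_component: str
    stem: str = ""
    key: str = ""
    explanation: str = ""
    distractors: list[Distractor] = field(default_factory=list)


# Hypothetical stand-ins for the LLM calls; a real tool would prompt a model here.
def llm_stem_step(kc: str) -> tuple[str, str, str]:
    return ("What is 3/4 + 1/4?", "1", "Add the numerators: 3 + 1 = 4, so 4/4 = 1.")


def llm_distractor_step(mcq: MCQ) -> list[Distractor]:
    # A real tool would condition this prompt on the educator-reviewed mcq.stem.
    return [
        Distractor("Adds numerators and denominators", "4/8",
                   "When denominators match, only the numerators are added."),
        Distractor("Multiplies instead of adding", "3/16",
                   "The question asks for a sum, not a product."),
    ]


def build_mcq(
    kc: str,
    review_stem: Callable[[tuple[str, str, str]], tuple[str, str, str]],
    review_distractor: Callable[[Distractor], Optional[Distractor]],
) -> MCQ:
    mcq = MCQ(knowledge_component=kc)
    # Step 1: the LLM drafts stem, key, and explanation; the educator accepts or edits them.
    mcq.stem, mcq.key, mcq.explanation = review_stem(llm_stem_step(kc))
    # Step 2: the LLM proposes misconception/distractor/feedback triples for the
    # reviewed stem; the educator keeps, edits, or discards (returns None) each one.
    reviewed = (review_distractor(d) for d in llm_distractor_step(mcq))
    mcq.distractors = [d for d in reviewed if d is not None]
    return mcq


def reject_multiplication(d: Distractor) -> Optional[Distractor]:
    # Simulated educator decision: discard one candidate, keep the rest unchanged.
    return None if d.option == "3/16" else d


if __name__ == "__main__":
    mcq = build_mcq("Adding fractions with like denominators",
                    lambda draft: draft, reject_multiplication)
    print(mcq.stem, [d.option for d in mcq.distractors])  # What is 3/4 + 1/4? ['4/8']
```

The ordering is the point of the design: distractors are generated against the stem the educator has already approved, so corrections made in step 1 carry into step 2.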
A pilot study with four math educators found that 70% of the generated stems, keys, and explanations were judged valid, but only 37% of the generated misconceptions, distractors, and feedback were. This gap underscores the need to involve human experts in math MCQ generation, drawing on their knowledge of common student errors and misconceptions.
The paper also discusses potential improvements to the tool, such as using multiple in-context examples, providing a bank of distractors, and allowing educators to customize the content to make the questions more engaging and relevant for students.
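Two of the proposed improvements, multiple in-context examples and a bank of distractors, both amount to putting more context into the distractor-generation prompt. The sketch below shows one way such a prompt might be assembled. It is an assumption for illustration only: the paper does not specify a prompt format, and `Example`, `build_distractor_prompt`, the parameter `k`, and the bank contents are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Example:
    stem: str
    # (misconception, wrong answer) pairs for a previously vetted question
    pairs: list[tuple[str, str]]


def build_distractor_prompt(stem: str, examples: list[Example],
                            distractor_bank: list[str], k: int = 3) -> str:
    """Assemble a few-shot prompt from up to k worked examples plus a distractor bank."""
    lines = ["Generate plausible wrong answers, each tied to a common student misconception."]
    # Multiple in-context examples instead of a single one.
    for i, ex in enumerate(examples[:k], start=1):
        lines.append(f"Example {i}: {ex.stem}")
        for misconception, wrong in ex.pairs:
            lines.append(f"  - {misconception} -> {wrong}")
    # Hypothetical bank of previously vetted distractors for this topic.
    if distractor_bank:
        lines.append("Candidate distractors from the bank: " + "; ".join(distractor_bank))
    lines.append(f"Question: {stem}")
    return "\n".join(lines)


if __name__ == "__main__":
    examples = [
        Example("What is 1/2 + 1/2?", [("Adds numerators and denominators", "2/4")]),
        Example("What is 2/5 + 1/5?", [("Adds the denominators too", "3/10")]),
    ]
    print(build_distractor_prompt("What is 3/4 + 1/4?", examples,
                                  ["4/8 (adds numerators and denominators)"]))
```

Capping the number of examples with `k` keeps the prompt length bounded as the example pool grows; which examples to select (for instance, those sharing the target knowledge component) is left open here.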
Stats
70% of the generated stems, keys, and explanations were considered valid by the participants.
Only 37% of the generated misconceptions, distractors, and feedback were deemed valid by the participants.
Quotes
"The emergence of large language models (LLMs) has raised hopes for making MCQ creation more scalable by automating the process."
"Nevertheless, a human-AI collaboration has the potential to enhance the efficiency and effectiveness of MCQ generation."