Core Concepts
Preventing data leakage in knowledge tracing models is crucial: leakage during training and evaluation can inflate reported metrics and mask a real drop in performance.
Abstract
The content discusses data leakage in Knowledge Tracing (KT) models, a problem that is most severe when each question is tagged with many knowledge concepts (KCs). It introduces a framework to address this issue and presents model variations that avoid leakage during both training and evaluation. The paper also quantifies the impact of data leakage on model performance across several datasets.
The structure of the content is as follows:
- Introduction to Knowledge Tracing and its importance in intelligent tutoring systems.
- Problems identified with existing KT models related to data leakage.
- Proposed solutions including a general masking framework and an open-source benchmark library called KTbench.
- Detailed explanations of Deep Knowledge Tracing (DKT) and Attentive Knowledge Tracing (AKT) models.
- Experiments conducted using different datasets to compare original models with proposed variations.
- Results and discussions on the performance of introduced model variations compared to baseline models.
- Conclusion emphasizing the significance of preventing data leakage in KT models.
Stats
"Many KT models expand the sequence of item-student interactions into KC-student interactions by replacing learning items with their constituting KCs."
"This problem can lead to a significant decrease in performance on datasets with a higher number of KCs per item."
"Most benchmarks use datasets with a small average number of KCs per question."
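The expansion described in the first quote can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, data shapes, and example mapping are all assumptions made for clarity.

```python
def expand_to_kcs(interactions, item_to_kcs):
    """Expand an item-student interaction sequence into KC-student interactions.

    interactions: list of (item_id, correct) tuples in temporal order.
    item_to_kcs:  dict mapping item_id -> list of KC ids (illustrative mapping).
    """
    expanded = []
    for item, correct in interactions:
        for kc in item_to_kcs[item]:
            # The response label is duplicated across every KC of the item.
            # Within one item, later KC entries can "see" earlier entries that
            # already carry the item's label -- the source of the leakage.
            expanded.append((kc, correct))
    return expanded

# Example: item 10 has three KCs, item 11 has one.
seq = [(10, 1), (11, 0)]
kcs = {10: [1, 2, 3], 11: [2]}
print(expand_to_kcs(seq, kcs))  # [(1, 1), (2, 1), (3, 1), (2, 0)]
```

Note how the label of item 10 appears three times in a row: a sequential model predicting the third entry has already observed the same answer twice, and the effect grows with the number of KCs per item.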
Quotes
"We introduce a general masking framework that mitigates the first problem and enhances the performance of such KT models while preserving the original model architecture without significant alterations."
"All these adjusted models do not suffer from data leakage during training, problem 2, or evaluation."
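One way such a masking framework could work, sketched under assumptions: in addition to the usual causal mask, block attention between expanded KC entries that belong to the same item, so no KC entry can see a sibling that carries the same not-yet-revealed label. The function name and mask convention below are hypothetical, not taken from the paper or KTbench.

```python
def within_item_mask(item_ids):
    """Build a T x T boolean attention mask over an expanded KC sequence.

    item_ids: for each expanded position, the id of the item it came from.
    mask[t][s] is True when position t may attend to position s:
    s must be strictly in the past (causal) AND come from a different item,
    which blocks leakage between KC entries of the same item.
    """
    T = len(item_ids)
    return [[(s < t) and (item_ids[s] != item_ids[t]) for s in range(T)]
            for t in range(T)]

# Item 10 expands to three KC entries, item 11 to one.
mask = within_item_mask([10, 10, 10, 11])
```

Here the three entries of item 10 cannot attend to one another, while the entry for item 11 can still attend to all of them, preserving the original architecture with only a change to the mask.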