This paper explores the use of large language models (LLMs) and bag-of-words (BoW) models to automatically analyze classroom discourse and estimate scores for the "Instructional Support" domain of the Classroom Assessment Scoring System (CLASS), a widely used classroom observation protocol.
The key highlights and insights are:
The authors devise a zero-shot prompting approach to use an LLM (Llama2) to analyze individual utterances in a classroom transcript for the presence of behavioral indicators associated with the CLASS Instructional Support dimensions, rather than directly predicting the global CLASS scores.
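The per-utterance indicator analysis can be sketched as follows. This is an illustrative reconstruction, not the paper's exact prompts: the indicator names, prompt wording, and function names are assumptions, and any chat-completion client for Llama2 could stand in for the model call.

```python
# Illustrative sketch of the zero-shot approach (assumed wording, not the
# paper's actual prompts): ask an LLM whether a single utterance exhibits a
# given CLASS Instructional Support indicator, then parse the yes/no reply
# into a binary flag.

def build_prompt(utterance: str, indicator: str) -> str:
    """Compose a zero-shot question about one utterance and one indicator."""
    return (
        f'Teacher utterance: "{utterance}"\n'
        f"Does this utterance show the indicator '{indicator}'? "
        "Answer yes or no."
    )

def parse_response(reply: str) -> int:
    """Map the LLM's free-text reply to a binary indicator (1 = present)."""
    return 1 if reply.strip().lower().startswith("yes") else 0

# Hypothetical usage (the reply would come from Llama2 or a similar model):
prompt = build_prompt("Why do you think the ice melted?", "open-ended questions")
flag = parse_response("Yes, it invites reasoning.")  # -> 1
```

Aggregating these binary flags over a transcript yields indicator-level features, which is what the paper uses instead of asking the model for a global CLASS score directly.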
Experiments on two datasets of toddler and pre-kindergarten classrooms show that the automated methods can achieve Pearson correlations up to 0.48 with human-annotated CLASS Instructional Support scores, approaching the level of human inter-rater reliability.
LLMs generally outperform classic BoW models, but the best performance often comes from combining features extracted from both LLM and BoW approaches.
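One simple way to combine the two feature families is to concatenate them into a single vector per transcript. The sketch below is an assumption about how such fusion could look; the toy vocabulary, indicator names, and rate-based aggregation are illustrative, not the paper's exact pipeline.

```python
from collections import Counter

# Sketch of feature fusion (assumed, not the paper's exact method):
# concatenate transcript-level bag-of-words counts with per-indicator
# presence rates derived from the LLM's utterance-level judgments.

VOCAB = ["why", "because", "think"]                    # toy vocabulary
INDICATORS = ["open-ended questions", "scaffolding"]   # illustrative names

def bow_features(utterances):
    """Word counts over a fixed toy vocabulary."""
    counts = Counter(w for u in utterances for w in u.lower().split())
    return [counts[w] for w in VOCAB]

def combined_features(utterances, llm_flags):
    """llm_flags: dict mapping indicator -> list of 0/1 flags per utterance."""
    n = max(len(utterances), 1)
    llm_rates = [sum(llm_flags[ind]) / n for ind in INDICATORS]
    return bow_features(utterances) + llm_rates
```

A downstream regressor (e.g. ridge regression) could then map these fused vectors to CLASS scores, which is consistent with the paper's finding that the combination often beats either feature set alone.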
While the automated methods are still less accurate than human judgments at the individual-utterance level, the authors show how the model outputs can be visualized to give teachers explainable feedback on which specific utterances were most positively or negatively associated with the CLASS Instructional Support dimensions.

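Such feedback can be surfaced by ranking utterances on a per-utterance relevance score and showing the teacher the extremes. The scoring scheme and function below are a hedged sketch under assumed inputs, not the paper's visualization code.

```python
# Sketch of explainable feedback (assumed interface, not the paper's model):
# given a score per utterance reflecting its association with an
# Instructional Support dimension, surface the strongest positives and
# negatives for the teacher to review.

def top_feedback(utterances, scores, k=2):
    """Return the k most positively and k most negatively scored utterances."""
    ranked = sorted(zip(scores, utterances), reverse=True)
    positives = [u for _, u in ranked[:k]]
    negatives = [u for _, u in ranked[-k:]]
    return positives, negatives
```

In practice the scores might come from model attributions or feature weights; the point is that feedback is anchored to concrete utterances rather than a single opaque number.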
The goal is to explore how artificial intelligence can provide teachers with more specific, frequent, and accurate feedback about their teaching in an unobtrusive and privacy-preserving way.
Key insights distilled from the paper by Jacob Whiteh... at arxiv.org, 04-18-2024: https://arxiv.org/pdf/2310.01132.pdf