
Automated Classroom Discourse Analysis Using Large Language Models and Bag-of-Words to Provide Actionable Feedback on Instructional Support

Core Concepts
Large language models and bag-of-words approaches can be used to automatically analyze classroom discourse and provide teachers with specific, actionable feedback on the quality of their instructional support.
This paper explores the use of large language models (LLMs) and bag-of-words (BoW) models to automatically analyze classroom discourse and estimate scores for the "Instructional Support" domain of the Classroom Assessment Scoring System (CLASS), a widely used classroom observation protocol. The key highlights and insights are:
The authors devise a zero-shot prompting approach in which an LLM (Llama2) analyzes individual utterances in a classroom transcript for the presence of behavioral indicators associated with the CLASS Instructional Support dimensions, rather than directly predicting the global CLASS scores.
Experiments on two datasets of toddler and pre-kindergarten classrooms show that the automated methods can achieve Pearson correlations up to 0.48 with human-annotated CLASS Instructional Support scores, approaching the level of human inter-rater reliability.
LLMs generally outperform classic BoW models, but the best performance often comes from combining features extracted from both LLM and BoW approaches.
While automated methods are still not as accurate as human judgments at the individual utterance level, the authors illustrate how the model outputs can be visualized to give teachers explainable feedback on which specific utterances were most positively or negatively correlated with the CLASS Instructional Support dimensions.
The goal is to explore how artificial intelligence can provide teachers with more specific, frequent, and accurate feedback about their teaching in an unobtrusive and privacy-preserving way.
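The idea of combining LLM-derived and BoW features can be illustrated with a minimal sketch. It is not the authors' implementation: the indicator names, the `llm_flags_stub` function (a keyword rule standing in for a zero-shot LLM call), and the vocabulary are all assumptions made so the example runs without a model.

```python
import re
from collections import Counter

# Hypothetical indicator names; the paper's actual indicator list is not reproduced here.
INDICATORS = ("open_ended_question", "repeats_or_extends")


def llm_flags_stub(utterance):
    """Stand-in for a zero-shot LLM call that answers yes/no per indicator.

    A real system would prompt the model for each indicator; this keyword
    rule exists only so the sketch is runnable.
    """
    text = utterance.strip().lower()
    return {
        "open_ended_question": int(
            text.startswith(("why", "how", "what")) and text.endswith("?")
        ),
        "repeats_or_extends": int("yes," in text),
    }


def llm_features(utterances):
    """Fraction of utterances in the session flagged for each indicator."""
    flags = [llm_flags_stub(u) for u in utterances]
    n = len(flags) or 1  # avoid division by zero on an empty transcript
    return [sum(f[name] for f in flags) / n for name in INDICATORS]


def bow_features(utterances, vocabulary):
    """Raw word counts over the whole session for a fixed vocabulary."""
    counts = Counter(
        word for u in utterances for word in re.findall(r"[a-z']+", u.lower())
    )
    return [counts[word] for word in vocabulary]


def session_features(utterances, vocabulary):
    """Concatenate LLM-style indicator rates with BoW counts."""
    return llm_features(utterances) + bow_features(utterances, vocabulary)


example_session = [
    "Why do you think the block fell?",
    "Yes, the tower was too tall.",
    "Sit down.",
]
example_vector = session_features(example_session, ["why", "tower", "block"])
```

A feature vector like this, computed per session, could then be passed to a regression model that predicts the session-level score; the choice of regressor is left open here.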
"The average transcript length is 1204.34 words per session for the UVA Toddler dataset and 1585.51 words per session for the NCRECE PreK dataset." "Human inter-rater reliabilities (Pearson R) on the CLASS scores range from 0.24 to 0.55 across the different dimensions and datasets."
"With the aim to provide teachers with more specific, frequent, and actionable feedback about their teaching, we explore how Large Language Models (LLMs) can be used to estimate 'Instructional Support' domain scores of the CLassroom Assessment Scoring System (CLASS), a widely used observation protocol." "Experiments on two CLASS-coded datasets of toddler and pre-kindergarten classrooms indicate that (1) automatic CLASS Instructional Support estimation accuracy using the proposed method (Pearson R up to 0.48) approaches human inter-rater reliability (up to R = 0.55)."
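Both the accuracy and the inter-rater reliability figures quoted above are Pearson correlations. As a reminder of what that statistic measures, here is a minimal sketch of the computation; the score lists are invented for illustration and are not taken from either dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented example values: human-coded scores vs. model estimates per session.
human_scores = [2.0, 3.5, 1.5, 4.0, 3.0]
model_scores = [2.5, 3.0, 2.0, 3.5, 3.5]
r = pearson_r(human_scores, model_scores)
```

The same function applies to inter-rater reliability: pass the scores of two human raters for the same sessions instead of human and model scores.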

Deeper Inquiries

How could the automated feedback be integrated into a human-in-the-loop system where teachers can review and provide feedback on the model's outputs to further improve its performance?

Automated feedback can be integrated into a human-in-the-loop system in three steps. First, the system analyzes classroom interactions and presents its assessment of instructional support in a user-friendly interface, showing teachers the specific utterances or interactions that the model flagged as significant. Second, teachers review these outputs and confirm or correct the model's assessments. Third, the corrections are used to retrain the model so that future predictions become more accurate. This feedback loop is what allows the model to improve over time. The system can also offer suggestions or prompts based on its analysis, pointing teachers to areas for improvement or to specific strategies for strengthening instructional support. By incorporating teachers' feedback and insights, the model can continue to adapt and provide more accurate and personalized feedback.
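One way to picture the review step is as a record that pairs each model judgment with an optional teacher judgment. The sketch below is hypothetical: the `Review` class, its field names, and the helper functions are illustrative assumptions, not part of the paper or of any existing system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    """One model judgment about an utterance, plus the teacher's verdict."""

    utterance: str
    indicator: str  # hypothetical indicator name
    model_label: bool
    teacher_label: Optional[bool] = None  # None until the teacher reviews it


def corrected_examples(reviews):
    """Reviewed items as (utterance, indicator, teacher_label) tuples.

    These could later serve as labeled examples for retraining or for
    refining prompts; unreviewed items are skipped.
    """
    return [
        (r.utterance, r.indicator, r.teacher_label)
        for r in reviews
        if r.teacher_label is not None
    ]


def agreement_rate(reviews):
    """Share of reviewed items where the teacher agreed with the model."""
    reviewed = [r for r in reviews if r.teacher_label is not None]
    if not reviewed:
        return None  # nothing reviewed yet
    agreed = sum(r.model_label == r.teacher_label for r in reviewed)
    return agreed / len(reviewed)


reviews = [
    Review("Why did it fall?", "open_ended_question", True, True),
    Review("Sit down.", "open_ended_question", True, False),
    Review("Look at this.", "open_ended_question", False),
]
```

Tracking the agreement rate over time would give a simple signal of whether retraining on teacher corrections is actually helping.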

What other classroom observation frameworks or teaching quality measures could be targeted for automated analysis using similar approaches?

Similar approaches could be extended to other classroom observation frameworks and teaching quality measures. Two candidates are:
Mathematical Quality of Instruction (MQI): Like CLASS, the MQI is an observation protocol, but it focuses specifically on the quality of mathematics instruction. Automated analysis could evaluate teacher-student interactions, problem-solving strategies, mathematical discourse and reasoning, and student engagement and collaboration to provide feedback on instructional practices.
Protocol for Language Arts Teaching Observations (PLATO): This framework focuses on assessing language arts instruction. Automated analysis could examine teacher-student interactions, vocabulary development, reading comprehension strategies, and writing activities to provide feedback on language arts instruction.
Applying similar machine learning and natural language processing techniques to these frameworks could give teachers useful insights and feedback on their instructional practices.

How might the automated analysis be extended to also provide feedback on other important aspects of teaching, such as classroom management or emotional support for students?

Automated analysis can be extended to classroom management and emotional support by identifying and analyzing additional indicators and behavioral cues for each. For classroom management, the system can analyze teacher-student interactions related to behavior management, classroom routines, and student engagement. It can flag instances of effective management strategies, positive reinforcement, and proactive behavior interventions, and give teachers feedback on their management practices along with suggestions for improving classroom dynamics. For emotional support, the system can analyze how teachers respond to student emotions. It can identify instances of empathy, active listening, and emotional validation, and give teachers feedback on fostering a supportive and inclusive classroom environment that meets students' emotional needs. With these additions, teachers would receive feedback across instructional support, classroom management, and emotional support, which could support more holistic professional development and, in turn, improved student outcomes.