
Analyzing Differential Item Functioning in a Mathematics Exam Using Rasch Models


Core Concepts
Examining the results of a mathematics exam using Rasch item response theory models to assess measurement invariance and detect differential item functioning across student subgroups.
Summary
The paper presents a hands-on tutorial for analyzing the results of a multiple-choice mathematics exam using item response theory (IRT) models, particularly the Rasch model. The main focus is on assessing measurement invariance and detecting differential item functioning (DIF) across different student subgroups.

The analysis starts with fitting a Rasch model to the exam data to estimate the item difficulties and student abilities on a common scale, which provides an initial overview of the exam's performance. To assess measurement invariance, the paper then applies several methods:

- Classical two-sample comparisons of the item parameters between the two exam groups, with appropriate anchoring to ensure comparability.
- Rasch trees, which recursively partition the data based on generalized score tests for parameter instabilities along various covariates (e.g., number of online tests solved, number of exam items solved).
- Rasch finite mixture models, which can identify subgroups with different item parameter profiles without relying on observed covariates.

The results reveal that there are indeed violations of measurement invariance, with certain items functioning differently for different subgroups of students. This has important implications for the fairness and interpretation of the exam results. The analysis is carried out entirely in R, leveraging packages from the psycho* family (psychotools, psychotree, psychomix) that provide a unified framework for estimating, visualizing, testing, and partitioning a range of psychometric models.
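The workflow above can be sketched in R with the psycho* packages. The function names (raschmodel, itempar, personpar, anchortest, raschtree, raschmix) are the actual interfaces of psychotools, psychotree, and psychomix, but the data handling is an assumption: the description (729 students, 13 single-choice items, group 1 vs. 2, online tests, items solved) matches the MathExam14W data set shipped with psychotools, so the sketch uses that data set and its column names (solved, group, tests, nsolved). Argument choices and the exclusion of extreme scorers are illustrative rather than the paper's exact settings.

    library("psychotools")   # raschmodel(), itempar(), personpar(), anchortest()
    library("psychotree")    # raschtree()
    library("psychomix")     # raschmix()

    ## Assumption: the MathExam14W data shipped with psychotools match the exam
    ## described here (729 students, 13 binary single-choice items, covariates
    ## group, tests, nsolved).
    data("MathExam14W", package = "psychotools")
    mex <- subset(MathExam14W, nsolved > 0 & nsolved < 13)  # drop extreme scorers,
                                                             # uninformative under conditional ML

    ## Step 1: Rasch model for all students,
    ## P(X_ij = 1) = exp(theta_i - beta_j) / (1 + exp(theta_i - beta_j)).
    mr <- raschmodel(mex$solved)
    summary(mr)
    plot(mr, type = "profile")   # item difficulty profile
    itempar(mr)                  # item difficulties on a common scale
    personpar(mr)                # student abilities on the same scale

    ## Step 2: Classical two-sample comparison of item parameters between the
    ## two exam groups, with anchoring to align the scales before testing.
    mr1 <- raschmodel(subset(mex, group == "1")$solved)
    mr2 <- raschmodel(subset(mex, group == "2")$solved)
    anchortest(mr1, mr2)         # item-wise Wald tests of DIF after anchoring

    ## Step 3: Rasch tree, recursively partitioning along observed covariates
    ## via generalized score tests for parameter instability.
    rt <- raschtree(solved ~ group + tests + nsolved, data = mex)
    plot(rt)

    ## Step 4: Rasch finite mixture models with 1-3 latent components, detecting
    ## subgroups without relying on observed covariates. Depending on the package
    ## version, the itemresp column may need to be converted to a plain 0/1 matrix.
    rmix <- raschmix(solved ~ 1, data = mex, k = 1:3, scores = "saturated")
    BIC(rmix)                    # compare the number of latent components

Anchoring in step 2 is needed because Rasch item parameters are only identified up to an additive constant within each group, so the two scales must be aligned before item-wise differences can be interpreted as DIF.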
Statistics
The exam consisted of 13 single-choice items with 5 answer alternatives, covering topics in analysis, linear algebra, and financial mathematics. The data include binary responses (correct/incorrect) for 729 students, as well as covariates such as group (1 vs. 2), number of online tests solved, and number of exam items solved.
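Assuming the data correspond to the MathExam14W data set in psychotools (an assumption; its dimensions and covariates match this description), the layout can be inspected directly:

    data("MathExam14W", package = "psychotools")
    dim(MathExam14W)               # 729 students
    MathExam14W$solved[1:3]        # first three students' 13 binary (correct/incorrect) responses
    table(MathExam14W$group)       # group 1 vs. 2
    summary(MathExam14W$tests)     # number of online tests solved
    summary(MathExam14W$nsolved)   # number of exam items solved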
Quotes
"Due to the large number of students in the course, there are frequent online tests carried out in the university's learning management system OpenOlat (frentix GmbH 2024) as part of the tutorial groups, along with two written exams." "However, in Austria, to the best of our knowledge, it is still not common to apply standardized and/or automated psychometric assessments to exam results."

Key Insights Drawn From

by Achim Zeileis, arxiv.org, 10-01-2024

https://arxiv.org/pdf/2409.19522.pdf
Examining Exams Using Rasch Models and Assessment of Measurement Invariance

Deeper Inquiries

How could the insights from the DIF analysis be used to improve the design and administration of the mathematics exam in the future?

The insights gained from the Differential Item Functioning (DIF) analysis can significantly enhance the design and administration of future mathematics exams. First, identifying items that exhibit DIF, such as those found to be more difficult for certain groups (e.g., Group 2 in the study), allows educators to revise or replace these items with more equitable alternatives. This ensures that all students, regardless of their subgroup characteristics, have a fair opportunity to demonstrate their mathematical abilities.

Moreover, the analysis highlights the importance of item selection and the potential impact of prior exposure to specific item types. If certain items are perceived as easier due to prior familiarity, educators can strive to create a more balanced item pool that minimizes the influence of prior learning experiences, for example by diversifying the types of questions and covering a broader range of topics and difficulty levels.

The findings can also inform the administration of exams by encouraging the use of randomized item pools or adaptive testing methods. By tailoring the exam content to the individual student's ability level, educators can provide a more personalized assessment experience that accurately reflects each student's understanding of the material.

Finally, ongoing monitoring and analysis of exam results using Rasch models and DIF assessments can establish a feedback loop for continuous improvement: by regularly evaluating the fairness and effectiveness of exam items, educational institutions can adapt their assessment strategies to better meet the needs of diverse student populations.

What other types of covariates, beyond those considered here, could potentially contribute to DIF and should be investigated?

Beyond the covariates considered in the study, such as gender, prior experience, and prior mathematics knowledge, several other factors could contribute to Differential Item Functioning (DIF) and warrant investigation:

- Language proficiency: For students whose first language is not the language of instruction, language proficiency could significantly affect their understanding of exam questions, leading to DIF. Analyzing the impact of language skills on item performance could help identify items that may disadvantage non-native speakers.
- Socioeconomic status: Students from different socioeconomic backgrounds may have varying access to educational resources, tutoring, and support systems. Investigating the influence of socioeconomic status on exam performance could reveal items that are more challenging for students with fewer resources.
- Learning styles and preferences: Individual learning styles (e.g., visual, auditory, kinesthetic) may affect how students approach and solve mathematical problems. Understanding how these styles interact with specific item types could provide insights into potential DIF.
- Test anxiety: The psychological state of students during exams, including test anxiety, can impact performance. Analyzing the relationship between test anxiety and item responses could help identify items that disproportionately affect anxious students.
- Cultural background: Cultural differences in problem-solving approaches and familiarity with certain mathematical concepts may lead to DIF. Investigating how cultural background influences item performance could enhance the fairness of assessments.
- Motivation and engagement: Students' motivation levels and engagement with the course material can vary widely. Exploring how these factors correlate with item performance could help identify items that may not resonate equally with all students.

By considering these additional covariates, educators can gain a more comprehensive understanding of the factors contributing to DIF, leading to more equitable assessments; a hypothetical sketch of how such covariates could enter the Rasch tree analysis is given below.
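As a purely hypothetical illustration (none of these covariates are part of the data analyzed in the study, and the variable names and data set below are invented), such additional covariates could enter the same Rasch tree analysis as further partitioning variables:

    ## Hypothetical covariates (language, ses, anxiety) added as partitioning
    ## variables; mex_ext and these columns do not exist in the study's data.
    rt_ext <- raschtree(solved ~ tests + nsolved + group + language + ses + anxiety,
                        data = mex_ext)
    plot(rt_ext)   # subgroups with DIF defined by whichever covariates show parameter instability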

What are the broader implications of this study for the use of Rasch models and DIF analysis in educational assessment more generally?

The broader implications of this study for the use of Rasch models and Differential Item Functioning (DIF) analysis in educational assessment are significant and multifaceted:

- Enhanced fairness in assessments: The application of Rasch models allows for a more nuanced understanding of item performance across diverse student populations. By identifying and addressing DIF, educators can create assessments that are fairer and more representative of students' true abilities, thereby promoting equity in educational outcomes.
- Data-driven decision making: The integration of DIF analysis into the assessment process encourages a data-driven approach to educational decision-making. Educators and administrators can use empirical evidence to inform item selection, exam design, and instructional strategies, leading to improved educational practices.
- Continuous improvement of assessment tools: The study underscores the importance of ongoing evaluation and refinement of assessment tools. By regularly applying Rasch models and DIF analysis, educational institutions can adapt their assessments to changing student demographics and learning environments, ensuring that they remain relevant and effective.
- Broader applicability across disciplines: While this study focuses on mathematics exams, the principles of Rasch modeling and DIF analysis can be applied across various subjects and educational contexts, enhancing the potential for improving assessments in diverse fields, from the sciences to the humanities.
- Professional development for educators: The findings highlight the need for professional development opportunities for educators in psychometrics and assessment design. By equipping educators with the knowledge and skills to apply Rasch models and conduct DIF analysis, institutions can foster a culture of assessment literacy that benefits both teachers and students.
- Policy implications: The insights gained from DIF analysis can inform educational policy at institutional and governmental levels. Policymakers can use this information to advocate for equitable assessment practices and allocate resources to support diverse learners effectively.

In summary, the study illustrates the transformative potential of Rasch models and DIF analysis in educational assessment, paving the way for more equitable, effective, and data-informed evaluation practices.