
Clustering MOOC Programming Solutions to Diversify Their Presentation to Students


Core Concepts
Developing tools to cluster and select diverse programming solutions from MOOC submissions to present to students, in order to improve their learning experience.
Abstract
The paper presents a novel problem: efficiently processing programming solutions submitted to Massive Open Online Courses (MOOCs) and presenting a diverse selection of them to students. The authors developed two tools to address it:

1. An adaptation of the existing plagiarism detection tool JPlag. JPlag uses a greedy string tiling algorithm to calculate the distance between student submissions and cluster them. However, JPlag is limited in handling short Python submissions: it fully processed only 5.3% of the tasks in the studied dataset.
2. A new tool called Rhubarb. Rhubarb first standardizes the submissions by applying 12 code transformations that bring algorithmically similar solutions to a common form. It then calculates the structure-aware edit distance between the standardized solutions using the GumTree tool and applies hierarchical agglomerative clustering. Finally, Rhubarb selects one representative example from each of the largest clusters, taking code quality into account using the Hyperstyle tool.

The authors compared the default platform approach, JPlag, and Rhubarb on a set of 59 tasks. Eight experts rated the selected solutions on diversity, code quality, and usefulness. The average scores out of 5 were 3.12 for the default platform approach, 3.77 for JPlag, and 3.50 for Rhubarb. Since JPlag could fully process only 5.3% of the tasks, the authors implemented a system that uses JPlag on the tasks it can handle and Rhubarb on the remaining 94.7%.

The key contributions of this work are:

1. Rhubarb, a clustering tool for selecting diverse solutions in MOOCs, which handles all of the studied platform's tasks.
2. A library of code transformations for Python that can be useful in other applications.
3. An evaluation of Rhubarb with eight experts on 59 real tasks from a large MOOC platform, showing that it outperforms the default platform approach.
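The Rhubarb pipeline described above (standardize, measure distance, cluster, pick representatives) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation, and every function name in it is hypothetical. Three stand-ins are assumed: an `ast` round trip replaces the 12 transformations (which this summary does not list), `difflib` token similarity replaces GumTree's structure-aware edit distance, and source length replaces a Hyperstyle quality score. The threshold of 0.3 is an arbitrary illustrative value.

```python
import ast
import difflib
from itertools import combinations


def standardize(source: str) -> str:
    # Assumption: an ast round trip (Python 3.9+) stands in for the 12
    # transformations; it only drops comments and normalizes formatting.
    return ast.unparse(ast.parse(source))


def distance(a: str, b: str) -> float:
    # Assumption: difflib token similarity stands in for GumTree's
    # structure-aware edit distance. Result lies in [0, 1].
    matcher = difflib.SequenceMatcher(None, a.split(), b.split())
    return 1.0 - matcher.ratio()


def cluster(codes, threshold=0.3):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    dist = {
        frozenset(pair): distance(codes[pair[0]], codes[pair[1]])
        for pair in combinations(range(len(codes)), 2)
    }

    def linkage(c1, c2):
        total = sum(dist[frozenset((i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(codes))]
    while len(clusters) > 1:
        # Find the closest pair of clusters (x < y by construction).
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[x].extend(clusters.pop(y))
    return clusters


def pick_representatives(sources, clusters, k, quality_cost=len):
    # Assumption: a lower quality_cost is better; source length is a
    # placeholder for a real code quality score.
    largest = sorted(clusters, key=len, reverse=True)[:k]
    return [min(c, key=lambda i: quality_cost(sources[i])) for c in largest]


submissions = [
    "def f(xs):\n    return sum(xs)\n",
    "def f(xs):  # built-in\n    return sum( xs )\n",
    "def f(xs):\n    total = 0\n    for x in xs:\n        total += x\n    return total\n",
]
codes = [standardize(s) for s in submissions]
groups = cluster(codes)
picks = pick_representatives(submissions, groups, k=2)
```

In this toy input the first two submissions differ only in a comment and spacing, so they standardize to the same text and fall into one cluster, while the explicit loop stays separate. A real standardization step would need to go much further, since it is meant to unify algorithmically similar solutions rather than just formatting variants.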
Stats
JPlag could fully process only 46 out of 867 studied tasks (5.3%).
JPlag partially processed 434 more tasks (50.1%), skipping some solutions.
JPlag did not process 387 tasks at all (44.6%).
Rhubarb successfully processed 100% of the 867 tasks.
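The percentages above, and the 94.7% share that falls to Rhubarb in the combined system, follow directly from the task counts. The short check below recomputes them. The `choose_tool` function is a hypothetical illustration of the routing rule described in the abstract (JPlag where it fully processes a task, Rhubarb otherwise), not code from the paper.

```python
TOTAL_TASKS = 867
jplag_outcomes = {"full": 46, "partial": 434, "none": 387}

# The three outcomes partition the dataset.
assert sum(jplag_outcomes.values()) == TOTAL_TASKS

shares = {
    outcome: round(100 * count / TOTAL_TASKS, 1)
    for outcome, count in jplag_outcomes.items()
}


def choose_tool(outcome: str) -> str:
    # Hypothetical routing rule: JPlag only where it handled every solution.
    return "JPlag" if outcome == "full" else "Rhubarb"


rhubarb_tasks = sum(
    count for outcome, count in jplag_outcomes.items()
    if choose_tool(outcome) == "Rhubarb"
)
rhubarb_share = round(100 * rhubarb_tasks / TOTAL_TASKS, 1)
```

Under this rule, the partially processed tasks are routed to Rhubarb together with the unprocessed ones, which is what makes the two shares add up to the full dataset.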
Quotes
"To solve this novel problem, we adapted the existing plagiarism detection tool JPlag to Python submissions on Hyperskill, a popular MOOC platform. However, due to the tool's inner algorithm, it fully processed only 46 out of 867 studied tasks."
"Rhubarb was able to handle all 867 tasks successfully."

Deeper Inquiries

How can the quality and diversity of the selected solutions be further improved, beyond the current approaches?

To further enhance the quality and diversity of the selected solutions in MOOCs, several strategies can be implemented. Firstly, incorporating more advanced code analysis techniques, such as natural language processing (NLP) and machine learning algorithms, can help identify subtle differences in solutions that may not be captured by traditional methods. By analyzing the semantics and logic of the code, the system can better differentiate between similar solutions and provide a more diverse set of examples to students.

Additionally, introducing a feedback loop mechanism where students can rate the usefulness and quality of the presented solutions can help improve the selection process over time. By leveraging student feedback, the system can learn to prioritize solutions that are not only diverse but also highly beneficial for learning purposes.

Furthermore, integrating collaborative filtering techniques, similar to those used in recommendation systems, can personalize the selection of solutions based on individual student preferences and learning styles. By tailoring the presented solutions to each student's needs, the system can ensure a more engaging and effective learning experience.

What are the potential drawbacks or unintended consequences of automatically selecting and presenting diverse solutions to students in MOOCs?

While automatically selecting and presenting diverse solutions in MOOCs can offer several benefits, there are potential drawbacks and unintended consequences to consider. One concern is the risk of overwhelming students with too much information, especially if the diversity of solutions leads to confusion rather than clarity. Students may struggle to navigate through a large number of solutions and find it challenging to identify the most relevant ones for their learning needs.

Moreover, there is a possibility of introducing bias in the selection process, where certain types of solutions are prioritized over others, leading to a skewed representation of coding approaches. This bias can impact the learning outcomes of students and limit their exposure to a wide range of problem-solving strategies.

Another drawback is the potential for plagiarism and cheating, as students may be tempted to replicate solutions presented to them rather than engaging in the problem-solving process independently. This can undermine the educational integrity of the MOOC platform and hinder the development of students' critical thinking and problem-solving skills.

How can the insights from this work on diversifying programming solution presentation be applied to other educational domains beyond programming?

The insights gained from diversifying programming solution presentation in MOOCs can be applied to other educational domains to enhance learning outcomes and student engagement. One way is to adapt the clustering and selection algorithms developed for programming solutions to other subjects that involve problem-solving, such as mathematics or physics. By presenting students with a variety of problem-solving approaches, educators can promote critical thinking and creativity across different disciplines.

Furthermore, the concept of standardizing solutions and measuring distance between them can be extended to essay writing or language learning tasks. By analyzing the structure and content of written responses, educators can provide students with diverse examples that showcase different writing styles and techniques, fostering a deeper understanding of the subject matter.

Additionally, the idea of incorporating student feedback and personalization into the selection process can be applied to adaptive learning systems in various educational domains. By tailoring learning materials and resources to individual student preferences and learning styles, educators can create a more personalized and effective learning experience for students across different subjects and disciplines.