This study explores the use of quantized large language models (LLMs), specifically LLaMA-2, for automatic grading and feedback generation. The researchers evaluated the quantized LLaMA-2 models on both a proprietary dataset and an open-source dataset.
For the automatic grading task, the quantized LLaMA-2 13B model with QLoRA fine-tuning outperformed the baseline models, achieving an RMSE of 0.036 and an MAE of 0.028 on the proprietary dataset, and an RMSE of 0.257 on the open-source SAF dataset. These results demonstrate that the quantized LLaMA-2 models can predict grade scores accurately.
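The RMSE and MAE figures above follow the standard definitions. A minimal sketch of both metrics, using hypothetical normalized grade scores (the example values are illustrative, not from the paper):

```python
import math

def rmse(preds, golds):
    # Root mean squared error between predicted and reference scores.
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(preds, golds)) / len(preds))

def mae(preds, golds):
    # Mean absolute error between predicted and reference scores.
    return sum(abs(p - g) for p, g in zip(preds, golds)) / len(preds)

# Hypothetical grade scores normalized to [0, 1].
preds = [0.90, 0.55, 0.20, 0.75]
golds = [0.92, 0.50, 0.25, 0.70]

print(round(rmse(preds, golds), 4))  # 0.0444
print(round(mae(preds, golds), 4))   # 0.0425
```

RMSE penalizes large individual errors more heavily than MAE, which is why both are commonly reported together for score prediction.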
For the feedback generation task, the researchers found that supplying the predicted grade scores as additional input to the fine-tuned LLaMA-2 models led to significant improvements in the quality of the generated feedback, as measured by BLEU and ROUGE scores. The quantized LLaMA-2 13B model with grade score input achieved the best performance, with BLEU, ROUGE-1, and ROUGE-2 scores of 0.707, 0.775, and 0.737, respectively, on the proprietary dataset.
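ROUGE-N, one of the feedback-quality metrics cited above, is an F1 score over n-gram overlap between the generated and reference feedback. A self-contained sketch of the standard recipe, with illustrative strings (not drawn from the paper's data):

```python
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate, reference, n):
    # F1 over clipped n-gram counts: Counter intersection clips each
    # n-gram's overlap at the smaller of its two counts.
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

cand = "good answer but misses the second condition"
ref = "good answer but the second condition is missing"
print(round(rouge_n(cand, ref, 1), 3))  # 0.8
```

ROUGE-1 and ROUGE-2 use unigrams and bigrams respectively; BLEU combines n-gram precision across several orders with a brevity penalty, so the two families reward slightly different aspects of overlap.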
The findings from this study provide valuable insights into the potential of using quantization techniques to fine-tune LLMs for various downstream tasks, such as automatic grading and feedback generation, while maintaining high accuracy and quality at reduced computational costs and latency.
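The quantized fine-tuning approach described above is commonly realized with 4-bit quantization plus LoRA adapters (QLoRA). A configuration sketch using the Hugging Face `transformers` and `peft` libraries; the hyperparameters (rank, alpha, target modules) are assumptions for illustration, not values reported in the paper, and running it requires a GPU and access to the model weights:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the base model (QLoRA-style setup).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters are trained in higher precision on top of the frozen
# 4-bit base; r, alpha, and target_modules below are assumed values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Only the small adapter matrices are updated during fine-tuning, which is what makes the reduced-cost training the study relies on feasible on modest hardware.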