This study explores the use of quantized large language models (LLMs), specifically LLaMA-2, for automatic grading and feedback generation. The researchers evaluated the quantized LLaMA-2 models on both a proprietary dataset and an open-source dataset.
For the automatic grading task, the quantized LLaMA-2 13B model with QLoRA fine-tuning outperformed the baseline models, achieving an RMSE of 0.036 and an MAE of 0.028 on the proprietary dataset, and an RMSE of 0.257 on the open-source SAF dataset. These results demonstrate that quantized LLaMA-2 models can predict grade scores accurately.
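As a rough illustration of the evaluation metrics cited above, RMSE and MAE over predicted versus reference grade scores can be computed as follows. The score values here are invented placeholders for demonstration, not data from the paper:

```python
import math

def rmse(predictions, targets):
    """Root-mean-square error between predicted and reference scores."""
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)
    )

def mae(predictions, targets):
    """Mean absolute error between predicted and reference scores."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)

# Hypothetical normalized grade scores in [0, 1]; not values from the study.
predicted = [0.90, 0.55, 0.72, 1.00]
reference = [0.95, 0.50, 0.70, 1.00]

print(f"RMSE: {rmse(predicted, reference):.3f}")
print(f"MAE:  {mae(predicted, reference):.3f}")
```

Lower is better for both metrics; RMSE penalizes large individual errors more heavily than MAE, which is why papers often report both.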
For the feedback generation task, the researchers found that supplying the predicted grade scores as additional input to the fine-tuned LLaMA-2 models led to significant improvements in the quality of the generated feedback, as measured by BLEU and ROUGE scores. The quantized LLaMA-2 13B model with grade score input achieved the best performance, with BLEU, ROUGE-1, and ROUGE-2 scores of 0.707, 0.775, and 0.737, respectively, on the proprietary dataset.
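The key finding for feedback generation is that conditioning the model on the predicted grade score improves output quality. A minimal sketch of how such a grade-conditioned prompt might be assembled follows; the template wording and field names are my own assumptions for illustration, since the paper only states that the predicted score is supplied as additional input:

```python
def build_feedback_prompt(question, student_answer, predicted_score):
    """Assemble a grade-conditioned prompt for a fine-tuned feedback model.

    The template below is a hypothetical illustration, not the authors'
    actual prompt format.
    """
    return (
        "### Question:\n"
        f"{question}\n\n"
        "### Student answer:\n"
        f"{student_answer}\n\n"
        f"### Predicted grade score: {predicted_score:.3f}\n\n"
        "### Feedback:\n"
    )

prompt = build_feedback_prompt(
    "Explain why quantization reduces LLM memory usage.",
    "Quantization stores weights in fewer bits, so the model is smaller.",
    0.85,
)
print(prompt)
```

The intuition is that the score anchors the tone and content of the feedback: a low score should elicit corrective feedback, while a high score should elicit confirmation, and giving the model this signal explicitly relieves it from inferring the grade on its own.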
The findings from this study provide valuable insight into the potential of quantization techniques for fine-tuning LLMs on downstream tasks such as automatic grading and feedback generation, maintaining high accuracy and output quality while reducing computational cost and latency.
by Gloria Ashiy... at arxiv.org, 05-02-2024
https://arxiv.org/pdf/2405.00602.pdf