Key Concepts
A novel framework called Quality-Guided Contrastive Rationale Distillation (QCRD) that enhances the reasoning capabilities of smaller language models by effectively distilling both positive and negative knowledge from large language models through contrastive learning.
Abstract
The paper introduces a general framework called Quality-Guided Contrastive Rationale Distillation (QCRD) for distilling knowledge from large language models (LLMs) into smaller, more manageable language models. The key aspects of the QCRD approach are:
- Generating a diverse set of positive rationales from the LLM via temperature sampling, and applying self-consistency to denoise them.
- Generating negative rationales by sampling from previous iterations of the smaller language model, embracing the idea that one can learn from one's own weaknesses.
- Developing a contrastive loss function to distill both positive and negative rationales into the smaller language model, where a discriminator assesses the quality of each rationale and assigns weights that guide the training process (a rough sketch of such a loss follows this list).
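The paper's exact loss formulation is not reproduced in this summary; the PyTorch sketch below only illustrates the general shape of a quality-weighted contrastive rationale loss, assuming the positive term is a standard token-level cross-entropy and the negative term is a hinge that discourages the student from assigning high likelihood to negative rationales. All names (`qcrd_style_loss`, `pos_weight`, `neg_weight`, `margin`) are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def qcrd_style_loss(pos_logits, pos_labels, neg_logits, neg_labels,
                    pos_weight, neg_weight, margin=1.0):
    """Quality-weighted contrastive rationale loss (illustrative sketch).

    pos_logits / neg_logits: (batch, seq_len, vocab) student logits for
        positive / negative rationale sequences.
    pos_labels / neg_labels: (batch, seq_len) target token ids, -100 = padding.
    pos_weight / neg_weight: (batch,) quality scores from the discriminator.
    """
    def seq_nll(logits, labels):
        # Token-level cross-entropy averaged over non-padding tokens per sample.
        nll = F.cross_entropy(
            logits.transpose(1, 2), labels, ignore_index=-100, reduction="none"
        )
        mask = (labels != -100).float()
        return (nll * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)

    pos_loss = seq_nll(pos_logits, pos_labels)  # pull toward positive rationales
    neg_loss = seq_nll(neg_logits, neg_labels)  # push away from negative rationales

    # Weight each sample by its discriminator-judged quality; the negative term
    # is hinged so the student is only penalised while it still assigns the
    # negative rationale a high likelihood (i.e. a low NLL).
    contrastive = pos_weight * pos_loss + neg_weight * F.relu(margin - neg_loss)
    return contrastive.mean()
```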
The authors conduct extensive experiments on four popular datasets across different reasoning tasks, demonstrating that the QCRD approach consistently outperforms existing distillation techniques in transferring high-quality reasoning knowledge to smaller language models.
Statistics
The LLM (GPT-3.5-turbo) generated 3.87 positive rationales per input on average, and for more than 50% of the training samples every sampled rationale was positive.
The authors sampled the LLM's output 5 times at a temperature of 0.7, and generated negative rationales by sampling from student checkpoints taken 5 iterations earlier at a temperature of 1.5.
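To make that sampling setup concrete, the following minimal sketch shows how the two passes could be wired up: five teacher samples at temperature 0.7 filtered by self-consistency (majority vote over final answers), and negative rationales drawn from an earlier student checkpoint at temperature 1.5. The `teacher_generate` and `old_student_generate` helpers and the `(rationale, answer)` return convention are hypothetical placeholders, not the paper's code or any library's API.

```python
from collections import Counter

def sample_positive_rationales(teacher_generate, prompt, n=5, temperature=0.7):
    """Teacher-side sampling with self-consistency denoising (illustrative).

    teacher_generate(prompt, temperature) -> (rationale, answer) is a
    hypothetical wrapper around the LLM call.
    """
    samples = [teacher_generate(prompt, temperature=temperature) for _ in range(n)]
    # Majority-vote the final answers and keep only rationales that support it.
    majority_answer, _ = Counter(ans for _, ans in samples).most_common(1)[0]
    return [rat for rat, ans in samples if ans == majority_answer]


def sample_negative_rationales(old_student_generate, prompt, n=1, temperature=1.5):
    """Negative rationales sampled from a student checkpoint taken several
    iterations earlier, at a high temperature to surface flawed reasoning."""
    return [old_student_generate(prompt, temperature=temperature) for _ in range(n)]
```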
Quotes
"We first develop a general CoT distillation approach (i.e., QCRD) from a contrastive learning perspective, aiming to guide the student model to learn both positive and negative knowledge from rationales."
"We explore a contrastive distillation loss to facilitate effective distillation of the generated positive and negative rationales, where the qualities of the rationales judged by a discriminator are considered to optimize the training process across the whole datasets."
"Experimental results across multiple datasets show that QCRD demonstrably outperforms existing benchmarks, evidencing its efficiency in transferring contrastive reasoning knowledge to the smaller language model."