Distilling Self-Improvement Ability into Smaller Language Models through Interactive Demonstrations


Core Concepts
TRIPOST is a training algorithm that endows smaller language models with the ability to self-improve by learning from interactive demonstrations collected with expert language models.
Abstract
The paper presents TRIPOST, a training algorithm that distills the self-improvement ability of large language models (LLMs) into smaller language models. Prior work has shown that smaller models (e.g., LLaMA-7B) struggle to perform self-improvement on math and reasoning tasks, unlike their larger counterparts (e.g., Codex-175B), due to the capability mismatch between the small and large models. TRIPOST addresses this issue by having the smaller model interact with LLMs to collect feedback and improvement demonstrations, and then replaying this experience to train the smaller model.

The TRIPOST algorithm consists of three stages: 1) Interactive Trajectory Editing, where the smaller model generates initial attempts and LLMs provide feedback and improvements; 2) Data Post-processing, where the collected data is filtered and re-balanced; and 3) Model Training, where the smaller model is trained with weighted supervised learning on the post-processed data.

Experiments on four math and reasoning datasets from the BIG-Bench Hard collection show that TRIPOST-trained models can use their learned self-improvement ability to achieve better in-domain and out-of-domain performance than models trained only on ground-truth rationales or LLM demonstrations. The authors also analyze the factors that influence the smaller model's self-improvement ability, such as the proportion of self-improvement data used during training and the number of TRIPOST iterations.
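To make the three-stage pipeline concrete, the following is a minimal Python sketch of one TRIPOST iteration. It is not the authors' implementation: the callables passed in (generate_attempt, give_feedback, give_improvement, check_answer), the segment weights, and the re-balancing cap are assumptions chosen to illustrate the control flow, and the weighted supervised-learning step of Stage 3 is only indicated in a comment.

```python
"""Minimal sketch of one TRIPOST iteration (illustrative, not the authors' code)."""
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Trajectory:
    question: str
    # Each segment is (text, loss_weight); the weights feed weighted supervised learning.
    segments: List[Tuple[str, float]] = field(default_factory=list)


def interactive_trajectory_editing(
    questions: List[str],
    generate_attempt: Callable[[str], str],            # smaller model's first attempt
    give_feedback: Callable[[str, str], str],           # expert LLM feedback on an attempt
    give_improvement: Callable[[str, str, str], str],   # expert LLM improved solution
    check_answer: Callable[[str, str], bool],           # task-specific answer checker
) -> List[Trajectory]:
    """Stage 1: the smaller model attempts each question; an expert LLM edits
    incorrect attempts by appending feedback and an improved solution."""
    trajectories = []
    for q in questions:
        attempt = generate_attempt(q)
        traj = Trajectory(q, [(attempt, 1.0)])
        if check_answer(q, attempt):
            trajectories.append(traj)
            continue
        feedback = give_feedback(q, attempt)
        improvement = give_improvement(q, attempt, feedback)
        if check_answer(q, improvement):
            # Up-weighting the improvement segment (2.0 here) is an illustrative choice.
            traj.segments += [(feedback, 1.0), (improvement, 2.0)]
            trajectories.append(traj)
    return trajectories


def post_process(trajectories: List[Trajectory],
                 max_improve_ratio: float = 0.5) -> List[Trajectory]:
    """Stage 2: keep verified trajectories and cap the proportion of
    self-improvement (multi-segment) examples relative to direct answers."""
    direct = [t for t in trajectories if len(t.segments) == 1]
    improved = [t for t in trajectories if len(t.segments) > 1]
    cap = int(max_improve_ratio * max(len(direct), 1))
    return direct + improved[:cap]


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; Stage 3 would fine-tune the
    # smaller model with weighted supervised learning on the returned segments.
    data = interactive_trajectory_editing(
        questions=["((1 + 2) * 3) ="],
        generate_attempt=lambda q: "8",                       # wrong first attempt
        give_feedback=lambda q, a: "The multiplication in the last step is wrong.",
        give_improvement=lambda q, a, f: "9",                 # corrected answer
        check_answer=lambda q, a: a.strip() == "9",
    )
    print(post_process(data, max_improve_ratio=1.0))
```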
Stats
Multistep Arithmetic: nesting depth d = 2-3 and number of operands l = 3-4 for the seen subtasks; d = 3 and l = 5-6 for the unseen subtasks.
Word Sorting: number of words to sort l = 2-7 for the seen subtasks; l = 8-16 for the unseen subtasks.
Date Understanding: 1-2 steps to solve for the seen subtasks; 3 or more steps for the unseen subtasks.
Logical Deduction: 3 or 5 options for the seen subtasks; 7 options for the unseen subtasks.
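Restated as a small configuration mapping, the seen/unseen split looks as follows; this is purely an illustrative transcription of the numbers above, not a structure taken from the paper's code.

```python
# Illustrative transcription of the seen/unseen subtask parameters listed above.
SUBTASK_SPLITS = {
    "multistep_arithmetic": {
        "seen":   {"nesting_depth": (2, 3), "num_operands": (3, 4)},
        "unseen": {"nesting_depth": (3,),   "num_operands": (5, 6)},
    },
    "word_sorting": {
        "seen":   {"num_words": range(2, 8)},    # 2-7 words
        "unseen": {"num_words": range(8, 17)},   # 8-16 words
    },
    "date_understanding": {
        "seen":   {"solution_steps": "1-2"},
        "unseen": {"solution_steps": "3+"},
    },
    "logical_deduction": {
        "seen":   {"num_options": (3, 5)},
        "unseen": {"num_options": (7,)},
    },
}
```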
Quotes
"TRIPOST-trained models can use its learned self-improvement ability to improve their task performance." "Learning to always generate a useful feedback and a corresponding improvement can be much harder than learning to directly generate a correct answer."

Deeper Inquiries

How can TRIPOST be extended to work with even smaller models, such as those with less than 1 billion parameters?

To adapt TRIPOST for models with fewer than 1 billion parameters, several modifications can be made:

Simplified Feedback Generation: Since very small models may not have the capacity to make productive use of interactive LLM feedback, a simplified feedback-generation module can be designed; it could be rule-based or built on a smaller pre-trained language model that critiques the model's outputs (a rule-based variant is sketched after this list).

Reduced Iterations: Smaller models may struggle with many iterations of self-improvement, so limiting the number of iterations or making the trajectory-editing process more efficient can help them learn effectively.

Task-Specific Training: Focusing training on tasks that smaller models already handle well lets TRIPOST leverage those strengths when teaching self-improvement.

Efficient Data Processing: Given the limited capacity of smaller models, tightening the data post-processing stage to filter out irrelevant or noisy data, for example by adjusting the filtering criteria to the model's capabilities, can make training more effective.

Transfer Learning: Pre-training smaller models on related tasks or datasets before applying TRIPOST gives them a head start in learning from their mistakes.
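As a concrete illustration of the rule-based feedback option above, here is a minimal sketch for an arithmetic-style task. The function name, the expected "Final answer:" format, and the feedback wording are assumptions for demonstration, not components of TRIPOST itself.

```python
# Illustrative rule-based feedback module for an arithmetic task
# (an assumption for demonstration, not part of TRIPOST itself).
import re
from typing import Optional


def rule_based_feedback(question: str, attempt: str) -> Optional[str]:
    """Return a feedback string if the attempt looks wrong, or None if it is correct.

    Assumes the question is a plain arithmetic expression and the attempt ends
    with a line of the form "Final answer: <number>".
    """
    expected = eval(question, {"__builtins__": {}})  # question contains only digits/operators
    match = re.search(r"Final answer:\s*(-?\d+(?:\.\d+)?)", attempt)
    if match is None:
        return "Feedback: the attempt is missing a 'Final answer:' line."
    if abs(float(match.group(1)) - expected) < 1e-9:
        return None  # answer is already correct; no feedback needed
    return (f"Feedback: the final answer {match.group(1)} is incorrect; "
            "re-check the innermost parentheses before combining terms.")


if __name__ == "__main__":
    print(rule_based_feedback("((1 + 2) * 3) - 4",
                              "Step 1: (1 + 2) = 3\nStep 2: 3 * 3 = 9\nFinal answer: 8"))
```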
