Core Concepts
Introducing REALIGN, a simple and effective method to automatically improve the quality of existing instruction datasets by reformatting responses to better align with pre-established criteria and collated evidence, without introducing additional data or advanced training techniques.
Abstract
The paper explores how to raise the quality of existing instruction data so that large language models (LLMs) align better with human values. It introduces REALIGN, a simple and effective approach that reformats the responses in instruction data to better match pre-established criteria and the collated evidence.
The REALIGN process involves three main steps:
Criteria Definition: Humans define their preferences (e.g., the preferred format of responses) in various scenarios in the form of natural language.
Retrieval Augmentation: Broadens the knowledge base for knowledge-intensive tasks by incorporating additional information, thereby improving the factuality and informativeness of responses.
Reformatting: Aims to re-align the responses with the pre-established criteria and the collated evidence, guaranteeing outputs that are both structured and substantiated.
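The three steps above can be sketched as a small pipeline. This is only an illustrative outline, not the paper's released code: the `Example` type, the `CRITERIA` texts, the `KNOWLEDGE_INTENSIVE` task set, and the `search` and `rewrite` callables are all hypothetical names introduced here, and the real system would plug in a retriever and an LLM where the stubs appear.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    instruction: str
    response: str
    task: str  # hypothetical task label, e.g. "math" or "open_qa"


# Step 1 (Criteria Definition): human-written format preferences per task.
# These texts are placeholders, not the criteria used in the paper.
CRITERIA = {
    "math": "Show the reasoning step by step, then state the final answer.",
    "default": "Answer directly, then give supporting details.",
}

# Assumption: only these task labels count as knowledge-intensive.
KNOWLEDGE_INTENSIVE = {"open_qa"}


def retrieve_evidence(ex: Example, search: Callable[[str], List[str]]) -> List[str]:
    """Step 2 (Retrieval Augmentation): gather evidence only where it helps."""
    if ex.task not in KNOWLEDGE_INTENSIVE:
        return []
    return search(ex.instruction)


def build_reformat_prompt(ex: Example, evidence: List[str]) -> str:
    """Combine criteria, evidence, and the original pair into one rewrite prompt."""
    criteria = CRITERIA.get(ex.task, CRITERIA["default"])
    parts = [
        "Criteria: " + criteria,
        "Evidence: " + (" | ".join(evidence) if evidence else "(none)"),
        "Instruction: " + ex.instruction,
        "Original response: " + ex.response,
        "Rewrite the response so it follows the criteria and the evidence.",
    ]
    return "\n".join(parts)


def realign(ex: Example,
            search: Callable[[str], List[str]],
            rewrite: Callable[[str], str]) -> Example:
    """Step 3 (Reformatting): keep the instruction, replace only the response."""
    evidence = retrieve_evidence(ex, search)
    prompt = build_reformat_prompt(ex, evidence)
    return Example(ex.instruction, rewrite(prompt), ex.task)


# Usage with stand-in functions instead of a real retriever and LLM.
def fake_search(query: str) -> List[str]:
    return ["stub passage about: " + query]


def fake_rewrite(prompt: str) -> str:
    return "REWRITTEN<" + str(len(prompt.splitlines())) + " prompt lines>"


original = Example("Who wrote Hamlet?", "Shakespeare.", "open_qa")
reformatted = realign(original, fake_search, fake_rewrite)
print(reformatted.response)
```

Note that the sketch leaves the instruction untouched and rewrites only the response, which mirrors the claim that the method adds no new data.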
The paper evaluates REALIGN on five existing instruction datasets: three general datasets (Open-Platypus, No Robots, Alpaca) and two mathematical datasets (GSM8K, MATH). The results show that REALIGN significantly boosts the general alignment ability, math reasoning, factuality, and readability of LLMs without introducing any additional data or advanced training techniques. For example, REALIGN raises the accuracy of LLaMA-2-13B on the GSM8K test set from 46.77% to 56.63%. Additionally, a mere 5% of REALIGN data yields a 67% boost in general alignment ability as measured by the Alpaca dataset.
The paper highlights the need for further research into the science and mechanistic interpretability of LLMs and makes the associated code and data publicly accessible to support future studies.
Stats
There are 400 positive three-digit integers less than 500 (100 through 499).
Of these, 288 have three distinct digits (4 choices for the hundreds digit, 9 for the tens digit, 8 for the units digit: 4 × 9 × 8 = 288).
The remaining 400 − 288 = 112 have at least two digits that are the same.
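The counts above can be checked by brute force. The short script below is a verification sketch added here, not something taken from the paper; the helper name `has_repeated_digit` is an illustrative choice.

```python
def has_repeated_digit(n: int) -> bool:
    """Return True if at least two decimal digits of n are the same."""
    digits = str(n)
    return len(set(digits)) < len(digits)


# Positive three-digit integers less than 500 are 100 through 499.
numbers = range(100, 500)

total = len(numbers)
repeated = sum(1 for n in numbers if has_repeated_digit(n))
distinct = total - repeated

print(total, distinct, repeated)  # prints: 400 288 112
```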
Quotes
"There are 112 positive three-digit integers less than 500 that have at least two digits that are the same."