Core Concepts
Statement-Tuning enables encoder models like RoBERTa to generalize to unseen tasks in zero-shot and few-shot settings by training them to discriminate the truthfulness of natural language statements.
Abstract
The paper proposes a novel technique called Statement-Tuning to enable zero-shot and few-shot task generalization for encoder models like BERT and RoBERTa. The key idea is to verbalize various NLP tasks into natural language statements and train the encoder model to discriminate the truthfulness of these statements.
The authors first convert datasets from 16 diverse NLP tasks into statement format, where each possible label is represented by a natural language statement. They then fine-tune a RoBERTa model to perform binary classification on the truthfulness of these statements.
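To make the data construction concrete, here is a minimal sketch of how one labeled classification example might be verbalized into true/false statement pairs for binary fine-tuning. The template wording and the `make_statements` helper are illustrative assumptions, not the paper's exact templates:

```python
# Sketch of statement-tuning data construction (hypothetical templates;
# the paper hand-writes several templates per task).

def make_statements(text: str, gold_label: str,
                    label_templates: dict[str, str]) -> list[tuple[str, int]]:
    """Return (statement, truth) pairs: 1 for the gold label's statement,
    0 for every other candidate label's statement."""
    return [
        (template.format(text=text), int(label == gold_label))
        for label, template in label_templates.items()
    ]

# Example: a two-label sentiment task.
sentiment_templates = {
    "positive": 'The sentiment of "{text}" is positive.',
    "negative": 'The sentiment of "{text}" is negative.',
}
pairs = make_statements("A delightful, warm-hearted film.", "positive",
                        sentiment_templates)
# -> [('The sentiment of "A delightful, warm-hearted film." is positive.', 1),
#     ('The sentiment of "A delightful, warm-hearted film." is negative.', 0)]
```

Each original example thus yields one true statement and one or more false statements, and the encoder is trained on the resulting binary labels.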
Through this Statement-Tuning process, the model learns a general semantic understanding of statements, which allows it to generalize to unseen tasks: each candidate label is verbalized as a statement, and the model selects the one it judges most likely to be true. The authors show that this Statement-Tuned RoBERTa model can match or even outperform much larger decoder-only or encoder-decoder language models in zero-shot and few-shot settings, while using significantly fewer parameters.
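A minimal sketch of that zero-shot inference loop, assuming a RoBERTa checkpoint already fine-tuned with two labels (index 1 = "true"); the checkpoint name `statement-tuned-roberta` is a placeholder, not a released model:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder path for a statement-tuned binary classifier (num_labels=2).
tokenizer = AutoTokenizer.from_pretrained("statement-tuned-roberta")
model = AutoModelForSequenceClassification.from_pretrained("statement-tuned-roberta")
model.eval()

def predict_label(statements: dict[str, str]) -> str:
    """statements maps each candidate label to its verbalized statement;
    return the label whose statement scores highest as 'true'."""
    scores = {}
    with torch.no_grad():
        for label, statement in statements.items():
            inputs = tokenizer(statement, return_tensors="pt", truncation=True)
            logits = model(**inputs).logits
            scores[label] = torch.softmax(logits, dim=-1)[0, 1].item()
    return max(scores, key=scores.get)
```

Because each label's statement is scored independently by the same binary head, the approach transfers to unseen tasks with any number of labels.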
The authors also conduct extensive ablation studies to understand the impact of factors like statement sample size, statement template diversity, and task diversity during training. They find that increasing the number of training statements and the diversity of tasks and templates generally improves the model's zero-shot and few-shot generalization capabilities.
Overall, the paper demonstrates the potential of using smaller encoder models for effective zero-shot and few-shot task generalization by reformulating tasks as natural language statements and training the model to discriminate their truthfulness.
Example Statements
The following shows how a single NLI premise-hypothesis pair is verbalized into one statement per candidate label (entailment, neutral, contradiction):
"Conceptually cream skimming has two basic dimensions - product and geography" entails "Product and geography are what make cream skimming work".
"Conceptually cream skimming has two basic dimensions - product and geography" is neutral with regards to "Product and geography are what make cream skimming work".
"Conceptually cream skimming has two basic dimensions - product and geography" contradicts "Product and geography are what make cream skimming work".
Quotes
"To the best of our knowledge we are the first to enable natural, zero-shot task generalization in encoder models by verbalizing the input into statements and fine-tuning the model to perform binary classification on the truth value of a statement."
"We expose that certain emergent abilities (Wei et al., 2022b) like zero-shot generalization on unseen tasks previously thought to be exclusive to decoder-based LLMs can also be observed in much smaller encoder models when we do multitask Statement-Tuning."
"We explore a large number of design choices to study how Statement-Tuning benefits from the number of statement examples and statement template and task diversity in multitask Statement-Tuning."