This study examines how small language models perform at zero-shot text classification, questioning the assumption that large language models (LLMs) are necessary for the task. The authors benchmark a wide range of models, from 77 million to 40 billion parameters, covering both encoder-decoder and decoder-only architectures and several scoring functions.
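To make the setup concrete, here is a minimal sketch of zero-shot classification by label scoring, assuming a Hugging Face checkpoint such as google/flan-t5-small (roughly 77M parameters, matching the smallest scale benchmarked); the prompt wording and label verbalizers are illustrative, not the paper's exact ones.

```python
# Minimal sketch: zero-shot classification by scoring each candidate label
# with a small encoder-decoder model. Model name, prompt, and labels are
# assumptions for illustration, not the paper's exact configuration.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
model.eval()

def label_log_likelihood(text: str, label: str) -> float:
    """Summed token log-probability of `label` given a classification prompt."""
    prompt = f"Classify the sentiment of this text: {text}"
    enc = tokenizer(prompt, return_tensors="pt")
    dec = tokenizer(label, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(**enc, labels=dec)
    # out.loss is the mean cross-entropy over the label tokens; negate and
    # multiply by the label length to recover the summed log-likelihood.
    return -out.loss.item() * dec.shape[1]

text = "The movie was a complete waste of time."
labels = ["positive", "negative"]
scores = {lab: label_log_likelihood(text, lab) for lab in labels}
print(max(scores, key=scores.get))  # expected: "negative"
```

The same scoring loop applies unchanged to decoder-only models (swapping in a causal LM head), which is what makes a controlled architecture comparison like the paper's possible.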
The key findings are:
Model size does not always correlate with better performance. While some datasets show a positive correlation between model size and classification accuracy/F1 score, many others do not exhibit a significant relationship.
The architectural choice (encoder-decoder vs. decoder-only) can have a significant impact on performance, depending on the dataset.
Instruction fine-tuning can improve performance, but the effect is dataset-dependent and varies across architectures.
The choice of scoring function does not seem to significantly affect performance for either encoder-decoder or decoder-only models (two common scoring variants are sketched below).
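As an assumption about what "scoring function" covers here (the paper's exact definitions may differ), two common variants are the summed token log-likelihood and its length-normalized average; the toy numbers below are hypothetical and only show how normalization can flip a label ranking.

```python
# Two common label-scoring functions, sketched as assumptions rather than
# the paper's exact definitions. Per-token log-probabilities are invented
# for illustration, not real model outputs.
def score_sum(token_log_probs: list[float]) -> float:
    """Summed log-likelihood: systematically favors short label strings."""
    return sum(token_log_probs)

def score_avg(token_log_probs: list[float]) -> float:
    """Length-normalized log-likelihood: removes the short-label bias."""
    return sum(token_log_probs) / len(token_log_probs)

short_label = [-0.7]              # hypothetical one-token label
long_label = [-0.5, -0.4, -0.6]   # hypothetical three-token label
print(score_sum(short_label), score_sum(long_label))  # -0.7 > -1.5: short wins
print(score_avg(short_label), score_avg(long_label))  # -0.7 < -0.5: long wins
```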
The authors conclude that small language models can classify texts effectively in a zero-shot setting, often matching or surpassing larger models. This suggests that resource-efficient small models offer viable solutions for specific text classification problems, countering the prevailing notion that bigger is always better.