This study examines the performance of small language models in zero-shot text classification, challenging the prevailing dominance of large language models (LLMs) for this task. The authors benchmark a wide range of models, from 77 million to 40 billion parameters, covering both encoder-decoder and decoder-only architectures and several scoring functions.
The key findings are:
Model size does not always correlate with better performance. While some datasets show a positive correlation between model size and classification accuracy/F1 score, many others do not exhibit a significant relationship.
The architectural choice (encoder-decoder vs. decoder-only) can have a significant impact on performance, depending on the dataset.
Instruction fine-tuning can improve performance, but the effect is dataset-dependent and varies across architectures.
The choice of scoring function does not seem to significantly affect the performance of either encoder-decoder or decoder-only models.
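To make the scoring-function comparison concrete, here is a minimal sketch of how zero-shot classification by label scoring typically works: each candidate label is scored by the language model's log-likelihood of producing that label, and the highest-scoring label wins. The model itself is stubbed out with a hypothetical toy log-probability table (`TOY_LOGPROBS` and `token_logprobs` are illustrative stand-ins, not the paper's actual implementation); the two scoring functions shown (raw sum vs. length-normalized mean) are common choices, not necessarily the exact ones benchmarked.

```python
# Sketch of zero-shot classification via label-likelihood scoring.
# TOY_LOGPROBS stands in for a real LM's per-token log-probabilities;
# in practice these would come from an encoder-decoder or decoder-only model.
TOY_LOGPROBS = {
    ("great movie, loved it", "positive"): [-0.2, -0.3],
    ("great movie, loved it", "negative"): [-2.1, -1.8],
    ("total waste of time", "positive"): [-3.0, -2.5],
    ("total waste of time", "negative"): [-0.4, -0.2],
}

def token_logprobs(text, label):
    """Per-token log-probs of the label given the text (stubbed here)."""
    return TOY_LOGPROBS[(text, label)]

def score_sum(logprobs):
    """Raw log-likelihood: sum of token log-probs (can favor short labels)."""
    return sum(logprobs)

def score_mean(logprobs):
    """Length-normalized log-likelihood: average per-token log-prob."""
    return sum(logprobs) / len(logprobs)

def classify(text, labels, score_fn):
    """Pick the label whose tokens the model finds most likely."""
    return max(labels, key=lambda lab: score_fn(token_logprobs(text, lab)))
```

For example, `classify("great movie, loved it", ["positive", "negative"], score_sum)` returns `"positive"`. The paper's finding is that swapping `score_sum` for `score_mean` (or similar variants) changes results little for either architecture family.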
The authors conclude that small language models can classify texts effectively in a zero-shot setting, often matching or surpassing the performance of larger models. This suggests that resource-efficient small models offer viable solutions for specific classification problems, and that bigger is not always better.
Key insights distilled from Pierre Lepag... at arxiv.org, 04-18-2024
https://arxiv.org/pdf/2404.11122.pdf