This study compares the performance of transformer language models (Pythia) and two contemporary recurrent language model architectures (RWKV and Mamba) in predicting various metrics of online human language comprehension, including neural measures like the N400 and behavioral measures like reading time.
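Comparisons like this rest on per-word surprisal estimates from each model family. As a rough sketch of how such estimates can be obtained (not the paper's own code), the snippet below computes per-token surprisal with Hugging Face causal-LM checkpoints from the three families; the specific checkpoint names are assumptions chosen for illustration, and the paper evaluates several sizes of each family.

```python
# Illustrative sketch: per-token surprisal (in bits) from a causal language model.
# Checkpoint names below are assumptions for illustration, not the paper's exact set.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def token_surprisals(model_name: str, text: str):
    """Return (token, surprisal-in-bits) pairs for every token after the first."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, seq_len, vocab)

    # Log-probability assigned to each token given its left context.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    token_log_probs = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    surprisals = (-token_log_probs / torch.log(torch.tensor(2.0))).tolist()
    tokens = tokenizer.convert_ids_to_tokens(targets.tolist())
    return list(zip(tokens, surprisals))

# Hypothetical checkpoints for each family (transformer, RWKV, Mamba).
for name in ["EleutherAI/pythia-160m", "RWKV/rwkv-4-169m-pile", "state-spaces/mamba-130m-hf"]:
    print(name, token_surprisals(name, "The cat sat on the mat.")[:3])
```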
The key findings are:
On the N400 datasets, the recurrent models (Mamba and RWKV) generally outperform the transformer models, especially when comparing models of the same size. This suggests that transformers are not uniquely well-suited for modeling the N400.
For the reading time datasets, the results are more mixed, with some showing positive scaling (larger/better models perform better) and others showing inverse scaling (larger/better models perform worse). This aligns with previous work on the complex relationship between language model performance and reading time metrics.
When models are compared by perplexity rather than size, an interesting pattern emerges: on datasets showing positive scaling, Mamba (the architecture with the best perplexity) performs relatively worse than the other architectures, while RWKV (the architecture with the worst perplexity) performs relatively better; the opposite holds for datasets showing inverse scaling. This suggests that a language model's ability to predict the next word shapes how well it models human language comprehension, beyond the effects of model size and architecture alone (one way such fit can be quantified is sketched at the end of this summary).
The results highlight that there is no single universal pattern accounting for the relationship between language model probability and all metrics of online human language comprehension. The relationship is complex and depends on the specific dataset, metric, and model architecture.
Overall, the findings demonstrate that contemporary recurrent language models can match or exceed transformer performance in modeling human language comprehension, opening up new directions for research on the cognitive plausibility of different language model architectures.
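To connect model probabilities to human measures such as N400 amplitude or reading time, analyses of this kind typically ask how much per-word surprisal improves a regression fit over a simple baseline. The sketch below illustrates one such score (a delta R^2 over a word-length and frequency baseline) on toy data; it is an assumption-laden illustration, not the paper's statistical procedure, which may use different baselines or mixed-effects models.

```python
# Minimal sketch: R^2 gain from adding surprisal to a length + frequency baseline.
# Variable names and the use of plain linear regression are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

def delta_r2(surprisal, word_length, log_frequency, human_measure):
    """R^2 gain from adding surprisal to a word-length + log-frequency baseline."""
    baseline_X = np.column_stack([word_length, log_frequency])
    full_X = np.column_stack([word_length, log_frequency, surprisal])
    r2_base = LinearRegression().fit(baseline_X, human_measure).score(baseline_X, human_measure)
    r2_full = LinearRegression().fit(full_X, human_measure).score(full_X, human_measure)
    return r2_full - r2_base

# Toy data standing in for aligned per-word predictors and an N400 or
# reading-time measure; positive values mean surprisal adds predictive power.
rng = np.random.default_rng(0)
n = 500
word_length = rng.integers(1, 12, n)
log_frequency = rng.normal(0, 1, n)
surprisal = rng.gamma(2.0, 2.0, n)
human_measure = 0.3 * surprisal + 0.1 * word_length + rng.normal(0, 1, n)
print(f"delta R^2 from surprisal: {delta_r2(surprisal, word_length, log_frequency, human_measure):.3f}")
```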
Source: James A. Mic... at arxiv.org, 05-01-2024, https://arxiv.org/pdf/2404.19178.pdf