Large Language Models Still Struggle to Comprehend Language on Par with Humans Despite Scaling
Even the largest language models tested, such as ChatGPT-4, fall short of human performance on grammaticality judgment tasks, despite substantial increases in model size.