Large Language Models, particularly GPT-4, excel in evaluating grammatical error corrections, surpassing traditional metrics.