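Large language model agents can achieve superhuman performance as factuality raters: automated evaluation methods such as SAFE (Search-Augmented Factuality Evaluator) provide assessments that are more cost-effective than crowdsourced human annotation while remaining reliable.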
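The sketch below shows the general shape of a SAFE-style rating loop. SAFE uses an LLM to break a long-form response into individual facts and then rates each fact through a multi-step process of Google Search queries. Here every one of those pieces is replaced by a hypothetical stand-in: `split_into_facts` simply splits on periods, relevance is a substring check against the prompt topic, and `is_supported` is a caller-supplied function instead of search-backed reasoning. The `agreement_rate` helper, also illustrative, shows how an automated rater's labels might be compared against human labels.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

LABELS = ("supported", "not_supported", "irrelevant")


@dataclass
class FactRating:
    fact: str
    label: str  # one of LABELS


def split_into_facts(response: str) -> List[str]:
    # Hypothetical stand-in: SAFE asks an LLM to extract individual facts;
    # here each period-delimited sentence is naively treated as one fact.
    return [s.strip() for s in response.split(".") if s.strip()]


def safe_style_rate(response: str, topic: str,
                    is_supported: Callable[[str], bool]) -> List[FactRating]:
    ratings = []
    for fact in split_into_facts(response):
        # Hypothetical relevance check: the fact must mention the topic.
        if topic.lower() not in fact.lower():
            ratings.append(FactRating(fact, "irrelevant"))
        # is_supported stands in for SAFE's search-and-reason step.
        elif is_supported(fact):
            ratings.append(FactRating(fact, "supported"))
        else:
            ratings.append(FactRating(fact, "not_supported"))
    return ratings


def summarize(ratings: List[FactRating]) -> Dict[str, int]:
    # Count how many facts received each label.
    counts = {label: 0 for label in LABELS}
    for r in ratings:
        counts[r.label] += 1
    return counts


def agreement_rate(auto: List[str], human: List[str]) -> float:
    # Fraction of facts on which the automated and human labels match.
    if len(auto) != len(human) or not auto:
        raise ValueError("label lists must be non-empty and equal length")
    return sum(a == h for a, h in zip(auto, human)) / len(auto)


if __name__ == "__main__":
    knowledge = {"Marie Curie won two Nobel Prizes"}
    response = ("Marie Curie won two Nobel Prizes. "
                "Marie Curie was born in Paris. The weather is nice.")
    ratings = safe_style_rate(response, "Marie Curie",
                              lambda fact: fact in knowledge)
    print(summarize(ratings))
    # {'supported': 1, 'not_supported': 1, 'irrelevant': 1}
```

Keeping the fact splitter, relevance check, and support check as separate, swappable functions mirrors the decomposition the method relies on: each fact is judged on its own, so a real LLM or search backend can replace any single stand-in without changing the aggregation logic.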