Major AI developers should provide legal and technical safe harbors to protect public-interest safety research from account suspensions or legal reprisal.
LLMs struggle with the lateral-thinking puzzles in LatEval, highlighting a clear gap in this reasoning capability for current models.
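As a rough, hedged illustration of the kind of interactive, incomplete-information loop a lateral-thinking benchmark implies: the tested model sees only the puzzle's public "surface", asks yes/no questions of a host, and must eventually reconstruct the hidden "bottom" story. The function names (`ask_solver`, `answer_as_host`, `judge_guess`) and the turn protocol below are illustrative assumptions, not LatEval's published implementation.

```python
from typing import Callable, Dict, List, Tuple

def run_lateral_puzzle(
    surface: str,                                    # public part of the puzzle
    bottom: str,                                     # hidden full story
    ask_solver: Callable[[str], str],                # model under test: next question or guess (hypothetical)
    answer_as_host: Callable[[str, str, str], str],  # host/judge model: "yes" / "no" / "irrelevant" (hypothetical)
    judge_guess: Callable[[str, str], bool],         # judge: does the guess match the hidden story? (hypothetical)
    max_turns: int = 10,
) -> Dict:
    """Run one interactive puzzle and report whether the solver recovered the story."""
    history: List[Tuple[str, str]] = []
    for turn in range(1, max_turns + 1):
        prompt = (
            f"Puzzle: {surface}\n"
            f"Previous Q&A: {history}\n"
            "Ask one yes/no question, or reply 'GUESS: <your reconstruction of the story>'."
        )
        move = ask_solver(prompt)
        if move.strip().upper().startswith("GUESS:"):
            guess = move.split(":", 1)[1].strip()
            return {"solved": judge_guess(guess, bottom), "turns": turn, "history": history}
        history.append((move, answer_as_host(surface, bottom, move)))
    # Solver never committed to a guess within the turn budget.
    return {"solved": False, "turns": max_turns, "history": history}
```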
AutoDE provides a dynamic evaluation framework that closely mirrors human assessments, revealing deficiencies overlooked by static evaluations.
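To make the static/dynamic contrast concrete, here is a minimal sketch of one plausible form of dynamic evaluation, assuming an examiner agent adapts each follow-up question to the candidate's previous answers instead of replaying a fixed question list. The names `examiner`, `candidate`, and `grader` are hypothetical stand-ins, not AutoDE's actual API.

```python
from typing import Callable, List, Tuple

def dynamic_eval(
    seed_task: str,
    examiner: Callable[[str, List[Tuple[str, str]]], str],  # crafts the next probe from the history (hypothetical)
    candidate: Callable[[str], str],                         # model under evaluation (hypothetical)
    grader: Callable[[List[Tuple[str, str]]], float],        # scores the full dialogue (hypothetical)
    num_turns: int = 5,
) -> float:
    """Interview-style evaluation: each probe depends on the candidate's earlier answers."""
    dialogue: List[Tuple[str, str]] = []
    question = seed_task
    for _ in range(num_turns):
        answer = candidate(question)
        dialogue.append((question, answer))
        # Because the examiner reads the whole history, it can follow up on weak or
        # evasive answers -- behaviour a fixed (static) question list cannot probe.
        question = examiner(seed_task, dialogue)
    return grader(dialogue)
```

The design choice that matters is the feedback loop: the examiner's next question is a function of the dialogue so far, which is what lets this style of protocol surface deficiencies that a static benchmark would miss.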