Vision Language Models Struggle with Perspective-Taking Despite Proficiency in Intentionality Understanding
Vision Language Models (VLMs) exhibit high performance on intentionality understanding tasks but struggle with perspective-taking, challenging the common belief that perspective-taking is necessary for intentionality understanding.