The author evaluates state-of-the-art multimodal large language models (MLLMs) on the Natural Language Visual Reasoning (NLVR) challenge, highlighting their poor performance on spatial and compositional reasoning tasks.