The author compares the effectiveness of different visual encoders for multimodal large language models (MLLMs), highlighting the advantages of shallow-layer features and the potential of DINOv2 as a visual branch.