VisionLLaMA is a unified, LLaMA-like vision transformer architecture tailored to 2D images, and it shows substantial gains over previous state-of-the-art vision transformers.