VisionLLaMA: A Unified Vision Transformer for Image Tasks
The paper introduces VisionLLaMA, a vision transformer tailored for image tasks that aims to bridge the gap between language and vision models. Extensive evaluations show that it is effective across a range of downstream tasks, where it outperforms previous state-of-the-art vision transformers.