Optimizing Vision Transformers for Efficient Inference: A Comprehensive Survey on Model Quantization and Hardware Acceleration
Vision Transformers (ViTs) have emerged as a promising alternative to convolutional neural networks (CNNs) in computer vision, but their large model sizes and high computational demands hinder deployment, especially on resource-constrained devices. Model quantization, which reduces the numerical precision of weights and activations, and hardware acceleration are therefore crucial for addressing these challenges and enabling efficient ViT inference.
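To make the core idea concrete, the following is a minimal sketch of uniform symmetric post-training quantization applied to a single weight tensor, one of the simplest schemes a survey of this kind covers. The function names and the 768x768 matrix shape (typical of a ViT-Base projection layer) are illustrative assumptions, not drawn from any specific method in the survey.

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    """Uniform symmetric quantization: map floats to num_bits signed integers.

    Illustrative helper, not a specific method from the survey.
    """
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for int8
    scale = np.abs(w).max() / qmax            # largest magnitude maps to qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from integers and the scale."""
    return q.astype(np.float32) * scale

# Quantize a random stand-in for a ViT projection weight matrix.
rng = np.random.default_rng(0)
w = (rng.standard_normal((768, 768)) * 0.02).astype(np.float32)
q, scale = quantize_uniform(w)
w_hat = dequantize(q, scale)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

Storing `q` instead of `w` cuts memory 4x relative to float32, and integer matrix multiplies on supporting hardware are the source of the inference speedups the survey examines; the challenge the literature addresses is keeping the reconstruction error from degrading accuracy, particularly for ViT activations with outlier values.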