Adapting Large Visual-Language Models to Edge Devices for Diverse Modalities
The authors introduce EdgeVL, a framework that adapts large vision-language models to edge devices, addressing two challenges at once: handling diverse visual modalities and operating under tight computational constraints. The approach combines dual-modality knowledge distillation with quantization-aware contrastive learning to improve model efficiency while preserving the alignment between visual and language features.
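To make the idea of quantization-aware contrastive learning concrete, the sketch below pairs a simulated ("fake") quantization step with an InfoNCE-style loss that pulls each quantized student embedding toward its matching teacher embedding and pushes it away from the others in the batch. This is a minimal illustration, not EdgeVL's actual implementation: the function names, the per-tensor quantization scheme, and the exact loss form are assumptions for demonstration.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    # Simulate low-bit quantization: scale to the signed integer range,
    # round, then dequantize back to float (per-tensor scale; an assumption,
    # not necessarily the scheme EdgeVL uses).
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.round(x / scale) * scale

def contrastive_distillation_loss(student_feats, teacher_feats, temperature=0.07):
    # InfoNCE-style objective: the i-th (quantized) student embedding should
    # match the i-th teacher embedding and repel the other batch entries.
    s = student_feats / np.linalg.norm(student_feats, axis=1, keepdims=True)
    t = teacher_feats / np.linalg.norm(teacher_feats, axis=1, keepdims=True)
    logits = s @ t.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 16))                    # frozen teacher embeddings
student = teacher + 0.05 * rng.normal(size=(4, 16))   # student tracking the teacher
loss = contrastive_distillation_loss(fake_quantize(student), teacher)
```

Training against features that have already passed through the quantization step is what makes the student robust to the precision loss it will face after deployment, rather than discovering that degradation only at inference time.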