MoCLE proposes a novel architecture that addresses task conflicts in vision-language instruction tuning, achieving both task specialization and generalization.