Comprehensive Empirical Study of Routers in Vision Mixture of Experts Models
Mixture-of-Experts (MoE) models offer a promising way to scale up model capacity without a commensurate increase in computational cost. This paper presents a comprehensive empirical study of routers, the key components of MoE models that decide which subset of parameters (experts) processes which feature embeddings (tokens), in the context of computer vision tasks.
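To make the routing idea concrete, below is a minimal sketch of a top-k "token choice" router, one common routing scheme: each token is sent to the k experts with the highest gating scores. The function name, shapes, and the plain softmax gate are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def topk_router(tokens, gate_weights, k=1):
    """Assign each token to its top-k experts by gating score.

    tokens:       (num_tokens, dim) feature embeddings
    gate_weights: (dim, num_experts) learned gating matrix
    Returns expert indices (num_tokens, k) and combine weights (num_tokens, k).
    """
    logits = tokens @ gate_weights                       # (num_tokens, num_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)           # softmax over experts
    top_idx = np.argsort(-probs, axis=-1)[:, :k]         # top-k expert ids per token
    top_w = np.take_along_axis(probs, top_idx, axis=-1)  # their gating weights
    return top_idx, top_w

# Tiny usage example: 4 tokens, 8-dim features, 16 experts, k=2.
rng = np.random.default_rng(0)
idx, w = topk_router(rng.normal(size=(4, 8)), rng.normal(size=(8, 16)), k=2)
print(idx.shape, w.shape)  # (4, 2) (4, 2)
```

Each selected expert's output is then weighted by the corresponding gating weight and summed, so compute per token scales with k rather than with the total number of experts.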