MetaVIM, a Meta Variationally Intrinsic Motivated reinforcement learning method, is proposed to learn decentralized policies for traffic signal control that account for neighbor information through a learned latent representation, enabling effective and generalizable control in large-scale road networks.
MTLIGHT augments each agent's observation with a latent state learned from numerous traffic indicators. The latent state is trained with multiple auxiliary and supervisory tasks, which improves both the convergence speed and the asymptotic performance of traffic signal control.
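The observation augmentation described for MTLIGHT can be illustrated with a minimal sketch. This is not the authors' implementation: the linear encoder, the function names (`encode_latent`, `augment_observation`, `combined_loss`), and the `aux_weight` parameter are all hypothetical, chosen only to show the shape of the idea. A latent vector is computed from network-wide traffic indicators, appended to the agent's local observation, and the auxiliary task losses are added to the reinforcement learning loss.

```python
def encode_latent(indicators, weights):
    """Hypothetical linear encoder (an assumption, not from the source).

    Maps a list of traffic indicators to a latent vector, one entry per
    row of `weights`.
    """
    return [sum(w * x for w, x in zip(row, indicators)) for row in weights]


def augment_observation(local_obs, latent):
    """Append the latent state to the agent's local observation."""
    return list(local_obs) + list(latent)


def combined_loss(rl_loss, aux_losses, aux_weight=0.5):
    """Add weighted auxiliary task losses to the RL loss.

    `aux_weight` is an illustrative assumption; the source does not say
    how the tasks are weighted.
    """
    return rl_loss + aux_weight * sum(aux_losses)


# Example: two indicators (say, queue length and mean speed), 2-d latent.
indicators = [1.0, 2.0]
weights = [[1.0, 0.0], [0.5, 0.5]]
latent = encode_latent(indicators, weights)
policy_input = augment_observation([0.0, 1.0, 0.0], latent)
```

In this sketch the policy would consume `policy_input` instead of the raw local observation, so any gain comes from information carried by the latent state rather than from a change to the policy itself.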